Lecture 7: Receptive fields
Lecturer: Sophie Denève
Bottleneck at the optic nerve (really narrow): the retinal image has to be encoded efficiently.
Center-surround RF (ON-center): responds to light in the center and darkness in the surround
Surround-center RF (OFF-center): the other way round
Hubel and Wiesel: cells sensitive to edges in V1
- Surprise of an event $x$:
- \[S(x) = - \log p(x)\]
- Entropy:
- \[H = 𝔼(- \log p(x)) = - \int_A p(x) \log p(x) dx\]
- Minimum entropy: Dirac delta (deterministic outcome)
- Maximum entropy (on a bounded domain): uniform distribution
- Mutual information between $X$ and $Y$:
- \[I(X, Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)\]
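A quick numerical sanity check of these definitions (my own toy example, not from the lecture): entropy and mutual information for a small discrete joint distribution.

```python
# Toy check of the entropy / mutual-information definitions above.
# The joint distribution p(x, y) is made up for illustration.
import numpy as np

def entropy(p):
    """Shannon entropy H = -sum p log p (in nats), ignoring zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])   # hypothetical joint distribution

p_x = p_xy.sum(axis=1)          # marginal p(x)
p_y = p_xy.sum(axis=0)          # marginal p(y)

H_x, H_y = entropy(p_x), entropy(p_y)
H_xy = entropy(p_xy.ravel())

# I(X, Y) = H(X) + H(Y) - H(X, Y), equivalent to H(X) - H(X | Y)
I_xy = H_x + H_y - H_xy
print(H_x, H_y, I_xy)           # ≈ 0.69, 0.69, 0.19
```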
For one neuron
Let’s say $X$ is the image on the retina and $Y$ its cortical representation: you want to maximize the mutual information between $X$ and $Y$, i.e.:
- maximize $H(X)$ ⟶ you want to represent something complex
- minimize $H(X \mid Y)$ ⟶ you want a reliable representation of $X$
Now let $r$ be the neuron’s response and $s$ the stimulus (e.g. an image).
Analysis models
- Analysis models:
- given a fixed $H(r \mid s)$: maximize \(H(r) - H(r \mid s)\)
Find the distribution $p(r)$ that maximizes the entropy:
\[H(r) = - \int_{[r_{min}, r_{max}]} p(r) \log p(r) \, dr\]
On a bounded interval, entropy is maximized by the uniform distribution, which is obtained when the response is the (scaled) cumulative distribution of the stimulus (histogram equalization):
\[r(s) = Z \int_{-∞}^s p(s') \, ds'\]
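A minimal sketch of this point (my own example, assuming a Gaussian stimulus distribution): mapping the stimulus through its own cumulative distribution gives a response whose histogram is flat, i.e. maximum entropy on a bounded interval.

```python
# Histogram equalization: r(s) = CDF(s) gives a uniform response distribution.
# The Gaussian stimulus distribution is an arbitrary choice for the demo.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
s = rng.normal(size=100_000)     # stimuli drawn from p(s)
r = norm.cdf(s)                  # r(s) = ∫_{-∞}^{s} p(s') ds'

counts, _ = np.histogram(r, bins=10, range=(0.0, 1.0))
print(counts / counts.sum())     # each bin ≈ 0.1: approximately uniform
```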
Now, if the neuron’s response is a sigmoid of the stimulus:
\[r = σ(ws + w_0)\]
which $(w, w_0)$ maximize the output entropy?
\[H(r) = - 𝔼(\log p(r))\]
but as there is a one-to-one correspondence between stimulus and response:
\[p(r)\,dr = p(s)\,ds ⟹ p(r) = \frac{p(s)}{dr/ds}\]
\[H(r) = - 𝔼(\log p(s)) + 𝔼\Big(\log \frac{dr}{ds}\Big)\]
So stochastic gradient ascent on $H(r)$ yields the following learning rule (simulated in the sketch after the list):
- \[τ \dot{w} = \frac{\partial H}{\partial w} = \frac{1}{w} + s(1 - 2r)\]
- \[τ \dot{w_0} = \frac{\partial H}{\partial w_0} = 1 - 2r\]
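A toy simulation of this rule (my own sketch: logistic sigmoid, Gaussian stimuli, made-up learning rate): after learning, the sigmoid roughly matches the stimulus CDF, so the response histogram becomes approximately flat.

```python
# Single-neuron infomax: stochastic gradient ascent on the output entropy.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
w, w0 = 0.1, 0.0                 # initial gain and bias
eta = 0.01                       # learning rate (plays the role of 1/τ)

for _ in range(50_000):
    s = rng.normal(loc=2.0, scale=3.0)            # one stimulus sample
    r = sigmoid(w * s + w0)
    w  += eta * (1.0 / w + s * (1.0 - 2.0 * r))   # τ dw/dt  = 1/w + s(1 - 2r)
    w0 += eta * (1.0 - 2.0 * r)                   # τ dw0/dt = 1 - 2r

s_test = rng.normal(loc=2.0, scale=3.0, size=10_000)
r_test = sigmoid(w * s_test + w0)
print(np.histogram(r_test, bins=10, range=(0.0, 1.0))[0])  # roughly flat
```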
For several neurons
Independent Component Analysis (ICA): consider the joint distribution $p(r_1, r_2)$.
The joint entropy decomposes as
\[H(r_1, r_2) = H(r_1) + H(r_2) - I(r_1, r_2)\]
so to maximize it, you not only want $p(r_1)$ and $p(r_2)$ to be uniform, but also $r_1$ and $r_2$ to be independent.
- Output entropy:
- \[H(\textbf{r}) = - \int \log p(\textbf{r}) p(\textbf{r}) d\textbf{r}\]
- Population activation:
- \[\textbf{r} = f(\textbf{W} \textbf{s} + \textbf{w}_0)\]
If the weight matrix $\textbf{W}$ is invertible:
- \[\dot{\textbf{W}} = α((\textbf{W}^T)^{-1} + (1 - 2\textbf{r})\textbf{s}^T)\]
- \[\dot{\textbf{w}}_0 = α (1 - 2\textbf{r})\]
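A sketch of this matrix rule on a toy source-separation problem (my own setup in the spirit of Bell & Sejnowski’s infomax ICA; the sources, mixing matrix and step size are made up):

```python
# Infomax ICA: learn W so that r = sigmoid(W s + w0) has maximum joint entropy.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
n = 2
A = rng.normal(size=(n, n))      # unknown mixing matrix
W = np.eye(n)                    # unmixing weights to learn
w0 = np.zeros(n)
eta = 0.001

for _ in range(50_000):
    h = rng.laplace(size=n)      # independent (super-Gaussian) sources
    s = A @ h                    # observed mixture
    r = sigmoid(W @ s + w0)
    W  += eta * (np.linalg.inv(W.T) + np.outer(1.0 - 2.0 * r, s))
    w0 += eta * (1.0 - 2.0 * r)

# If the sources were recovered, W @ A is close to a scaled permutation matrix.
print(W @ A)
```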
ICA: extension of PCA where the axes on which we project are not necessarily orthogonal ⟶ we seek the most interesting directions
Generative models
- Generative models:
- given a fixed $H(s)$: maximize \(H(s) - H(s \mid r)\) i.e. minimize the reconstruction error $H(s \mid r)$
- Assumption:
- \[s_i = \sum\limits_{ j } Φ_{i,j} r_j\]
but we don’t have access to the $r_j$: we will infer hidden values $h_j$ and learn $Φ$ so that:
\[s_i = \sum\limits_{ j } Φ_{i,j} h_j + \text{noise}\]
What we know/assume:
- The $h_j$ are independent:
- \[P(h) = \prod_j P(h_j)\]
- Images on the retina:
- \[s_0, ⋯, s_T\]
- \[\textbf{s} = \textbf{Φ} \textbf{h} + 𝒩\]
Goal: find $Φ$ to minimize $H(\textbf{s} \mid \textbf{h})$
We assume that $p(\textbf{s} \mid \textbf{h})$ is Gaussian
\[p(\textbf{s} \mid \textbf{h}) = \frac{1}{\sqrt{2 π}} \exp(-\|\textbf{s} - \textbf{Φ} \textbf{h}\|^2/2)\]
\[H(\textbf{s} \mid \textbf{h}) = - \log p(\textbf{s} \mid \textbf{h}) = Z + \|\textbf{s} - \textbf{Φ} \textbf{h}\|^2/2 = Z + \frac{1}{2} \sum\limits_{ i } (s_{i} - [\textbf{Φ} \textbf{h}]_{i})^2\]
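A minimal sketch of learning $Φ$ under these assumptions (my own toy setup: random patches instead of natural images, no prior on $\textbf{h}$ beyond the reconstruction term): alternate a few gradient steps on $\textbf{h}$ with one gradient step on $Φ$, both minimizing $\|\textbf{s} - Φ\textbf{h}\|^2$.

```python
# Generative model: learn the dictionary Phi by minimizing the reconstruction error.
import numpy as np

rng = np.random.default_rng(3)
n_pix, n_hidden = 16, 8
Phi = rng.normal(scale=0.1, size=(n_pix, n_hidden))
eta_h, eta_phi = 0.1, 0.01

for _ in range(5_000):
    s = rng.normal(size=n_pix)                  # stand-in for an image patch
    h = np.zeros(n_hidden)
    for _ in range(50):                         # infer the hidden values for this patch
        h += eta_h * Phi.T @ (s - Phi @ h)      # -dE/dh with E = ||s - Phi h||^2 / 2
    Phi += eta_phi * np.outer(s - Phi @ h, h)   # -dE/dPhi
    Phi /= np.linalg.norm(Phi, axis=0, keepdims=True)  # keep column norms bounded

print(np.mean((s - Phi @ h) ** 2))              # reconstruction error on the last patch
```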