Lecture 7: Receptive fields

Lecturer: Sophie Denève

Bottleneck at the optic nerve (very narrow): the retinal image has to be compressed to pass through it

On-center/off-surround RF: responds to light in the center and darkness in the surround

Off-center/on-surround RF: the other way round
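
A standard way to model such a receptive field is a difference of Gaussians. The sketch below (NumPy; the kernel size and widths are illustrative choices, not values from the lecture) builds an on-center/off-surround kernel; negating it gives the off-center/on-surround case.

```python
import numpy as np

def dog_kernel(size=21, sigma_center=1.5, sigma_surround=4.0):
    """On-center/off-surround kernel as a difference of two Gaussians.
    Parameter values are illustrative, not from the lecture."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    center = np.exp(-r2 / (2 * sigma_center**2)) / (2 * np.pi * sigma_center**2)
    surround = np.exp(-r2 / (2 * sigma_surround**2)) / (2 * np.pi * sigma_surround**2)
    return center - surround  # positive center, negative surround

kernel = dog_kernel()
# An off-center/on-surround cell is simply the negated kernel: -kernel
```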

Hubel and Wiesel: cells in V1 sensitive to oriented edges

Surprise of an event $x$:
\[S(x) = - \log p(x)\]
Entropy:
\[H = 𝔼(- \log p(x)) = - \int_A p(x) \log p(x) dx\]
  • Minimum entropy: Dirac delta (a single certain outcome)
  • Maximum entropy (on a bounded support): uniform distribution
Mutual information between $X$ and $Y$:
\[I(X, Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)\]
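
To make these definitions concrete, here is a small numerical sketch (discrete variables, natural logarithm; the joint table is made up for illustration) computing the entropies and the mutual information:

```python
import numpy as np

p_xy = np.array([[0.3, 0.1],   # made-up joint distribution p(x, y)
                 [0.1, 0.5]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))   # H = E[-log p(x)]

H_x, H_y, H_xy = entropy(p_x), entropy(p_y), entropy(p_xy.ravel())
# I(X, Y) = H(X) + H(Y) - H(X, Y) = H(X) - H(X | Y)
I_xy = H_x + H_y - H_xy
```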

For one neuron

Let’s say $X$ is the image on the retina and $Y$ its cortical representation: you want to maximize the mutual information between $X$ and $Y$, i.e.:

  • maximize $H(X)$ ⟶ you want to represent something complex
  • minimize $H(X \mid Y)$ ⟶ you want a reliable representation of $X$

Now: $r$: neuron’s response, $s$: stimulus (e.g. an image)

Analysis models

Analysis models:
given a fixed $H(r \mid s)$: maximize \(H(r) - H(r \mid s)\)

Find the distribution $p(r)$ that maximizes the entropy

\[H(r) = - \int_{[r_{min}, r_{max}]} p(r) \log p(r) \, dr\]

On a bounded range $[r_{min}, r_{max}]$, the maximum-entropy distribution is the uniform one; it is reached when the response is proportional to the cumulative distribution of the stimulus (histogram equalization):

\[r(s) = \int_{-∞}^s Z\, p(s') \, ds'\]
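
A minimal numerical check of this (NumPy; the log-normal stimulus distribution and the number of samples are made-up choices): mapping each stimulus through its empirical cumulative distribution yields a response that is approximately uniform, i.e. maximum entropy on $[0, 1]$.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)  # made-up stimulus samples

# Empirical CDF as response: r(s) = Z * integral_{-inf}^{s} p(s') ds'
order = np.argsort(s)
r = np.empty_like(s)
r[order] = np.arange(1, s.size + 1) / s.size  # ranks -> values in (0, 1]

# r should be (approximately) uniformly distributed on [0, 1]
hist, _ = np.histogram(r, bins=20, range=(0, 1), density=True)
print(hist)  # all bins close to 1
```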

Now, if:

\[r = σ(ws + w_0)\]

which $(w, w_0)$ maximize the output entropy?

\[H(r) = - 𝔼(\log p(r))\]

but as there’s a one-to-one correspondence between stimulus and response:

\[p(r)\,dr = p(s)\,ds ⟹ p(r) = \frac{p(s)}{dr/ds}\]

so that

\[H(r) = - 𝔼(\log p(s)) + 𝔼\Big(\log \frac{dr}{ds}\Big) = H(s) + 𝔼\Big(\log \frac{dr}{ds}\Big)\]

Since $H(s)$ is fixed by the stimulus statistics, maximizing $H(r)$ amounts to maximizing $𝔼\big(\log \frac{dr}{ds}\big)$.

So stochastic gradient ascent on the output entropy yields the following learning rule (sketched in code below):

  • \[τ \dot{w} = \frac{\partial H}{\partial w} = \frac{1}{w} + s(1 - 2r)\]
  • \[τ \dot{w_0} = \frac{\partial H}{\partial w_0} = 1 - 2r\]
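
A minimal simulation of this rule (Euler steps on the gradient with a logistic sigmoid; the Gaussian stimulus statistics and learning rate are made-up choices):

```python
import numpy as np

rng = np.random.default_rng(1)
w, w0 = 0.1, 0.0          # initial parameters (arbitrary)
eta = 0.01                # learning rate (plays the role of dt / tau)

for _ in range(50_000):
    s = rng.normal(2.0, 1.0)                     # stimulus sample (made-up statistics)
    r = 1.0 / (1.0 + np.exp(-(w * s + w0)))      # r = sigma(w s + w0)
    w  += eta * (1.0 / w + s * (1.0 - 2.0 * r))  # dH/dw
    w0 += eta * (1.0 - 2.0 * r)                  # dH/dw0

# After convergence the output r should be approximately uniform on (0, 1):
# the sigmoid matches the cumulative distribution of s.
```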

For several neurons

Independent Component Analysis: joint distribution $p(r_1, r_2)$

To maximize the joint entropy, you not only want $p(r_1)$ and $p(r_2)$ to be uniform, but also $r_1$ and $r_2$ to be statistically independent (not merely uncorrelated).

Output entropy:
\[H(\textbf{r}) = - \int \log p(\textbf{r}) p(\textbf{r}) d\textbf{r}\]
Population activation:
\[\textbf{r} = f(\textbf{W} \textbf{s} + \textbf{w}_0)\]

If the weight matrix $\textbf{W}$ is invertible (updates sketched in code below):

  • \[\dot{\textbf{W}} = α((\textbf{W}^T)^{-1} + (1 - 2\textbf{r})\textbf{s}^T)\]
  • \[\dot{\textbf{w}_0} = α (1 - 2\textbf{r})\]
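
An update of this form is the classic infomax ICA rule. A small sketch on artificially mixed sources (the Laplace sources, mixing matrix and learning rate are made-up choices; a logistic nonlinearity is assumed for $f$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 2, 20_000
sources = rng.laplace(size=(n, T))           # independent, sparse sources (made up)
A = np.array([[1.0, 0.6], [0.4, 1.0]])       # made-up mixing matrix
S = A @ sources                              # observed stimuli

W = np.eye(n)
w0 = np.zeros((n, 1))
alpha = 0.005

for t in range(T):
    s = S[:, t:t+1]                          # one stimulus as a column vector
    r = 1.0 / (1.0 + np.exp(-(W @ s + w0)))  # r = f(W s + w0)
    W  += alpha * (np.linalg.inv(W.T) + (1.0 - 2.0 * r) @ s.T)
    w0 += alpha * (1.0 - 2.0 * r)

# W @ A should approach a scaled permutation matrix: W approximately
# inverts the mixing, recovering the independent sources.
print(W @ A)
```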

ICA: extension of PCA where the axes onto which we project are not necessarily orthogonal ⟶ we seek the most interesting directions

Generative models

Generative models:
given a fixed $H(s)$: maximize \(H(s) - H(s \mid r)\) i.e. minimize the reconstruction error $H(s \mid r)$
Assumption:
\[s_i = \sum\limits_{ j } Φ_{i,j} r_j\]
\[I(\textbf{s}, \textbf{h}) = H(\textbf{s})-H(\textbf{s} \mid \textbf{h})\]

but we don’t have access to $r_j$: we will infer $h_j$ (hidden) values and learn $Φ$ so that:

\[s_i = \sum\limits_{ j } Φ_{i,j} h_j + \text{noise}\]

What we know/assume:

  • the $h_j$ are independent:

    \[P(h) = \prod_j P(h_j)\]
  • Images on the retina:

    \[s_0, ⋯, s_T\]
  • \[\textbf{s} = \textbf{Φ} \textbf{h} + 𝒩\]

Goal: find $Φ$ to minimize $H(\textbf{s} \mid \textbf{h})$

We assume that $p(\textbf{s} \mid \textbf{h})$ is Gaussian:

\[p(\textbf{s} \mid \textbf{h}) = \frac{1}{Z} \exp\big(-\|\textbf{s} - \textbf{Φ} \textbf{h}\|^2/2\big)\]

\[- \log p(\textbf{s} \mid \textbf{h}) = \log Z + \frac{1}{2}\|\textbf{s} - \textbf{Φ} \textbf{h}\|^2 = \log Z + \frac{1}{2}\sum\limits_{ i } (s_i - [\textbf{Φ} \textbf{h}]_i)^2\]

so minimizing $H(\textbf{s} \mid \textbf{h}) = 𝔼\big(-\log p(\textbf{s} \mid \textbf{h})\big)$ amounts to minimizing the squared reconstruction error.
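
A minimal sketch of the resulting alternation (NumPy): infer $\textbf{h}$ for the current $\textbf{Φ}$, then take a gradient step on the reconstruction error with respect to $\textbf{Φ}$. The inference step below uses plain least squares and ignores the prior $P(h)$; the data, dimensions and learning rate are made-up choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n_pix, n_hidden, T = 64, 16, 500
S = rng.normal(size=(n_pix, T))      # made-up "images" s_0, ..., s_T (one per column)

Phi = rng.normal(scale=0.1, size=(n_pix, n_hidden))
eta = 0.01

for _ in range(200):
    # Inference: h minimizing ||s - Phi h||^2 (ignores the prior P(h))
    H_hat, *_ = np.linalg.lstsq(Phi, S, rcond=None)
    # Learning: gradient step on the squared reconstruction error w.r.t. Phi
    err = S - Phi @ H_hat
    Phi += eta * err @ H_hat.T / T

print(np.mean(err**2))   # reconstruction error decreases over iterations
```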
