Lecture 7: Receptive fields

Lecturer: Sophie Denève

Bottleneck at the optic nerve (really narrow)

Center-surround RF: responds to light in the center / darkness in the surround

Surround-Center: the other way round

Hubel and Wiesel: cells sensitive to edges in V1

Surprise of an event $x$:
S(x) = - \log p(x)
H = 𝔼(- \log p(x)) = - \int_A p(x) \log p(x) dx
  • Minimum entropy: Dirac
  • Maximum entropy: uniform distribution
Mutual information between $X$ and $Y$:
I(X, Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)
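As a quick numerical check of these definitions (a sketch of mine, not from the lecture), for discrete distributions:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H = -sum p log p (in nats); zero-probability terms contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(pxy):
    """I(X, Y) = H(X) + H(Y) - H(X, Y), computed from a joint probability table."""
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(pxy.ravel())

# Maximum entropy: uniform over 4 outcomes gives log(4);
# minimum entropy: a Dirac (all mass on one outcome) gives 0.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # ≈ 1.386 = log 4
print(entropy([1.0, 0.0, 0.0, 0.0]))       # ≈ 0

# For an independent joint distribution, I(X, Y) ≈ 0.
joint = np.outer([0.5, 0.5], [0.3, 0.7])
print(mutual_information(joint))           # ≈ 0
```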

For one neuron

Let’s say $X$ is the image in the retina, $Y$ the cortex: you want to maximize mutual information between $X$ and $Y$, i.e.:

  • maximize $H(X)$ ⟶ you want to represent something complex
  • minimize $H(X \mid Y)$ ⟶ you want a reliable representation of $X$

Now: $r$: neuron’s response, $s$: stimulus (ex: image)

Analysis models

Analysis models:
given a fixed $H(r \mid s)$: maximize H(r) - H(r \mid s)

Find the distribution $p(r)$ that maximizes the entropy:

H(r) = - \int_{[r_{min}, r_{max}]} p(r) \log p(r) dr
r(s) = \int_{-∞}^s Z p(s') ds'
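This is histogram equalization: the entropy-maximizing transfer function is the (scaled) cumulative distribution of the stimulus, with $Z$ normalizing so $r$ spans its range. A small sketch, using a Gaussian stimulus as an assumed example:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.normal(size=100_000)      # stimulus samples drawn from p(s)

# Entropy-maximizing transfer function: r(s) is the empirical CDF of p(s),
# here estimated by ranking the samples and rescaling to [0, 1].
r = np.argsort(np.argsort(s)) / (len(s) - 1)

# The responses are now uniform: every response level is used equally often.
counts, _ = np.histogram(r, bins=10)
print(counts / len(s))            # each entry ≈ 0.1
```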

Now, if:

r = σ(ws + w_0)

which $(w, w_0)$ maximize the output entropy?

H(r) = - 𝔼(\log p(r))

but as there’s a one-to-one correspondence between stimulus and response:

p(r)dr = p(s)ds ⟹ p(r) = \frac{p(s)}{dr/ds}
H(r) = - 𝔼(\log p(s)) + 𝔼\Big(\log \frac{dr}{ds}\Big)

So stochastic gradient ascent on $H(r)$ yields the following learning rule:

  • τ \dot{w} = \frac{\partial H}{\partial w} = \frac{1}{w} + s(1 - 2r)
  • τ \dot{w_0} = \frac{\partial H}{\partial w_0} = 1 - 2r
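A minimal stochastic simulation of this rule (stimulus distribution, learning rate, and initialization are assumptions of mine), taking σ to be the logistic sigmoid:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
w, w0 = 0.1, 0.0
eta = 0.01                                # learning rate (absorbs 1/τ)

for _ in range(20_000):
    s = rng.normal(loc=2.0, scale=1.5)    # one stimulus sample
    r = sigmoid(w * s + w0)
    # Stochastic gradient ascent on H(r):
    w  += eta * (1.0 / w + s * (1.0 - 2.0 * r))
    w0 += eta * (1.0 - 2.0 * r)

# At the optimum E[1 - 2r] = 0, so the mean response settles at 1/2:
s_test = rng.normal(loc=2.0, scale=1.5, size=5_000)
print(np.mean(sigmoid(w * s_test + w0)))  # ≈ 0.5
```

The $1/w$ term acts as a barrier keeping the gain positive, while $1 - 2r$ centers the sigmoid on the stimulus distribution.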

For several neurons

Independent Component Analysis: joint distribution $p(r_1, r_2)$

To maximize the joint entropy, you not only want $p(r_1)$ and $p(r_2)$ to be uniform, but also $r_1$ and $r_2$ to be independent.

Output entropy:
H(\textbf{r}) = - \int \log p(\textbf{r}) p(\textbf{r}) d\textbf{r}
Population activation:
\textbf{r} = f(\textbf{W} \textbf{s} + \textbf{w}_0)

If the weight matrix $\textbf{W}$ is invertible:

  • \dot{\textbf{W}} = α((\textbf{W}^T)^{-1} + (1 - 2\textbf{r})\textbf{s}^T)
  • \dot{\textbf{w}_0} = α (1 - 2\textbf{r})

ICA: extension of PCA where the axes onto which we project are not necessarily orthogonal ⟶ we seek the most interesting directions
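These updates are the Bell-Sejnowski infomax rule for ICA. A sketch on a toy two-source separation problem (the sources, mixing matrix, learning rate, and mini-batching are assumptions of mine):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
n, T = 2, 20_000
sources = rng.laplace(size=(n, T))          # independent super-Gaussian sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])      # unknown mixing matrix
S = A @ sources                             # observed stimuli

W = np.eye(n)
w0 = np.zeros((n, 1))
alpha, batch = 0.05, 100

for epoch in range(5):
    for t in range(0, T, batch):
        s = S[:, t:t + batch]
        r = sigmoid(W @ s + w0)
        # Infomax updates from the lecture, averaged over a mini-batch:
        W  += alpha * (np.linalg.inv(W.T) + (1.0 - 2.0 * r) @ s.T / batch)
        w0 += alpha * np.mean(1.0 - 2.0 * r, axis=1, keepdims=True)

# If separation succeeded, W @ A is close to a scaled permutation matrix:
# one large entry per row, small cross-talk elsewhere.
print(np.round(W @ A, 2))
```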

Generative models

Generative models:
given a fixed $H(s)$: maximize H(s) - H(s \mid r) i.e. minimize the reconstruction error $H(s \mid r)$
s_i = \sum\limits_{ j } Φ_{i,j} r_j
I(\textbf{s}, \textbf{h}) = H(\textbf{s})-H(\textbf{s} \mid \textbf{h})

but we don’t have access to $r_j$: we will infer $h_j$ (hidden) values and learn $Φ$ so that:

s_i = \sum\limits_{ j } Φ_{i,j} h_j + Noise

What we know/assume:

  • the $h_j$ are independent:

    P(h) = \prod_j P(h_j)
  • Images on the retina:

    s_0, ⋯, s_T
  • \textbf{s} = \textbf{Φ} \textbf{h} + 𝒩

Goal: find $Φ$ to minimize $H(\textbf{s} \mid \textbf{h})$

We assume that $p(\textbf{s} \mid \textbf{h})$ is Gaussian

p(\textbf{s} \mid \textbf{h}) = \frac{1}{Z} \exp(-\|\textbf{s} - \textbf{Φ} \textbf{h}\|^2/2)
H(\textbf{s} \mid \textbf{h}) = 𝔼(-\log p(\textbf{s} \mid \textbf{h})) = \log Z + \frac{1}{2} 𝔼(\|\textbf{s} - \textbf{Φ} \textbf{h}\|^2)
so minimizing $H(\textbf{s} \mid \textbf{h})$ amounts to minimizing the reconstruction error $\sum\limits_{ t' } \|\textbf{s}_{t'} - \textbf{Φ} \textbf{h}_{t'}\|^2$ over the samples $s_0, ⋯, s_T$.
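A minimal sketch of learning $Φ$ from samples (dimensions, names, and the use of plain least squares for inference are assumptions of mine; a real sparse-coding model would add a prior on $\textbf{h}$), alternating between inferring the hidden causes and refitting the dictionary:

```python
import numpy as np

rng = np.random.default_rng(3)
d, k, T = 8, 4, 500                      # stimulus dim, hidden dim, #samples
Phi_true = rng.normal(size=(d, k))       # ground-truth dictionary (demo only)
H_true = rng.laplace(size=(k, T))        # independent hidden causes
S = Phi_true @ H_true + 0.01 * rng.normal(size=(d, T))   # s = Phi h + noise

Phi = rng.normal(size=(d, k))            # random initial dictionary
for _ in range(50):
    # Inference: estimate the hidden causes for the current Phi
    # (Gaussian-noise MAP without a prior = least squares).
    H_hat = np.linalg.lstsq(Phi, S, rcond=None)[0]
    # Learning: refit Phi to minimize the reconstruction error ||s - Phi h||^2.
    Phi = np.linalg.lstsq(H_hat.T, S.T, rcond=None)[0].T

H_hat = np.linalg.lstsq(Phi, S, rcond=None)[0]
mse = np.mean((S - Phi @ H_hat) ** 2)
print(mse)                               # close to the noise floor
```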
