# Lecture 7: Receptive fields

Lecturer: Sophie Denève

Bottleneck at the optic nerve (far fewer fibers than photoreceptors): the retina must transmit an efficient code

Center-surround RF: responds to light in the center and darkness in the surround (on-center)

Surround-center RF: the other way round (off-center)

Hubel and Wiesel: cells sensitive to edges in V1

Surprise of an event $x$:
$S(x) = - \log p(x)$
Entropy:
$H = 𝔼(- \log p(x)) = - \int_A p(x) \log p(x) dx$
• Minimum entropy: Dirac
• Maximum entropy: uniform distribution
Mutual information between $X$ and $Y$:
$I(X, Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$

## For one neuron

Let’s say $X$ is the image in the retina, $Y$ the cortex: you want to maximize mutual information between $X$ and $Y$, i.e.:

• maximize $H(X)$ ⟶ you want to represent something complex
• minimize $H(X \mid Y)$ ⟶ you want a reliable representation of $X$

Now: $r$: neuron’s response, $s$: stimulus (e.g. an image)

# Analysis models

Analysis models:
given a fixed $H(r \mid s)$: maximize $H(r) - H(r \mid s)$

Find the distribution $p(r)$ that maximizes the entropy over the bounded response range:

$H(r) = - \int_{[r_{min}, r_{max}]} p(r) \log p(r)\, dr$

The maximum is attained by the uniform distribution, which the neuron achieves by histogram equalization: the transfer function follows the cumulative distribution of the stimulus (with $Z$ a normalization constant),

$r(s) = \int_{-∞}^s Z\, p(s')\, ds'$
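A quick numerical check of this histogram-equalization idea, with a hypothetical Gaussian stimulus: mapping samples through their own (empirical) cumulative distribution yields uniformly distributed responses.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.normal(loc=1.0, scale=2.0, size=100_000)   # stimulus samples

# Empirical CDF as transfer function: r(s) = P(S <= s),
# computed here via the rank of each sample.
ranks = np.argsort(np.argsort(s))
r = (ranks + 0.5) / len(s)

# The responses should be uniform on [0, 1]: every decile gets ~10% of the mass.
hist, _ = np.histogram(r, bins=10, range=(0.0, 1.0))
print(hist / len(s))   # each bin ≈ 0.1
```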

Now, if:

$r = σ(ws + w_0)$

which $(w, w_0)$ maximize the output entropy?

$H(r) = - 𝔼(\log p(r))$

but as there’s a one-to-one correspondence between stimulus and response:

$p(r)\, dr = p(s)\, ds ⟹ p(r) = \frac{p(s)}{dr/ds}$

hence

$H(r) = - 𝔼(\log p(s)) + 𝔼\Big(\log \frac{dr}{ds}\Big)$

and only the second term depends on $(w, w_0)$.

So stochastic gradient ascent on the entropy yields the following learning rule:

• $τ \dot{w} = \frac{\partial H}{\partial w} = \frac{1}{w} + s(1 - 2r)$
• $τ \dot{w_0} = \frac{\partial H}{\partial w_0} = 1 - 2r$
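A minimal simulation of this rule (logistic $σ$, a hypothetical Gaussian stimulus, $η$ playing the role of $1/τ$): after learning, the output distribution is approximately uniform, as the infomax argument predicts.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
w, w0 = 0.1, 0.0     # initial gain and bias
eta = 0.01           # learning rate (plays the role of 1/τ)

for _ in range(20_000):
    s = rng.normal(2.0, 3.0)          # hypothetical Gaussian stimulus
    r = sigmoid(w * s + w0)
    # Entropy-ascent rule from the notes:
    w += eta * (1.0 / w + s * (1.0 - 2.0 * r))
    w0 += eta * (1.0 - 2.0 * r)

# Responses to fresh stimuli should now be roughly uniform on (0, 1):
s_test = rng.normal(2.0, 3.0, size=50_000)
r_test = sigmoid(w * s_test + w0)
print(r_test.mean())   # ≈ 0.5
print(r_test.std())    # ≈ 1/sqrt(12) ≈ 0.29, the std of a uniform variable
```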

## For several neurons

Independent Component Analysis: joint distribution $p(r_1, r_2)$

To maximize the joint entropy, you not only want $p(r_1)$ and $p(r_2)$ to be uniform, but also $r_1$ and $r_2$ to be independent (not merely uncorrelated).

Output entropy:
$H(\textbf{r}) = - \int \log p(\textbf{r}) p(\textbf{r}) d\textbf{r}$
Population activation:
$\textbf{r} = f(\textbf{W} \textbf{s} + \textbf{w}_0)$

If the weight matrix $\textbf{W}$ is invertible:

• $\dot{\textbf{W}} = α\big((\textbf{W}^T)^{-1} + (\textbf{1} - 2\textbf{r})\textbf{s}^T\big)$
• $\dot{\textbf{w}}_0 = α (\textbf{1} - 2\textbf{r})$
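A sketch of this infomax ICA rule in its natural-gradient form (right-multiplying the update by $\textbf{W}^T \textbf{W}$, which removes the matrix inverse). The two Laplacian sources and the mixing matrix below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Two independent Laplacian (super-Gaussian) sources, mixed linearly.
sources = rng.laplace(size=(2, n))
A = np.array([[1.0, 0.6], [0.4, 1.0]])   # mixing matrix
S = A @ sources                           # observed signals (2 x n)

W = np.eye(2)                             # unmixing matrix
eta = 0.01

for _ in range(5000):
    idx = rng.integers(0, n, size=256)    # minibatch of stimuli
    X = S[:, idx]
    U = W @ X
    R = 1.0 / (1.0 + np.exp(-U))          # logistic outputs
    # Natural-gradient infomax update: ΔW = η (I + (1 - 2r) uᵀ) W
    W += eta * (np.eye(2) + (1.0 - 2.0 * R) @ U.T / X.shape[1]) @ W

# W @ A should now be close to a scaled permutation matrix,
# i.e. each output recovers one source up to scale and order.
print(np.round(W @ A, 1))
```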

ICA: extension of PCA where the axes onto which we project are not necessarily orthogonal ⟶ we seek the most interesting directions

# Generative models

Generative models:
given a fixed $H(s)$: maximize $H(s) - H(s \mid r)$ i.e. minimize the reconstruction error $H(s \mid r)$
Assumption:
$s_i = \sum\limits_{ j } Φ_{i,j} r_j$
$I(\textbf{s}, \textbf{h}) = H(\textbf{s})-H(\textbf{s} \mid \textbf{h})$

but we don’t have access to $r_j$: we will infer $h_j$ (hidden) values and learn $Φ$ so that:

$s_i = \sum\limits_{ j } Φ_{i,j} h_j + \text{noise}$

What we know/assume:

• the $h_j$ are independent:

$P(h) = \prod_j P(h_j)$
• Images on the retina:

$s_0, ⋯, s_T$
• $\textbf{s} = \textbf{Φ} \textbf{h} + 𝒩$

Goal: find $Φ$ to minimize $H(\textbf{s} \mid \textbf{h})$

We assume that $p(\textbf{s} \mid \textbf{h})$ is Gaussian

$p(\textbf{s} \mid \textbf{h}) = \frac{1}{Z} \exp\big(- \|\textbf{s} - \textbf{Φ} \textbf{h}\|^2 / 2\big)$

so minimizing $H(\textbf{s} \mid \textbf{h}) = 𝔼(- \log p(\textbf{s} \mid \textbf{h}))$ amounts to minimizing the reconstruction error:

$- \log p(\textbf{s} \mid \textbf{h}) = \text{const} + \frac{1}{2} \sum\limits_{ i } (s_i - [\textbf{Φ} \textbf{h}]_i)^2$
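A sketch of this objective on synthetic data: alternately inferring $\textbf{h}$ by least squares and updating $\textbf{Φ}$ to reduce the squared reconstruction error. The dimensions, noise level, and generating process below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setup: 16-pixel "images" generated from 4 hidden causes.
n_pix, n_hidden, n_img = 16, 4, 500
Phi_true = rng.normal(size=(n_pix, n_hidden))
H_true = rng.normal(size=(n_hidden, n_img))
S = Phi_true @ H_true + 0.01 * rng.normal(size=(n_pix, n_img))  # s = Φh + noise

# Alternating minimization of the reconstruction error ||s - Φh||²
# (the Gaussian negative log-likelihood, up to constants):
Phi = rng.normal(size=(n_pix, n_hidden))
for _ in range(50):
    H = np.linalg.lstsq(Phi, S, rcond=None)[0]        # infer hidden values h
    Phi = np.linalg.lstsq(H.T, S.T, rcond=None)[0].T  # update the dictionary Φ

err = np.mean((S - Phi @ H) ** 2)
print(err)   # small: near the noise floor of the generated data
```

(Sparse coding adds a prior term enforcing the independence/sparsity of the $h_j$; this sketch keeps only the reconstruction part.)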
