# Lecture 2: Probabilistic approaches to neural computation: the Bayesian Brain

Lecturer: Lyudmila Kushnir

# Uncertainty matters: the Bayesian Brain

All of our decisions are subject to uncertainty

## The Bayesian Brain

• $x$: state of the world (objects), with prior $P(x)$

• prior expectation (ex: it’s unlikely to see an elephant in the street)

Objects ⟶ Receptors ⟶ Response

• $s$: spike counts

• Central Nervous System:

• has an internal model $x ⟶ s$
• Posterior probability: $P(x \mid s)$

  digraph {
rankdir=TB;
X1[label="X"]
X1 -> S[label="   Likelihood:  P(S | X)"];
"Prior: P(X)"[shape=none];
"S" -> "Posterior: P(X | S)"[label="    CNS: has a model x ⟶ s "];
}


## Poisson Variability in Cortex

Experiments: the variance of the spike count grows linearly with the mean spike count

p(\text{spike in } δt) = r δt
• $r$: firing rate

The mean spike count (related to $r$) is a function of the stimulus

p(s) = \frac{(rΔt)^s \exp(-rΔt)}{s!}

Assumption: spikes are generated independently of each other

NB:

• Mean = Variance $= r Δt$
• As individual spikes are generated randomly, it is the firing rate that carries the information
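The "Mean = Variance" property can be checked numerically. The sketch below is only an illustration under assumed values (a 20 Hz rate, a 0.5 s window, 1 ms bins; the names `simulate_spike_count` and `mean_and_variance` are hypothetical): each small bin emits a spike independently with probability $r δt$, and the ratio variance / mean of the resulting counts should come out close to 1.

```python
import random

def simulate_spike_count(rate_hz, duration_s, dt=0.001, rng=random):
    """Count spikes when each bin of width dt fires independently with prob. rate_hz * dt."""
    n_bins = int(round(duration_s / dt))
    p_spike = rate_hz * dt  # only a valid probability while rate_hz * dt << 1
    return sum(1 for _ in range(n_bins) if rng.random() < p_spike)

def mean_and_variance(values):
    """Sample mean and (population) variance of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance

rng = random.Random(0)  # fixed seed so the run is reproducible
counts = [simulate_spike_count(20.0, 0.5, rng=rng) for _ in range(2000)]
mean_count, var_count = mean_and_variance(counts)
fano = var_count / mean_count  # variance-to-mean ratio, close to 1 for Poisson counts
print(mean_count, var_count, fano)
```

With finite bins the counts are binomial rather than exactly Poisson, so the ratio sits slightly below 1 and approaches it as `dt` shrinks.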

## Tuning curves

NB: from here on, $r$ is denoted by $f$ (with the time window $Δt$ absorbed, so that $f$ is a mean spike count)

Assumption: the mean firing rate is a function of the stimulus produced by objects in the world

Tuning curve of a particular neuron: its mean firing rate as a function of the value of the stimulus

How to guess the direction of the stimulus?

⟨s_i⟩ = f(x-x_i) ≝ f_i(x)

• $x_i$: preferred direction of neuron $i$

Average pattern of activity:

• x-axis: neurons, according to their preferred direction

• y-axis: activity

$s$ (vector of integer spike counts $s_i$): activity pattern that we measure

p(s_i \mid x) = \frac{f_i(x)^{s_i} \exp(-f_i(x))}{s_i !}

Independent neurons:

p(s \mid x) = \prod\limits_{ i } p(s_i \mid x)
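The two formulas above can be sketched in code. Everything specific here is an assumption made for illustration: the bell-shaped `tuning_curve` (with a small baseline of 0.1 so that its log stays finite), the five preferred values, and the helper names. The Poisson sampler uses Knuth's multiplication method, which is fine for small means.

```python
import math
import random

def tuning_curve(x, preferred, gain=10.0, width=1.0):
    """Hypothetical bell-shaped f_i(x) = f(x - x_i): mean spike count of one neuron."""
    return gain * math.exp(-0.5 * ((x - preferred) / width) ** 2) + 0.1

def poisson_pmf(count, mean):
    """p(s_i | x) for a Poisson neuron whose mean count is f_i(x)."""
    return mean ** count * math.exp(-mean) / math.factorial(count)

def sample_poisson(mean, rng):
    """Knuth's method: multiply uniforms until the product drops below exp(-mean)."""
    limit, product, count = math.exp(-mean), rng.random(), 0
    while product > limit:
        product *= rng.random()
        count += 1
    return count

def population_likelihood(counts, x, preferred_values):
    """p(s | x) = prod_i p(s_i | x) for independent neurons."""
    likelihood = 1.0
    for s_i, x_i in zip(counts, preferred_values):
        likelihood *= poisson_pmf(s_i, tuning_curve(x, x_i))
    return likelihood

rng = random.Random(1)
preferred = [-2.0, -1.0, 0.0, 1.0, 2.0]  # preferred stimuli x_i (made up)
counts = [sample_poisson(tuning_curve(0.0, x_i), rng) for x_i in preferred]
print(counts, population_likelihood(counts, 0.0, preferred))
```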

• $x$: events

⇓ Likelihood: $p(s \mid x)$

• $s$: sensory input (neural activity in the sensory areas)

⇓ $CNS$

• $\hat{x} = f(s)$

Likelihood:

p(s\mid \bullet)

NB: as a function of $x$, the likelihood does not necessarily sum to $1$

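\log(p(s\mid x)) = \sum\limits_{ i } \log(f_i(x)) s_i - \sum\limits_{ i } f_i(x) - \sum\limits_{ i } \log(s_i !)

NB: the last term does not depend on $x$, so it can be dropped when maximizing over $x$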

How to compute the maximum of likelihood?

⟶ take the derivative with respect to $x$ and set it to zero ⟹ too hard to solve in practice
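When the derivative cannot be solved analytically, one simple workaround is to evaluate the log likelihood on a grid of candidate values and keep the best one. This is only a sketch of that idea, not a method stated in the lecture; the tuning curve shape, the grid, and the spike counts are assumptions.

```python
import math

def tuning_curve(x, preferred, gain=10.0, width=1.0):
    """Hypothetical bell-shaped mean spike count f_i(x) (baseline 0.1 keeps the log finite)."""
    return gain * math.exp(-0.5 * ((x - preferred) / width) ** 2) + 0.1

def log_likelihood(counts, x, preferred_values):
    """sum_i s_i log f_i(x) - sum_i f_i(x); the x-independent log(s_i!) term is dropped."""
    total = 0.0
    for s_i, x_i in zip(counts, preferred_values):
        f = tuning_curve(x, x_i)
        total += s_i * math.log(f) - f
    return total

def grid_max_likelihood(counts, preferred_values, grid):
    """Return the grid point with the highest log likelihood."""
    return max(grid, key=lambda x: log_likelihood(counts, x, preferred_values))

preferred = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
grid = [i * 0.05 for i in range(-60, 61)]  # candidate stimulus values from -3 to 3
counts = [0, 1, 6, 10, 6, 1, 0]            # made-up counts, peaked at the neuron preferring 0
x_hat = grid_max_likelihood(counts, preferred, grid)
print(x_hat)
```

The cost grows with the grid size, which is one reason a direct readout of the log posterior at a few values $x_j$ is attractive.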

p(x \mid s) = \frac{p(s \mid x) p(x)}{p(s)}

Then, with $L_0(x) = \log(p(x))$ plus terms that do not depend on $x$:

\log(p(x \mid s)) = L_0(x) + \sum\limits_{ i } \log(f_i(x)) s_i - \sum\limits_{ i } f_i(x)

⟹ Log of posterior probability (for $x =x_j$), with $w_{i,j} = \log(f_i(x_j))$ and $θ_j = \sum\limits_{ i } f_i(x_j) - L_0(x_j)$:

L_j = \sum\limits_{ i } \underbrace{ w_{i, j}}_{\text{synaptic weight}} s_i - \underbrace{θ_j}_{\text{bias}}

Ex: trying to jump over a hole

• $x_j$: the hole width is $m$ meters

• $L_j = \log(p(x_j \mid s)) = \sum\limits_{ i } w_{i,j} s_i - θ_j$

NB: the weights $w_{i,j}$ are part of the internal model of the brain; they correspond to the synaptic strengths
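The linear readout $L_j = \sum_i w_{i,j} s_i - θ_j$ can be sketched as follows, taking $w_{i,j} = \log f_i(x_j)$ and $θ_j = \sum_i f_i(x_j) - \log p(x_j)$ as implied by the log posterior above. The tuning curve, the three candidate values $x_j$, the flat prior and the counts are all assumptions for illustration. Normalizing $\exp(L_j)$ over $j$ recovers the posterior, since the terms dropped along the way do not depend on $x$.

```python
import math

def tuning_curve(x, preferred, gain=10.0, width=1.0):
    """Hypothetical bell-shaped mean spike count f_i(x)."""
    return gain * math.exp(-0.5 * ((x - preferred) / width) ** 2) + 0.1

def readout_parameters(hypotheses, preferred_values, log_prior):
    """weights[j][i] = log f_i(x_j); biases[j] = sum_i f_i(x_j) - log p(x_j)."""
    weights, biases = [], []
    for x_j, lp in zip(hypotheses, log_prior):
        rates = [tuning_curve(x_j, x_i) for x_i in preferred_values]
        weights.append([math.log(f) for f in rates])
        biases.append(sum(rates) - lp)
    return weights, biases

def log_posterior_units(counts, weights, biases):
    """L_j = sum_i w_ij s_i - theta_j, one value per hypothesis x_j."""
    return [sum(w * s for w, s in zip(w_j, counts)) - theta_j
            for w_j, theta_j in zip(weights, biases)]

def normalize(log_values):
    """Softmax: turn unnormalized log posteriors into probabilities."""
    top = max(log_values)
    unnormalized = [math.exp(v - top) for v in log_values]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

preferred = [-2.0, -1.0, 0.0, 1.0, 2.0]
hypotheses = [-1.0, 0.0, 1.0]            # candidate values x_j (made up)
log_prior = [math.log(1.0 / 3.0)] * 3    # flat prior (assumption)
counts = [1, 6, 10, 6, 1]                # made-up spike counts
weights, biases = readout_parameters(hypotheses, preferred, log_prior)
posterior = normalize(log_posterior_units(counts, weights, biases))
print(posterior)
```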

## Cue combination

Cue combination is equivalent to summing activities

Independent stimuli ⟹ product of probabilities ⟹ sum of log posteriors
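A small numerical check of this equivalence, under assumptions that are not in the lecture: the two cue populations share the same (hypothetical) tuning curves, their counts are independent given $x$, and the prior is flat (with a non-flat prior, adding two log posteriors would count the prior twice). Summing the two activity patterns and reading them out gives the same values as adding the two separate log posteriors.

```python
import math

def tuning_curve(x, preferred, gain=10.0, width=1.0):
    """Hypothetical bell-shaped mean spike count f_i(x)."""
    return gain * math.exp(-0.5 * ((x - preferred) / width) ** 2) + 0.1

def log_posterior(counts, hypotheses, preferred_values, n_cues=1):
    """Unnormalized log posterior under a flat prior; n_cues scales the bias term."""
    result = []
    for x_j in hypotheses:
        rates = [tuning_curve(x_j, x_i) for x_i in preferred_values]
        drive = sum(math.log(f) * s for f, s in zip(rates, counts))
        result.append(drive - n_cues * sum(rates))
    return result

preferred = [-2.0, -1.0, 0.0, 1.0, 2.0]
hypotheses = [-1.0, 0.0, 1.0]
counts_a = [2, 7, 9, 5, 1]    # cue A (made-up counts)
counts_b = [0, 5, 11, 7, 2]   # cue B (made-up counts)

# Route 1: compute each log posterior separately, then add them.
separate = [a + b for a, b in zip(log_posterior(counts_a, hypotheses, preferred),
                                  log_posterior(counts_b, hypotheses, preferred))]

# Route 2: add the activities first, then apply the same weights once.
summed_counts = [a + b for a, b in zip(counts_a, counts_b)]
combined = log_posterior(summed_counts, hypotheses, preferred, n_cues=2)
print(separate, combined)
```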

## Multi-dimensional stimulus: population code

x^1, x^2 ⟶ s
L_{j, k} = \log(p(x_j^1, x_k^2 \mid s)) = \sum\limits_{ i } \underbrace{\log(f_i(x_j^1, x_k^2))}_{w_{i,j,k}} s_i - \underbrace{\Big(\sum\limits_{ i } f_i(x_j^1, x_k^2) - L_0(x_j^1, x_k^2)\Big)}_{θ_{j,k}}

## Alternative neural code for uncertainty: sampling code

x^1, x^2 ⟶ s ⟶ x_s^1, x_s^2

With the posterior $p(x^1, x^2 \mid s)$: we get samples $x_s^1, x_s^2$, whose variability represents what happens with $x^1$ and $x^2$ (which are uncertain) in the external world.

Ex: you can infer that whenever $x^1$ is active, $x^2$ is too.

NB: the variability is no longer Poisson ⟶ it mimics the posterior, which is not necessarily Poisson

Ex: if the posterior is very narrow, the uncertainty is low, and the variability of the activity is also very small.
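To make the sampling idea concrete, here is a toy sketch with a made-up joint posterior over two binary variables. The probability table and the name `sample_joint` are assumptions. The samples alone are enough to recover both the marginal of $x^1$ and the dependence "whenever $x^1$ is active, $x^2$ is too".

```python
import random

def sample_joint(posterior, rng):
    """Draw one (x1, x2) pair from a dict {(x1, x2): probability}."""
    states = list(posterior)
    weights = [posterior[state] for state in states]
    return rng.choices(states, weights=weights, k=1)[0]

# Made-up posterior p(x^1, x^2 | s): x^1 = 1 never occurs without x^2 = 1.
posterior = {(0, 0): 0.6, (0, 1): 0.1, (1, 1): 0.3, (1, 0): 0.0}

rng = random.Random(0)
samples = [sample_joint(posterior, rng) for _ in range(5000)]

# Fraction of samples with x^1 active approximates the marginal p(x^1 = 1 | s).
p_x1 = sum(x1 for x1, _ in samples) / len(samples)

# Among samples with x^1 active, how often is x^2 active as well?
x2_given_x1 = [x2 for x1, x2 in samples if x1 == 1]
p_x2_given_x1 = sum(x2_given_x1) / len(x2_given_x1)
print(p_x1, p_x2_given_x1)
```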

Experimental evidence backing this up: The prior tends to the average posterior of natural stimuli

• Population code: increases the response gain

• Sampling code: decreases the response variance

## Sampling: not clear how to implement, easy computations

  digraph {
rankdir=TB;
{"x^1" "x^2"} -> s -> {"x^1_s" "x^2_s"};
"x^1_s" -> y;
"x^2_s" -> y;
}

Chain rule (marginalizing over $x$, assuming $y$ depends on $s$ only through $x$):

p(y \mid s) = \sum\limits_{ x } p(y \mid x) p(x \mid s)
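With samples from the posterior, the sum over $x$ can be replaced by an average over samples, which is why downstream computations become easy. The sketch below compares the exact sum with the sample average on a three-state toy problem; all probabilities are made-up numbers.

```python
import random

posterior_x = {0: 0.2, 1: 0.5, 2: 0.3}   # p(x | s), made-up numbers
p_y_given_x = {0: 0.1, 1: 0.4, 2: 0.9}   # p(y = 1 | x), made-up numbers

# Exact marginalization: p(y | s) = sum_x p(y | x) p(x | s).
exact = sum(p_y_given_x[x] * posterior_x[x] for x in posterior_x)

# Sampling version: draw x from the posterior and average p(y | x) over the draws.
rng = random.Random(0)
states = list(posterior_x)
samples = rng.choices(states, weights=[posterior_x[x] for x in states], k=20000)
estimate = sum(p_y_given_x[x] for x in samples) / len(samples)
print(exact, estimate)
```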

Ex: back to jumping over a hole

• $x^1$: distance between the edges

• $x^2$: alligators are there

• $y$ is active whenever there is danger:

y = (x^1 > \underbrace{d}_{\text{my threshold, how far I can jump}}) \text{ and } x^2

  digraph {
rankdir=TB;
"x^3" -> "y^1";
}

• $x^3$: collection of sensory evidence that a tiger is there or not

• $y^1$: the tiger is there
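Going back to the hole example, the danger unit $y$ can be evaluated on each posterior sample, and the fraction of "dangerous" samples estimates the probability of danger. This is a sketch under loud assumptions: the posterior over the gap width is taken to be Gaussian (mean 2.2 m, standard deviation 0.3 m), alligators are present with probability 0.5, the two are treated as independent, and the jump limit $d$ is 2 m. None of these numbers come from the lecture.

```python
import random

def danger(gap_width, alligators_present, jump_limit):
    """y = (x^1 > d) and x^2."""
    return gap_width > jump_limit and alligators_present

def estimate_danger(n_samples, rng, jump_limit=2.0):
    """Fraction of posterior samples (x^1_s, x^2_s) for which y is active."""
    hits = 0
    for _ in range(n_samples):
        gap_sample = rng.gauss(2.2, 0.3)        # hypothetical posterior over x^1 (meters)
        alligator_sample = rng.random() < 0.5   # hypothetical posterior over x^2
        hits += danger(gap_sample, alligator_sample, jump_limit)
    return hits / n_samples

p_danger = estimate_danger(20000, random.Random(0))
print(p_danger)
```

The rule `danger` itself is deterministic; all of the uncertainty comes from the variability of the samples.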

NB: the activities of the $x^i_s$ are samples from the posterior distribution, whereas before (in the population code) the activity encoded an averaged quantity: the log posterior.
