Lecture 2: Probabilistic approaches to neural computation: the Bayesian Brain

Lecturer: Lyudmila Kushnir

Uncertainty matters: the Bayesian Brain

All of our decisions are subject to uncertainty

The Bayesian Brain

  • $x$: state of the world (e.g. an object); prior: $P(x)$

    • prior expectation (ex: it’s unlikely to see an elephant in the street)

Objects ⟶ Receptors ⟶ Response

  • $s$: spike counts

  • Central Nervous System:

    • has an internal model $x ⟶ s$
  • Posterior probability: $P(x \mid s)$

  digraph {
    X -> S [label="   Likelihood:  P(S | X)"];
    "Prior: P(X)" [shape=none];
    S -> "Posterior: P(X | S)" [label="    CNS: has a model x ⟶ s "];
  }

Poisson Variability in Cortex

Experiments: the variance of the spike count grows approximately linearly with the mean spike count

\[p(\text{spike in } δt) = r δt\]
  • $r$: firing rate

The mean spike count ($rΔt$ in a counting window $Δt$) is a function of the stimulus

\[p(s) = \frac{(rΔt)^s \exp(-rΔt)}{s!}\]

Assumption: spikes are independent of each other (no correlations between spikes)


  • Mean = Variance $= r Δt$
  • Since individual spikes are generated randomly, the information is carried by the firing rate
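The Poisson model can be checked numerically by applying $p(\text{spike in } δt) = r δt$ independently in every small bin and counting spikes over a window $Δt$. This is a minimal sketch with hypothetical values for $δt$, $Δt$ and the rates; with finite bins the counts are binomial, so the variance is $rΔt(1 - rδt)$ and only tends to $rΔt$ as $δt \to 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 1e-3        # bin width δt in seconds (hypothetical); r*dt must stay << 1
T = 0.5          # counting window Δt in seconds (hypothetical)
n_trials = 4000

def spike_counts(rate):
    """Spike counts over T: each bin of width dt holds a spike with probability rate*dt,
    independently of all other bins (binomial approximation of a Poisson process)."""
    n_bins = int(round(T / dt))
    spikes = rng.random((n_trials, n_bins)) < rate * dt
    return spikes.sum(axis=1)

results = {}
for rate in (5.0, 20.0, 50.0):                  # firing rates r in Hz
    s = spike_counts(rate)
    results[rate] = (s.mean(), s.var())
    print(f"r = {rate:4.0f} Hz  rΔt = {rate * T:5.1f}  mean = {s.mean():5.2f}  variance = {s.var():5.2f}")
```

Mean and variance both scale with $rΔt$, which is the linear variance-versus-mean relation reported experimentally.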

Tuning curves

NB: $r$ is denoted by $f$

Assumption: the mean firing rate is a function of the stimulus, i.e. of objects in the world

Tuning curve of a particular neuron: its mean firing rate as a function of the value of the stimulus

How to guess the direction of the stimulus?

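\[⟨s_i⟩ = f(x-x_i) ≝ f_i(x)\]

where $x_i$ is the preferred direction of neuron $i$: every neuron has the same tuning-curve shape $f$, centred on its own preferred direction.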

Average pattern of activity:

  • x-axis: neurons, according to their preferred direction

  • y-axis: activity

$s$ (vector of integer spike counts): the activity pattern that we actually measure
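To make the average pattern and the measured pattern concrete, here is a minimal sketch assuming a hypothetical bell-shaped (von Mises) tuning curve $f$, with `gain`, `kappa` and `baseline` chosen only for illustration. The average pattern is $f_i(x)$ for every neuron; a measured pattern $s$ is one Poisson draw around it.

```python
import numpy as np

# Hypothetical population: N neurons with evenly spaced preferred directions x_i (degrees)
N = 32
preferred = np.linspace(0.0, 360.0, N, endpoint=False)

def f(delta_deg, gain=10.0, kappa=2.0, baseline=0.5):
    """Assumed von Mises tuning curve: mean spike count as a function of x - x_i."""
    return baseline + gain * np.exp(kappa * (np.cos(np.deg2rad(delta_deg)) - 1.0))

def mean_pattern(x):
    """Average activity pattern <s_i> = f(x - x_i) = f_i(x), one entry per neuron."""
    return f(x - preferred)

rng = np.random.default_rng(1)
x_true = 90.0
avg = mean_pattern(x_true)     # smooth hill centred on the neurons preferring 90 degrees
s = rng.poisson(avg)           # one measured (integer) pattern: a noisy version of the hill
print("largest mean response from the neuron preferring", preferred[np.argmax(avg)], "deg")
print("measured pattern s:", s)
```

Plotting `avg` and `s` against `preferred` gives the two curves described above: a smooth hill and a jagged integer pattern around it.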

\[p(s_i \mid x) = \frac{f_i(x)^{s_i} \exp(-f_i(x))}{s_i !}\]
Independent neurons:
\[p(s \mid x) = \prod\limits_{ i } p(s_i \mid x)\]

  • $x$: events

⇓ Likelihood: $p(s \mid x)$

  • $s$: sensory input, i.e. neural activity in the sensory areas

⇓ $CNS$

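  • $\hat{x}(s)$: the estimate of $x$ computed by the CNS from $s$

Likelihood function (the measured $s$ is fixed, $x$ varies):
\[p(s\mid \bullet)\]

NB: as a function of $x$, it doesn't necessarily sum to $1$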

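\[\log(p(s\mid x)) = \sum\limits_{ i } \log(f_i(x)) s_i - \sum\limits_{ i } f_i(x) - \sum\limits_{ i } \log(s_i!)\]

NB: the last term does not depend on $x$, so it plays no role when maximizing over $x$.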
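A brute-force way to find the maximum of this log-likelihood is to evaluate it on a grid of candidate values of $x$. This is a minimal sketch with a hypothetical von Mises population; it also checks that the $x$-independent term $-\sum_i \log(s_i!)$ cannot change where the maximum is.

```python
import math
import numpy as np

# Hypothetical population of independent Poisson neurons (von Mises tuning, degrees)
N = 32
preferred = np.linspace(0.0, 360.0, N, endpoint=False)

def tuning(x):
    """f_i(x) for every neuron i: assumed mean spike counts."""
    return 0.5 + 10.0 * np.exp(2.0 * (np.cos(np.deg2rad(x - preferred)) - 1.0))

def log_likelihood(s, x):
    """sum_i s_i log f_i(x) - sum_i f_i(x), i.e. log p(s | x) without -sum_i log(s_i!)."""
    fx = tuning(x)
    return np.sum(np.log(fx) * s) - np.sum(fx)

rng = np.random.default_rng(2)
x_true = 130.0
s = rng.poisson(tuning(x_true))

grid = np.arange(0.0, 360.0, 1.0)                 # candidate stimulus values x
ll = np.array([log_likelihood(s, x) for x in grid])
x_ml = grid[np.argmax(ll)]                        # maximum-likelihood estimate by grid search
print("true x:", x_true, "  ML estimate:", x_ml)

# Adding back the dropped constant -sum_i log(s_i!) shifts every value equally
const = -sum(math.lgamma(k + 1) for k in s)
full = ll + const
```

Grid search sidesteps solving the derivative equation, but it needs one evaluation per candidate $x$.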

How do we find the maximum of the likelihood?

⟶ setting the derivative with respect to $x$ to zero and solving ⟹ too hard to do in practice

Instead, consider the posterior (Bayes' rule):

\[p(x \mid s) = \frac{p(s \mid x) p(x)}{p(s)}\]


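\[\log(p(x \mid s)) = L_0(x) + \sum\limits_{ i } \log(f_i(x)) s_i - \sum\limits_{ i } f_i(x)\]

where $L_0(x) = \log(p(x))$ is the log prior; the terms that do not depend on $x$ ($-\log(p(s))$ and $-\sum_i \log(s_i!)$) have been dropped, so the equality holds up to an additive constant.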

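⟹ Evaluated at discrete values $x = x_j$, the log posterior is a weighted sum of the activities:

\[L_j = \sum\limits_{ i } \underbrace{ w_{i, j}}_{\text{synaptic weight}} s_i - \underbrace{θ_j}_{\text{bias}}\]

with $w_{i,j} = \log(f_i(x_j))$ and $θ_j = \sum_i f_i(x_j) - L_0(x_j)$.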
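Matching this weighted sum term by term with the log posterior gives $w_{i,j} = \log(f_i(x_j))$ and $θ_j = \sum_i f_i(x_j) - L_0(x_j)$. The sketch below, assuming a hypothetical von Mises population and a hypothetical non-uniform prior, computes $L_j$ as a single matrix-vector product and checks that, once normalized, it equals the posterior obtained directly from Bayes' rule with the full Poisson likelihood.

```python
import numpy as np
from math import lgamma

N = 32
preferred = np.linspace(0.0, 360.0, N, endpoint=False)
x_grid = np.arange(0.0, 360.0, 5.0)                   # discrete hypotheses x_j

def tuning(x):
    """Assumed von Mises tuning curves f_i(x), in mean spike counts."""
    return 0.5 + 10.0 * np.exp(2.0 * (np.cos(np.deg2rad(x - preferred)) - 1.0))

F = np.stack([tuning(x) for x in x_grid], axis=1)     # F[i, j] = f_i(x_j)

# Hypothetical prior p(x_j) favouring directions near 180 degrees
prior = np.exp(np.cos(np.deg2rad(x_grid - 180.0)))
prior /= prior.sum()

W = np.log(F)                                         # w_{i,j} = log f_i(x_j)
theta = F.sum(axis=0) - np.log(prior)                 # θ_j = Σ_i f_i(x_j) - L_0(x_j)

rng = np.random.default_rng(3)
s = rng.poisson(tuning(150.0))

L = s @ W - theta                                     # L_j = Σ_i w_{i,j} s_i - θ_j
posterior = np.exp(L - L.max())
posterior /= posterior.sum()                          # normalizing absorbs the dropped constants
print("MAP estimate:", x_grid[np.argmax(posterior)])

# Direct Bayes' rule with the full Poisson likelihood, for comparison
log_lik_full = (np.log(F) * s[:, None] - F).sum(axis=0) - sum(lgamma(k + 1) for k in s)
log_direct = log_lik_full + np.log(prior)
direct = np.exp(log_direct - log_direct.max())
direct /= direct.sum()
```

The readout needs only fixed weights and biases, which is why it maps naturally onto synaptic strengths and thresholds.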

Ex: trying to jump over a hole

  • $x_j$: the hole width is $m$ meters

  • $L_j = \log(p(x_j \mid s)) = \sum\limits_{ i } w_{i,j} s_i - θ_j$

NB: the weights $w_{i,j}$ are part of the brain's internal model: they correspond to synaptic strengths

Cue combination

Cue combination is equivalent to summing activities

Independent cues ⟹ product of likelihoods ⟹ sum of log likelihoods (the prior is counted only once)
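A numerical check of "cue combination = summing activities", as a minimal sketch: two hypothetical cues are encoded by populations with identical tuning curves and independent Poisson noise, and the prior is assumed flat. Decoding the summed activity with the same weights $w_{i,j}$ (and a bias that counts $\sum_i f_i$ once per population) gives exactly the normalized product of the two single-cue posteriors.

```python
import numpy as np

N = 32
preferred = np.linspace(0.0, 360.0, N, endpoint=False)
x_grid = np.arange(0.0, 360.0, 5.0)

def tuning(x):
    """Assumed von Mises tuning curves f_i(x), shared by both populations."""
    return 0.5 + 10.0 * np.exp(2.0 * (np.cos(np.deg2rad(x - preferred)) - 1.0))

F = np.stack([tuning(x) for x in x_grid], axis=1)     # F[i, j] = f_i(x_j)
W = np.log(F)

def posterior_from(s, n_cues=1):
    """Normalized posterior on x_grid (flat prior); n_cues populations were summed into s."""
    L = s @ W - n_cues * F.sum(axis=0)
    p = np.exp(L - L.max())
    return p / p.sum()

rng = np.random.default_rng(4)
x_true = 200.0
s_cue1 = rng.poisson(tuning(x_true))                  # hypothetical cue 1 (e.g. visual)
s_cue2 = rng.poisson(tuning(x_true))                  # hypothetical cue 2, independent noise

p1 = posterior_from(s_cue1)
p2 = posterior_from(s_cue2)
p_sum = posterior_from(s_cue1 + s_cue2, n_cues=2)     # readout of the summed activity

product = p1 * p2                                     # optimal combination of independent cues
product /= product.sum()
```

The equality is exact here because both populations share the same $f_i$; with differently shaped tuning curves, plain summation would no longer be optimal.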

Multi-dimensional stimulus: population code

\[x^1, x^2 ⟶ s\]

\[L_{j, k} = \log(p(x_j^1, x_k^2 \mid s)) = \sum\limits_{ i } \underbrace{\log(f_i(x_j^1, x_k^2))}_{w_{i,j,k}} s_i - \Big(\underbrace{\sum\limits_{ i } f_i(x_j^1, x_k^2) - L_0(x_j^1, x_k^2)}_{θ_{j,k}} \Big)\]
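The two-dimensional readout has the same form as the one-dimensional one, with one weight $w_{i,j,k} = \log(f_i(x_j^1, x_k^2))$ per neuron and per pair of hypotheses, and bias $θ_{j,k} = \sum_i f_i(x_j^1, x_k^2) - L_0(x_j^1, x_k^2)$. This is a minimal sketch assuming hypothetical Gaussian 2-D tuning curves on a lattice of preferred values and a flat prior.

```python
import numpy as np

# Hypothetical 2-D population: preferred (x^1, x^2) pairs on a 7 x 7 lattice in [0, 1]^2
a, b = np.meshgrid(np.linspace(0, 1, 7), np.linspace(0, 1, 7), indexing="ij")
pref = np.stack([a.ravel(), b.ravel()], axis=1)              # shape (N, 2)

def tuning(x1, x2, gain=8.0, width=0.15, baseline=0.2):
    """Assumed Gaussian tuning f_i(x^1, x^2) for every neuron i."""
    d2 = (x1 - pref[:, 0]) ** 2 + (x2 - pref[:, 1]) ** 2
    return baseline + gain * np.exp(-d2 / (2 * width ** 2))

grid1 = grid2 = np.linspace(0, 1, 41)                        # hypotheses x^1_j and x^2_k
F = np.array([[tuning(u, v) for v in grid2] for u in grid1]) # F[j, k, i] = f_i(x^1_j, x^2_k)
L0 = np.zeros((grid1.size, grid2.size))                      # flat prior: L_0 constant

W = np.log(F)                                                # w_{i,j,k}
theta = F.sum(axis=2) - L0                                   # θ_{j,k} = Σ_i f_i - L_0

rng = np.random.default_rng(5)
s = rng.poisson(tuning(0.3, 0.7))
L = W @ s - theta                                            # L_{j,k} = Σ_i w_{i,j,k} s_i - θ_{j,k}
j, k = np.unravel_index(np.argmax(L), L.shape)
print("MAP estimate:", grid1[j], grid2[k])
```

The number of readout units grows with the number of hypothesis pairs $(j, k)$, which is one practical cost of this code for multi-dimensional stimuli.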

Alternative neural code for uncertainty: sampling code

\[x^1, x^2 ⟶ s ⟶ x_s^1, x_s^2\]

The activities $x_s^1, x_s^2$ are samples from the posterior $p(x^1, x^2 \mid s)$: their variability represents the uncertainty about $x^1$ and $x^2$ in the external world.

Ex: the joint samples also carry dependencies: you can infer, for instance, that whenever $x_s^1$ is active, $x_s^2$ is too.

NB: the variability is no longer Poisson: it mimics the posterior, which is not necessarily Poisson.

Ex: if the posterior is very narrow (low uncertainty), the activity varies very little.
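The sampling-code idea can be illustrated by treating each time step's activity $(x_s^1, x_s^2)$ as one draw from the posterior. This minimal sketch assumes Gaussian posteriors (a broad one and a narrow one, both with correlated $x^1, x^2$); the sample statistics then reproduce the posterior's variances and correlation, and a narrow posterior yields little variability.

```python
import numpy as np

rng = np.random.default_rng(6)

def sampling_code(post_mean, post_cov, n_samples=20000):
    """Hypothetical sampling-code neurons: each time step, (x_s^1, x_s^2) is a fresh posterior sample."""
    return rng.multivariate_normal(post_mean, post_cov, size=n_samples)

# Assumed Gaussian posteriors p(x^1, x^2 | s), with x^1 and x^2 positively correlated
mean = np.array([2.0, 1.0])
broad = np.array([[1.0, 0.8], [0.8, 1.0]])
narrow = 0.01 * broad                          # same shape, much lower uncertainty

samples = {name: sampling_code(mean, cov) for name, cov in (("broad", broad), ("narrow", narrow))}
for name, xs in samples.items():
    print(name, "variances:", xs.var(axis=0).round(3),
          "correlation:", np.corrcoef(xs.T)[0, 1].round(2))
```

The correlation between samples is what lets a downstream reader infer that when one variable is active the other tends to be too.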

Experimental evidence backing this up: the prior tends toward the average posterior over natural stimuli

  • Population code: lower uncertainty ⟶ increased response gain

  • Sampling code: lower uncertainty ⟶ decreased response variance

Sampling: not clear how neurons would implement it, but computations with samples are easy

  digraph {
    {"x^1" "x^2"} -> s -> {"x^1_s" "x^2_s"};
    "x^1_s" -> y;
    "x^2_s" -> y;
  }
Chain rule (marginalizing over $x$, assuming $y$ depends on $s$ only through $x$):
\[p(y \mid s) = \sum\limits_{ x } p(y \mid x) p(x \mid s)\]
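With a sampling code, this sum over $x$ never has to be computed explicitly: a downstream unit $y$ that fires with probability $p(y \mid x_s)$ on each sample $x_s$ has an average activity that approximates $p(y \mid s)$. This is a minimal sketch with a hypothetical discrete $x$ and made-up tables for $p(x \mid s)$ and $p(y \mid x)$.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical discrete world state x in {0, ..., 4}
post_x = np.array([0.05, 0.15, 0.40, 0.30, 0.10])     # assumed posterior p(x | s)
p_y_given_x = np.array([0.0, 0.1, 0.5, 0.9, 1.0])     # assumed p(y = 1 | x)

# Exact chain rule: p(y = 1 | s) = Σ_x p(y = 1 | x) p(x | s)
exact = float(p_y_given_x @ post_x)                   # 0.585

# Sampling code: draw x_s ~ p(x | s), then y_s ~ p(y | x_s), and average y_s
x_samples = rng.choice(5, size=50000, p=post_x)
y_samples = rng.random(x_samples.size) < p_y_given_x[x_samples]
estimate = float(y_samples.mean())
print("exact:", exact, "  from samples:", estimate)
```

The downstream unit only needs $p(y \mid x)$, not the posterior itself, which is the sense in which computations with samples are easy.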

Ex: back to jumping over a hole

  • $x^1$: distance between the edges

  • $x^2$: whether there are alligators in the hole

  • $y$ is active when there is danger:

    \[y = (x^1 > \underbrace{d}_{\text{my threshold, how far I can jump}}) \text{ and } x^2\]
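For the hole example, each pair of samples $(x_s^1, x_s^2)$ is simply passed through the rule $y = (x^1 > d) \text{ and } x^2$, and the fraction of samples on which $y$ is active estimates $p(y \mid s)$. This minimal sketch assumes hypothetical, independent posteriors: a normal distribution for the width $x^1$ and a Bernoulli distribution for the alligators $x^2$, with made-up parameters and threshold $d$.

```python
from math import erf, sqrt

import numpy as np

rng = np.random.default_rng(8)
n = 100000
d = 2.0                                   # hypothetical jump threshold (m)

# Assumed independent posteriors: width x^1 ~ Normal(1.8 m, 0.3 m), alligators x^2 ~ Bernoulli(0.6)
x1_s = rng.normal(1.8, 0.3, size=n)
x2_s = rng.random(n) < 0.6

# Deterministic rule applied to every sample: y = (x^1 > d) and x^2
y_s = (x1_s > d) & x2_s
p_danger = y_s.mean()                     # fraction of samples on which y is active ≈ p(y | s)

# Exact value for comparison (easy here only because the two posteriors are independent)
p_wide = 0.5 * (1 - erf((d - 1.8) / (0.3 * sqrt(2))))
exact = p_wide * 0.6
print("from samples:", p_danger, "  exact:", exact)
```

If $x^1$ and $x^2$ were correlated in the posterior, the sample-based estimate would stay the same one-liner, while the exact computation would need the full joint distribution.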
  digraph {
    "x^3" -> "y^1";
  }
  • $x^3$: sensory evidence for whether a tiger is there or not

  • $y^1$: the tiger is there

NB: in the sampling code, the activities $x^i_s$ are samples from the posterior distribution, whereas in the population code the (average) activity encodes the log posterior.
