Lecture 6: Graphical Models and Inference

Lecturer: Pantelis Leptourgos

Dominant theories for the brain:

  • ⟶ Bayesian brain hypothesis, based on these ideas (the same holds for hidden Markov models)
  • sampling hypothesis: the brain uses samples to approximate the posterior
  • predictive coding theory: the brain updates its estimates based on the prediction error, i.e. the discrepancy between the evidence and the prediction

Generative models - Inference

In principle, everything can be done using the graphical model alone. But let us first recall inference in a generative model.

Bayes theorem:

posterior $\propto$ likelihood × prior

We only have some low-level evidence about the objects in the world (sensory data). Based on that, the brain tries to predict what could have caused this input by building an internal model ⇒ generative model (learnt by the brain). The brain then does inference, i.e. it inverts this model.

Today, we’ll focus on the graphical representation of this generative model:

  digraph {
    X -> S[label="  P(S | X)"];
  }

You can represent a whole bunch of problems with graphs, as above.

Graphical model: it’s a detailed representation of the joint probability:

\[P(X, S) = P(S \mid X) P(X)\]
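
As a minimal numerical sketch of inverting this two-node model, the snippet below builds the joint $P(X, S)$ from a prior and a likelihood, then reads off the posterior $P(X \mid S)$. The binary variables and all probability values are assumptions chosen purely for illustration.

```python
import numpy as np

# Hypothetical numbers, chosen only for illustration.
prior = np.array([0.8, 0.2])            # P(X): X = 0 or 1
likelihood = np.array([[0.9, 0.1],      # P(S | X = 0)
                       [0.3, 0.7]])     # P(S | X = 1)

# Joint P(X, S) = P(S | X) P(X); rows index X, columns index S.
joint = likelihood * prior[:, None]

# Posterior P(X | S) = P(X, S) / P(S), one column per observed value of S.
evidence = joint.sum(axis=0)            # P(S)
posterior = joint / evidence

print(posterior[:, 1])                  # P(X | S = 1)
```

Each column of `posterior` sums to 1, which is exactly the normalization hidden in the "$\propto$" of Bayes' theorem.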

Graphical models:

  1. Bayesian networks
  2. Markov Random Fields
  3. Factor graphs

Probabilistic Graphical Models

Graphical model:

it’s a graph whose nodes are variables and whose edges represent statistical dependencies.

NB: you can represent any distribution as a graphical model.

Conjugate prior:

a prior such that, when it is multiplied by the likelihood, the posterior is of the same “kind” as the prior (e.g. Gaussian distributions).
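
To make the Gaussian case concrete, here is a small sketch of the standard conjugate update for a Gaussian prior on a mean, combined with one Gaussian observation of known noise variance. The function name `gaussian_posterior` and the numbers are illustrative assumptions, not part of the lecture.

```python
def gaussian_posterior(prior_mean, prior_var, obs, obs_var):
    """Posterior over a Gaussian mean after one observation with known noise variance."""
    # Precisions (inverse variances) add up when two Gaussians are multiplied.
    post_precision = 1.0 / prior_var + 1.0 / obs_var
    post_var = 1.0 / post_precision
    # The posterior mean is a precision-weighted average of prior mean and observation.
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var

# Illustrative values: prior N(0, 1), observation 2.0 with unit noise variance.
mean, var = gaussian_posterior(prior_mean=0.0, prior_var=1.0, obs=2.0, obs_var=1.0)
print(mean, var)
```

The posterior is again a Gaussian, so the same function can be applied repeatedly as new observations arrive.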

NB: we’ll most often use Gaussian and Discrete random variables.

Why are they useful?

  • for better visualization
  • properties of joint distribution/computations made easier

    • when it comes to computation: used wisely, graphical models can take you from exponential computations to linear ones
  • biologically plausible solutions

Graphical models: Bayesian networks (BN)

Directed Acyclic Graphs (DAG) representing causality where

\[x_1 ⟶ x_2\]

means that $x_1$ causes $x_2$

Warning: the graph must not contain directed cycles (otherwise the argument becomes circular).

Ex: used for generative models

Constructing a Bayesian Network:

\[P(a, b, c) = P(c \mid a, b) P(b \mid a) P(a)\]
  digraph {
    a -> {b c};
    b -> c;
  }


  • it’s indeed acyclic
  • we could have used a different factorization
  • interesting properties appear when we start removing links


Given a BN:

\[p(\textbf{x}) = \prod\limits_{ k=1 }^K p(x_k \mid \underbrace{pa_k}_{\text{parents}})\]
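
The product over parents can be sketched directly in code. The example below assumes a small binary network with edges a → b, a → c and b → c. The tables `parents` and `cpt`, the function `joint`, and all numbers are hypothetical and only serve to illustrate the factorization.

```python
from itertools import product

# Hypothetical binary network a -> b, a -> c, b -> c (all numbers are illustrative).
parents = {"a": (), "b": ("a",), "c": ("a", "b")}

# cpt[node][parent_values] = P(node = 1 | parents = parent_values)
cpt = {
    "a": {(): 0.3},
    "b": {(0,): 0.2, (1,): 0.6},
    "c": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}

def joint(assignment):
    """p(x) = product over nodes k of p(x_k | pa_k)."""
    prob = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob

# Summing the joint over all 2**3 assignments should give (numerically) 1.
total = sum(joint(dict(zip("abc", values))) for values in product((0, 1), repeat=3))
print(total)
```

Because every factor is a normalized conditional distribution, no global normalization constant is needed.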

The problem with fully connected graphs is that they have no interesting properties. If you remove some links:

  • you restrict the class of distributions
  • you reduce the number of parameters


  digraph {
    x_1 -> x_2;
  }

Factorization: \(\underbrace{P(x_2 \mid x_1)}_{K_1 (K_2 - 1)}\underbrace{P(x_1)}_{K_1 - 1} = P(x_1, x_2) ⟶ K_1 (K_2 - 1) + K_1 - 1 = K_1 K_2 - 1 \text{ parameters}\)

  digraph {
    x_1; x_2;
  }

Factorization: \(\underbrace{P(x_1)}_{K_1 - 1}\underbrace{P(x_2)}_{K_2 - 1} = P(x_1, x_2) ⟶ K_1 + K_2 - 2 \text{ parameters}\)


  • Fully connected graph with $M$ variables (each with $K$ states): $K^M - 1$ parameters

  • Chain $x_1 ⟶ ⋯ ⟶ x_M$: $K - 1 + (M - 1) K (K - 1)$ parameters, i.e. linear in $M$
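
A quick way to see the gap is to count free parameters, assuming every variable has $K$ states and each distribution loses one parameter to normalization. The helper names below are hypothetical.

```python
def full_joint_params(K, M):
    """Free parameters of an unrestricted joint over M variables with K states each."""
    return K**M - 1

def chain_params(K, M):
    """Free parameters of the chain x_1 -> ... -> x_M."""
    # K - 1 for P(x_1), then K * (K - 1) for each of the M - 1 conditionals.
    return (K - 1) + (M - 1) * K * (K - 1)

def independent_params(K, M):
    """Free parameters when there are no links at all."""
    return M * (K - 1)

for M in (2, 5, 10):
    print(M, full_joint_params(2, M), chain_params(2, M), independent_params(2, M))
```

For binary variables the fully connected count grows as $2^M - 1$, while the chain only needs $2M - 1$ parameters.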

Conditional independence

Removing links introduces conditional independences:


\[P(a, b \mid c) = P(a \mid c) P(b \mid c) ⟶ \text{ denoted by } a ⊥ b \mid c\]
  digraph {
    c -> {a b};
  }
\[P(a, b, c) = P(a \mid c) P(b \mid c) P(c)\]

Are $a$ and $b$ independent? Not in general.

But for a given $c$, they are conditionally independent: $P(a, b\mid c) P(c) = P(a, b, c) = P(a \mid c) P(b \mid c) P(c)$
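
Both statements can be checked numerically for the common-cause graph c → a, c → b. The sketch below assumes binary variables and made-up conditional probability tables, so only the qualitative result matters.

```python
import numpy as np

# Hypothetical binary CPTs for the graph c -> a, c -> b.
p_c = np.array([0.4, 0.6])                   # P(c)
p_a_given_c = np.array([[0.9, 0.1],          # P(a | c = 0)
                        [0.2, 0.8]])         # P(a | c = 1)
p_b_given_c = np.array([[0.7, 0.3],          # P(b | c = 0)
                        [0.1, 0.9]])         # P(b | c = 1)

# joint[a, b, c] = P(a | c) P(b | c) P(c)
joint = np.einsum("ca,cb,c->abc", p_a_given_c, p_b_given_c, p_c)

# Marginal test: is P(a, b) equal to P(a) P(b)?
p_ab = joint.sum(axis=2)
marginally_independent = np.allclose(p_ab, np.outer(p_ab.sum(axis=1), p_ab.sum(axis=0)))

# Conditional test: is P(a, b | c) equal to P(a | c) P(b | c) for every c?
p_ab_given_c = joint / joint.sum(axis=(0, 1))
conditionally_independent = np.allclose(
    p_ab_given_c, np.einsum("ca,cb->abc", p_a_given_c, p_b_given_c)
)

print(marginally_independent, conditionally_independent)
```

With these tables the marginal test fails and the conditional test passes, matching the argument above.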


  digraph {
    a -> c -> b;
  }
\[P(a, b, c) = P(a) P(c \mid a) P(b \mid c)\]

Are $a$ and $b$ independent? No:

\[P(a, b) = \sum\limits_{ c } P(a, b, c) = P(a) \sum\limits_{ c } P(c \mid a) P(b \mid c) = P(a) P(b \mid a) \neq P(a) P(b) \text{ in general}\]

Is there independence for a fixed $c$? Yes:

\[P(a, b \mid c) = \frac{P(a, b, c)}{P(c)} = \frac{P(a) P(c\mid a) P(b \mid c)}{P(c)} = P(a \mid c) P(b \mid c)\]

Ex: $a$ = tree, $c$ = leaf, $b$ = green


  digraph {
    a -> c;
    b -> c;
  }
  • $a$ and $b$ are independent

  • For a fixed $c$: $a$ and $b$ become dependent given $c$ (“explaining away”)
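
The head-to-head case can be verified the same way. The sketch below assumes binary variables and invented tables for the graph a → c ← b. It checks independence before and after conditioning on $c = 1$.

```python
import numpy as np

# Hypothetical binary CPTs for the graph a -> c <- b.
p_a = np.array([0.7, 0.3])
p_b = np.array([0.6, 0.4])
# p_c_given_ab[a, b, c] = P(c | a, b)
p_c_given_ab = np.array([[[0.9, 0.1], [0.4, 0.6]],
                         [[0.3, 0.7], [0.1, 0.9]]])

# joint[a, b, c] = P(a) P(b) P(c | a, b)
joint = p_a[:, None, None] * p_b[None, :, None] * p_c_given_ab

# Marginalizing c recovers P(a) P(b): a and b are independent.
p_ab = joint.sum(axis=2)
marginally_independent = np.allclose(p_ab, np.outer(p_a, p_b))

# Conditioning on c = 1 couples them.
p_ab_given_c1 = joint[:, :, 1] / joint[:, :, 1].sum()
product_of_marginals = np.outer(p_ab_given_c1.sum(axis=1), p_ab_given_c1.sum(axis=0))
independent_given_c1 = np.allclose(p_ab_given_c1, product_of_marginals)

print(marginally_independent, independent_given_c1)
```

Here the pattern is the reverse of the common-cause graph: independence holds marginally and is lost once $c$ is observed.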

D-separation theorem

Notion of Markov Blanket

Graphical models: Markov Random Fields

Undirected Graphs where you represent soft-constraints:

\[x_1 - x_2\]

knowing $x_1$ imposes a constraint on $x_2$

We have theorems analogous to BN.

\[p(\textbf{x}) = \frac 1 Z \prod\limits_{ \text{maximal clique } C} \underbrace{ ψ_C(x_C)}_{\exp(-E(x_C))}\]

Ex: in computer vision: image denoising

The MRF is a graph in which each node is a pixel of the original image; on top of it sits a second layer of nodes, the pixels of the observed noisy image.

\[P(\textbf{x}) = P(x_1) P(x_2 \mid x_1) ⋯ P(x_N \mid x_{N-1})\]


\[P(\textbf{x}) = \frac 1 Z ψ_{1, 2}(x_1, x_2) ⋯ ψ_{N-1, N}(x_{N-1}, x_N)\]

Inference: message passing algorithms

Inference on a chain

Inference = marginalization (since the posterior is a marginal given an observation)

\[P(\textbf{x}) = \frac 1 Z ψ_{1, 2}(x_1, x_2) ⋯ ψ_{N-1, N}(x_{N-1}, x_N)\]

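\[P(x_n) = \sum\limits_{ x_1 } ⋯ \sum\limits_{ x_{n-1} } \sum\limits_{ x_{n+1} } ⋯ \sum\limits_{ x_N } P(\textbf{x})\]

⟹ computational nightmare: done naively, the sum has $K^{N-1}$ terms for $K$ states per variable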
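
Message passing avoids this blow-up by pushing each sum inside the product of pairwise potentials. The sketch below compares a brute-force marginal with forward and backward messages on a short chain. The chain length, the number of states, and the random potentials `psi` are assumptions made for illustration.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
N, K = 6, 3                                   # chain length and states per node (illustrative)
# psi[n][x_n, x_{n+1}] is the pairwise potential between nodes n and n+1 (random, positive).
psi = [rng.random((K, K)) + 0.1 for _ in range(N - 1)]

def marginal_brute_force(n):
    """Sum the unnormalized joint over all K**N configurations."""
    p = np.zeros(K)
    for x in product(range(K), repeat=N):
        p[x[n]] += np.prod([psi[i][x[i], x[i + 1]] for i in range(N - 1)])
    return p / p.sum()

def marginal_message_passing(n):
    """Forward and backward messages, each a K-vector: O(N K^2) work."""
    forward = np.ones(K)
    for i in range(n):                        # messages arriving from the left
        forward = forward @ psi[i]
    backward = np.ones(K)
    for i in range(N - 2, n - 1, -1):         # messages arriving from the right
        backward = psi[i] @ backward
    p = forward * backward
    return p / p.sum()

print(np.allclose(marginal_brute_force(2), marginal_message_passing(2)))
```

Both functions return the same marginal, but the second only performs matrix-vector products, so its cost grows linearly with the chain length.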

Exercise (cf. Exercise Sheet)

  digraph {
    {m a} -> r;
    r -> i;
  }

1. Factorize the BN

\[P(m, a, r, i) = P(m) P(a) P(r \mid m, a) P(i \mid r)\]

2. If none of the variables is observed, show that a mosquito bite is independent of an alien abduction. What happens if we observe an itching sensation?

  • No variable observed: head-to-head node ⟹ the path $m ⟶ r ⟵ a$ is blocked ⇒ independence

  • Itching sensation observed: $i$ is a descendant of the head-to-head node $r$, so the path is unblocked and $m$ and $a$ are no longer independent given $i$

3. Consider a particular instance of such a graph. A mosquito bite and an alien abduction might have happened or not ($\lbrace 1, 0 \rbrace$), independently of each other, and with prior probabilities:

\[p(MB = 1) = 0.7\\ p(AA = 1) = 0.1\]

Given the state of the MB and AA, a red spot appears with probabilities given by

\[p(RS = 1|MB = 1, AA = 1) = 0.8\\ p(RS = 1|MB = 1, AA = 0) = 0.7\\ p(RS = 1|MB = 0, AA = 1) = 0.4\\ p(RS = 1|MB = 0, AA = 0) = 0.1\]

a. What is the probability that an alien abduction really happened, if we observe a red spot?

\[P(AA = 1 \mid RS = 1) = \frac{P(RS = 1 \mid AA = 1) P(AA = 1)}{\sum\limits_{ 0 ≤ i, j ≤ 1} P(RS = 1|MB = i, AA = j) \underbrace{P(MB = i, AA = j)}_{= P(MB = i) P (AA = j)}} > P(AA=1)\]

It’s larger because the red spot is evidence in favor of an abduction.

\[P(AA = 1 \mid RS = 1) > P(AA = 1 \mid RS = 1, MB=1) > P(AA=1)\]
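
The numbers behind these inequalities can be computed directly from the tables given in the exercise. The sketch below uses only those values; the variable names are arbitrary.

```python
# Priors and conditional probabilities taken from the exercise statement.
p_mb = {1: 0.7, 0: 0.3}
p_aa = {1: 0.1, 0: 0.9}
p_rs1 = {(1, 1): 0.8, (1, 0): 0.7, (0, 1): 0.4, (0, 0): 0.1}   # P(RS = 1 | MB, AA)

# P(RS = 1) by summing over both causes.
p_rs = sum(p_rs1[m, a] * p_mb[m] * p_aa[a] for m in (0, 1) for a in (0, 1))

# P(AA = 1 | RS = 1): marginalize the mosquito bite in the numerator.
p_aa_given_rs = sum(p_rs1[m, 1] * p_mb[m] for m in (0, 1)) * p_aa[1] / p_rs

# P(AA = 1 | RS = 1, MB = 1): the bite is now known.
p_aa_given_rs_mb = p_rs1[1, 1] * p_aa[1] / sum(p_rs1[1, a] * p_aa[a] for a in (0, 1))

print(round(p_aa_given_rs, 3), round(p_aa_given_rs_mb, 3), p_aa[1])
# → 0.127 0.113 0.1
```

Observing the red spot raises the probability of an abduction from 0.1 to about 0.127. Learning that a mosquito bite also happened lowers it again to about 0.113, because the bite already accounts for the spot.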

Factor graph:

  graph {
    a -- f_a;
    m -- f_m;
    {m a} -- f_am;
    f_am -- r;
    r -- f_r;
    f_r -- i;
  }
