# Lecture 6: Graphical Models and Inference

Lecturer: Pantelis Leptourgos

Dominant theories for the brain:

• Bayesian brain hypothesis, based on these ideas (same for hidden Markov models)
• sampling hypothesis: the brain uses samples to approximate the posterior
• predictive coding theory: the brain updates its beliefs based on the prediction error, i.e. the discrepancy between the evidence and the prediction

# Generative models - Inference

Much of what follows can be done just by reasoning on the graphical model. But first, let us recall the generative model and inference.

Bayes theorem:

posterior $\propto$ likelihood × prior

We only have some low-level evidence about the objects in the world (sensory data): based on that, the brain tries to infer what could have caused this input by building an internal model ⇒ generative model (learnt by the brain). The brain then performs inference, i.e. inverts this model.

Today, we’ll focus on the graphical representation of this generative model:

  digraph {
rankdir=TB;
X -> S[label="  P(S | X)"];
}


You can represent a whole bunch of problems with graphs, as above.

Graphical model: a representation of how the joint probability factorizes:

$P(X, S) = P(S \mid X) P(X)$

Graphical models:

1. Bayesian networks
2. Markov Random Fields
3. Factor graphs

## Probabilistic Graphical Models

Graphical model:

a graph whose nodes represent random variables and whose edges represent statistical dependencies between them.

NB: you can represent any distribution as a graphical model.

Conjugate prior:

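a prior is conjugate to a likelihood if the posterior (prior × likelihood, normalized) belongs to the same family as the prior (ex: a Gaussian prior with a Gaussian likelihood of known variance gives a Gaussian posterior).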
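A minimal sketch of the Gaussian case, assuming a Gaussian prior $\mathcal N(\mu_0, \sigma_0^2)$ on an unknown mean and i.i.d. Gaussian observations with a known variance $\sigma^2$ (the function and argument names are illustrative). The posterior is again Gaussian: precisions add up, and the posterior mean is a precision-weighted average of the prior mean and the data.

```python
def gaussian_posterior(mu0, var0, observations, var):
    """Posterior over the mean of a Gaussian with known variance `var`,
    given a Gaussian prior N(mu0, var0). Returns (mean, variance)."""
    n = len(observations)
    # Precisions (inverse variances) add up: prior precision + n * likelihood precision
    post_precision = 1.0 / var0 + n / var
    post_var = 1.0 / post_precision
    # Posterior mean: precision-weighted combination of prior mean and data
    post_mean = post_var * (mu0 / var0 + sum(observations) / var)
    return post_mean, post_var


# Hypothetical example: prior N(0, 1), two observations with noise variance 1
mean, variance = gaussian_posterior(0.0, 1.0, [2.0, 4.0], 1.0)
print(mean, variance)
```

The posterior has the same form as the prior, which is what makes repeated updates (posterior becomes the next prior) cheap.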

NB: we’ll most often use Gaussian and Discrete random variables.

Why are they useful?

• better visualization
• properties of the joint distribution (e.g. conditional independences) are easier to read, and computations are made easier
• computationally: used wisely, graphical models can turn exponential computations into linear ones
• biologically plausible solutions

## Graphical models: Bayesian networks (BN)

Directed Acyclic Graphs (DAG) representing causality where

$x_1 ⟶ x_2$

means that $x_1$ causes $x_2$

Warning: there must be no directed cycles (otherwise the causal reasoning would be circular)!

Ex: used for generative models

Constructing a Bayesian Network:

$P(a, b, c) = P(c \mid a, b) P(b \mid a) P(a)$
  digraph {
rankdir=LR;
a -> b, c;
b -> c;
}


NB:

• it’s indeed acyclic
• we could have used a different factorization
• interesting properties appear when we start removing links

### Factorization

Given a BN:

$p(\textbf{x}) = \prod\limits_{ k=1 }^K p(x_k \mid \underbrace{pa_k}_{\text{parents}})$
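A minimal sketch of this factorization for binary variables, reusing the three-node DAG $P(a, b, c) = P(c \mid a, b) P(b \mid a) P(a)$ from above. The conditional probability tables are made-up numbers for illustration; each node stores its parents and a table of $p(x_k = 1 \mid pa_k)$.

```python
from itertools import product

# Hypothetical CPTs for the DAG a -> b, a -> c, b -> c (binary variables).
# Each entry maps a tuple of parent values to P(node = 1 | parents).
bn = {
    "a": ((), {(): 0.3}),
    "b": (("a",), {(0,): 0.2, (1,): 0.9}),
    "c": (("a", "b"), {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.95}),
}

def joint(assignment):
    """p(x) = prod_k p(x_k | pa_k) for a full assignment {name: 0 or 1}."""
    prob = 1.0
    for node, (parents, cpt) in bn.items():
        p_one = cpt[tuple(assignment[p] for p in parents)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob

names = list(bn)
total = sum(joint(dict(zip(names, values)))
            for values in product((0, 1), repeat=len(names)))
print(total)  # a valid factorization sums to 1 (up to rounding)
```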

The problem with fully connected graphs is that they have no interesting property (they encode no independence). If you remove some links:

• you restrict the class of distributions
• you reduce the number of parameters

Ex:

  digraph {
rankdir=LR;
x_1 -> x_2;
}


Factorization: $\underbrace{P(x_2 \mid x_1)}_{K_1 (K_2 - 1)}\underbrace{P(x_1)}_{K_1 - 1} = P(x_1, x_2) ⟶ K_1 K_2 - 1 \text{ parameters}$

  digraph {
rankdir=LR;
x_1; x_2;
}


Factorization: $\underbrace{P(x_1)}_{K_1 - 1}\underbrace{P(x_2)}_{K_2 - 1} ⟶ K_1 + K_2 - 2 \text{ parameters}$

Likewise:

• Fully connected graph with $M$ variables: $K^M - 1$ parameters

• Chain $x_1 ⟶ ⋯ ⟶ x_M$: $(K - 1) + (M - 1) K (K - 1) = O(M K^2)$ parameters
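These counts can be checked with a small helper, assuming every variable takes $K$ values and counting free parameters only (a distribution over $K$ states has $K - 1$ of them). The helper name `n_params` is hypothetical.

```python
def n_params(K, parents_count):
    """Free parameters of a discrete BN where every node has K states.

    parents_count[k] is the number of parents of node k: its conditional
    table has K ** parents_count[k] rows, each with K - 1 free entries."""
    return sum(K ** n_pa * (K - 1) for n_pa in parents_count)

K, M = 3, 5
fully_connected = n_params(K, range(M))   # node k has all previous nodes as parents
chain = n_params(K, [0] + [1] * (M - 1))  # x_1 -> x_2 -> ... -> x_M
independent = n_params(K, [0] * M)        # no links at all
print(fully_connected, chain, independent)
```

The fully connected count telescopes to $\sum_{k=0}^{M-1} K^k (K - 1) = K^M - 1$, exponential in $M$, while the chain grows only linearly in $M$.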

### Conditional independence

EX1:

$P(a, b \mid c) = P(a \mid c) P(b \mid c) ⟶ \text{ denoted by } a ⊥ b \mid c$
  digraph {
rankdir=TB;
c -> a, b;
}

$P(a, b, c) = P(a \mid c) P(b \mid c) P(c)$

Are $a$ and $b$ independent? Not in general.

But given $c$, they are conditionally independent: $P(a, b\mid c) P(c) = P(a, b, c) = P(a \mid c) P(b \mid c) P(c)$, and dividing by $P(c)$ gives the definition above.
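A numerical check of this tail-to-tail case with binary variables and made-up tables: the joint built as $P(a \mid c) P(b \mid c) P(c)$ satisfies $P(a, b \mid c) = P(a \mid c) P(b \mid c)$ for every value, but in general not $P(a, b) = P(a) P(b)$. The helper `marg` is hypothetical.

```python
from itertools import product

# Hypothetical tables for c -> a, c -> b (binary variables)
p_c = {0: 0.4, 1: 0.6}
p_a_given_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p_a_given_c[c][a]
p_b_given_c = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}  # p_b_given_c[c][b]

joint = {(a, b, c): p_a_given_c[c][a] * p_b_given_c[c][b] * p_c[c]
         for a, b, c in product((0, 1), repeat=3)}

def marg(**fixed):
    """Sum the joint over all entries consistent with the fixed values."""
    idx = {"a": 0, "b": 1, "c": 2}
    return sum(p for key, p in joint.items()
               if all(key[idx[name]] == val for name, val in fixed.items()))

# Conditional independence given c holds for every value of a, b, c
cond_indep = all(
    abs(marg(a=a, b=b, c=c) / marg(c=c)
        - (marg(a=a, c=c) / marg(c=c)) * (marg(b=b, c=c) / marg(c=c))) < 1e-12
    for a, b, c in product((0, 1), repeat=3))
# Marginal independence fails: compare P(a=1, b=1) with P(a=1) P(b=1)
gap = marg(a=1, b=1) - marg(a=1) * marg(b=1)
print(cond_indep, gap)
```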

EX2:

  digraph {
rankdir=LR;
a -> c -> b;
}

$P(a, b, c) = P(a) P(c \mid a) P(b \mid c)$

Are $a$ and $b$ independent? No:

$P(a, b) = \sum\limits_{ c } P(a, b, c) = P(a) \sum\limits_{ c } P(c \mid a) P(b \mid c) = P(a) P(b \mid a)$

Is there independence for a fixed $c$? Yes:

$P(a, b \mid c) = \frac{P(a, b, c)}{P(c)} = \frac{P(a) P(c\mid a) P(b \mid c)}{P(c)} = P(a \mid c) P(b \mid c)$, since $\frac{P(a) P(c \mid a)}{P(c)} = P(a \mid c)$ by Bayes' theorem.

Ex: $a$ = tree, $c$ = leaf, $b$ = green

EX3:

  digraph {
rankdir=LR;
a -> c;
b -> c;
}

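• $a$ and $b$ are (marginally) independent: $P(a, b) = \sum\limits_{ c } P(a) P(b) P(c \mid a, b) = P(a) P(b)$

• Once $c$ is observed, $a$ and $b$ become dependent in general: $P(a, b \mid c) \neq P(a \mid c) P(b \mid c)$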
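The same kind of check for the head-to-head case, again with made-up numbers: with $P(a, b, c) = P(a) P(b) P(c \mid a, b)$, the marginal $P(a, b)$ factorizes, but once $c$ is observed, learning $b$ changes our belief about $a$.

```python
from itertools import product

# Hypothetical tables for a -> c <- b (binary variables)
p_a = {0: 0.7, 1: 0.3}
p_b = {0: 0.8, 1: 0.2}
p_c1 = {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}  # P(c = 1 | a, b)

joint = {}
for a, b, c in product((0, 1), repeat=3):
    pc = p_c1[(a, b)] if c == 1 else 1.0 - p_c1[(a, b)]
    joint[(a, b, c)] = p_a[a] * p_b[b] * pc

# Marginally: P(a = 1, b = 1) equals P(a = 1) P(b = 1)
p_ab = sum(joint[(1, 1, c)] for c in (0, 1))
print(p_ab, p_a[1] * p_b[1])

# Conditioned on c = 1: P(a = 1 | c = 1) vs P(a = 1 | b = 1, c = 1)
p_c_is_1 = sum(p for (a, b, c), p in joint.items() if c == 1)
p_a1_given_c1 = sum(joint[(1, b, 1)] for b in (0, 1)) / p_c_is_1
p_a1_given_b1_c1 = joint[(1, 1, 1)] / (joint[(0, 1, 1)] + joint[(1, 1, 1)])
print(p_a1_given_c1, p_a1_given_b1_c1)  # observing b = 1 lowers the belief in a = 1
```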

D-separation theorem

Notion of Markov Blanket

## Graphical models: Markov Random Fields

Undirected graphs representing soft constraints:

$x_1 - x_2$

knowing $x_1$ imposes a (soft) constraint on $x_2$

We have theorems analogous to those for BNs.

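$p(\textbf{x}) = \frac 1 Z \prod\limits_{ \text{maximal clique } C} \underbrace{ ψ_C(x_C)}_{\exp(-E(x_C))}$, where $Z = \sum\limits_{\textbf{x}} \prod\limits_{C} ψ_C(x_C)$ is the normalization constant (partition function) and $E$ is an energy function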

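Ex: image denoising in computer vision.

The MRF is a grid graph where each node is a pixel of the (unknown) original image, linked to its neighbouring pixels; on top of it sits the observed noisy image, each noisy pixel being linked to the corresponding original pixel.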
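A minimal sketch of this model, assuming binary pixels $x_i, y_i \in \{-1, +1\}$ and an Ising-type energy $E(\textbf{x}, \textbf{y}) = -\beta \sum_{i \sim j} x_i x_j - \eta \sum_i x_i y_i$, where $i \sim j$ are 4-connected neighbours (the values of `beta` and `eta` are illustrative). Instead of full inference it uses iterated conditional modes (ICM), a simple coordinate-wise energy minimization rather than a message-passing method: each pixel in turn is set to the state of lower energy given its neighbours, until nothing changes.

```python
def icm_denoise(noisy, beta=1.0, eta=2.1, max_sweeps=10):
    """Denoise a binary (+1/-1) image by iterated conditional modes on an
    Ising-type MRF with energy -beta * sum_{i~j} x_i x_j - eta * sum_i x_i y_i.
    beta and eta are illustrative coupling strengths, not tuned values."""
    rows, cols = len(noisy), len(noisy[0])
    x = [row[:] for row in noisy]  # start from the observed image
    for _ in range(max_sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                # Sum of the current states of the 4-connected neighbours
                neigh = sum(x[rr][cc]
                            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                            if 0 <= rr < rows and 0 <= cc < cols)
                # Local energy of x_rc = s is -s * field: pick the sign that minimizes it
                field = beta * neigh + eta * noisy[r][c]
                best = 1 if field >= 0 else -1
                if best != x[r][c]:
                    x[r][c] = best
                    changed = True
        if not changed:
            break
    return x


# Hypothetical test image: all +1, with two isolated interior pixels flipped
clean = [[1] * 6 for _ in range(6)]
noisy = [row[:] for row in clean]
noisy[2][2] = -1
noisy[3][4] = -1
print(icm_denoise(noisy) == clean)
```

ICM only finds a local minimum of the energy; with these couplings an isolated flipped interior pixel is restored because its four agreeing neighbours outweigh its own observation.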

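Ex: a directed chain can be turned into an MRF:

$P(\textbf{x}) = P(x_1) P(x_2 \mid x_1) ⋯ P(x_N \mid x_{N-1})$

becomes

$P(\textbf{x}) = \frac 1 Z ψ_{1, 2}(x_1, x_2) ⋯ ψ_{N-1, N}(x_{N-1}, x_N)$

with $ψ_{1, 2}(x_1, x_2) = P(x_1) P(x_2 \mid x_1)$, $ψ_{k-1, k}(x_{k-1}, x_k) = P(x_k \mid x_{k-1})$ for $k > 2$, and $Z = 1$.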

# Inference: message passing algorithms

## Inference on a chain

Inference = marginalization (since the posterior is a marginal given an observation)

$P(\textbf{x}) = \frac 1 Z ψ_{1, 2}(x_1, x_2) ⋯ ψ_{N-1, N}(x_{N-1}, x_N)$

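$P(x_n) = \sum\limits_{ x_1 } ⋯ \sum\limits_{ x_{n-1} } \sum\limits_{ x_{n+1} } ⋯ \sum\limits_{ x_N } P(\textbf{x})$

⟹ computational nightmare if done naively: with $K$ states per variable, the sum has $K^{N-1}$ terms. Exploiting the chain structure (pushing each sum inside the product, i.e. passing messages along the chain) brings the cost down to $O(N K^2)$.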
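A minimal sketch contrasting the two approaches on a chain MRF with random positive pairwise potentials (function names are hypothetical; nodes are indexed from 0 in the code). The brute-force marginal enumerates all $K^N$ configurations; the message-passing version computes a forward message from the left end and a backward message from the right end, each step costing $O(K^2)$, and multiplies them at $x_n$.

```python
import random
from itertools import product

def brute_force_marginal(psi, K, n):
    """P(x_n) by summing the unnormalized product over all K**N configurations."""
    N = len(psi) + 1
    m = [0.0] * K
    for x in product(range(K), repeat=N):
        w = 1.0
        for i, table in enumerate(psi):  # psi[i][a][b] = psi_{i,i+1}(x_i = a, x_{i+1} = b)
            w *= table[x[i]][x[i + 1]]
        m[x[n]] += w
    Z = sum(m)
    return [v / Z for v in m]

def message_passing_marginal(psi, K, n):
    """P(x_n) via forward/backward messages: O(N K^2) operations."""
    N = len(psi) + 1
    alpha = [1.0] * K  # forward message arriving at x_0
    for i in range(n):
        alpha = [sum(alpha[a] * psi[i][a][b] for a in range(K)) for b in range(K)]
    beta = [1.0] * K  # backward message arriving at x_{N-1}
    for i in range(N - 2, n - 1, -1):
        beta = [sum(psi[i][a][b] * beta[b] for b in range(K)) for a in range(K)]
    m = [alpha[k] * beta[k] for k in range(K)]
    Z = sum(m)  # normalizing the product gives the marginal
    return [v / Z for v in m]

random.seed(0)
K, N, n = 3, 6, 2
psi = [[[random.uniform(0.1, 1.0) for _ in range(K)] for _ in range(K)]
       for _ in range(N - 1)]
print(brute_force_marginal(psi, K, n))
print(message_passing_marginal(psi, K, n))
```

Both functions return the same distribution; only the second scales to long chains.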

# Exercise (cf. Exercise Sheet)

  digraph {
rankdir=TB;
m, a -> r;
r -> i;
}


## 1. Factorize the BN

$P(m, a, r, i) = P(m) P(a) P(r \mid m, a) P(i \mid r)$

## 2. If none of the variables is observed, show that a mosquito bite is independent of an alien abduction. What happens if we observe an itching sensation?

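• No variable observed: the path $m ⟶ r ⟵ a$ is head-to-head at $r$, and neither $r$ nor its descendant $i$ is observed ⟹ the path is blocked ⇒ $m ⊥ a$

• Itching sensation observed: $i$ is a descendant of the head-to-head node $r$, so observing it unblocks the path ⟹ $m$ and $a$ are no longer independent given $i$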

## 3. Consider a particular instance of such a graph. A mosquito bite and an alien abduction might have happened or not ($\{1, 0\}$), independently of each other, and with prior probabilities:

$p(MB = 1) = 0.7, \quad p(AA = 1) = 0.1$

Given the state of the MB and AA, a red spot appears with probabilities given by

| $MB$ | $AA$ | $p(RS = 1 \mid MB, AA)$ |
|---|---|---|
| 1 | 1 | 0.8 |
| 1 | 0 | 0.7 |
| 0 | 1 | 0.4 |
| 0 | 0 | 0.1 |

### a. What is the probability that an alien abduction really happened, if we observe a red spot?

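$P(AA = 1 \mid RS = 1) = \frac{P(RS = 1 \mid AA = 1) P(AA = 1)}{\sum\limits_{ 0 ≤ i, j ≤ 1} P(RS = 1|MB = i, AA = j) \underbrace{P(MB = i, AA = j)}_{= P(MB = i) P (AA = j)}} > P(AA=1)$

where $P(RS = 1 \mid AA = 1) = \sum\limits_{i} P(RS = 1 \mid MB = i, AA = 1) P(MB = i)$.

It is larger than the prior because the red spot is evidence in favour of an alien abduction. Numerically: $P(AA = 1 \mid RS = 1) = 0.068 / 0.536 \approx 0.127$.

If we also observe a mosquito bite, the red spot is already (partly) explained by it, so the posterior on $AA$ drops back: $P(AA = 1 \mid RS = 1, MB = 1) = 0.056 / 0.497 \approx 0.113$. Hence:

$P(AA = 1 \mid RS = 1) > P(AA = 1 \mid RS = 1, MB=1) > P(AA=1)$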
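These numbers can be checked by enumerating the joint $P(MB, AA, RS) = P(MB) P(AA) P(RS \mid MB, AA)$ with the values given above. The itching node is left out since it is neither observed nor queried here.

```python
from itertools import product

p_mb = {1: 0.7, 0: 0.3}
p_aa = {1: 0.1, 0: 0.9}
p_rs1 = {(1, 1): 0.8, (1, 0): 0.7, (0, 1): 0.4, (0, 0): 0.1}  # P(RS = 1 | MB, AA)

# Joint restricted to RS = 1: P(MB = mb, AA = aa, RS = 1)
joint_rs1 = {(mb, aa): p_mb[mb] * p_aa[aa] * p_rs1[(mb, aa)]
             for mb, aa in product((0, 1), repeat=2)}

# P(AA = 1 | RS = 1): sum over MB, normalize by P(RS = 1)
p_aa1_given_rs1 = (joint_rs1[(0, 1)] + joint_rs1[(1, 1)]) / sum(joint_rs1.values())
# P(AA = 1 | RS = 1, MB = 1): keep only MB = 1 entries
p_aa1_given_rs1_mb1 = joint_rs1[(1, 1)] / (joint_rs1[(1, 0)] + joint_rs1[(1, 1)])
print(round(p_aa1_given_rs1, 3), round(p_aa1_given_rs1_mb1, 3), p_aa[1])
```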

Factor graph:

  graph {
f_a[shape=box];
f_m[shape=box];
f_am[shape=box];
f_r[shape=box];
a -- f_a;
m -- f_m;
m, a -- f_am;
f_am -- r;
r -- f_r;
f_r -- i;
}

