Lecture 6: Graphical Models and Inference
Lecturer: Pantelis Leptourgos
Dominant theories for the brain:
 Bayesian brain hypothesis: based on these ideas (the same goes for hidden Markov models)
 sampling hypothesis: the brain uses samples to approximate the posterior
 predictive coding theory: the brain updates its internal model based on the prediction error, i.e. the discrepancy between the sensory evidence and its prediction
Generative models and inference
In principle, any inference question can be answered from the graphical model alone. But let us first recall how inference relates to the generative model.
 Bayes theorem:

posterior $\propto$ likelihood × prior
We only have some low-level evidence about the objects in the world (sensory data): based on that, the brain tries to predict what could have caused this input, by creating an internal model ⇒ generative model (learnt by the brain). Then the brain does inference, i.e. it inverts this model.
Today, we’ll focus on the graphical representation of this generative model:
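As a minimal sketch of what "inverting the model" means, the snippet below applies posterior ∝ likelihood × prior to a discrete hidden cause. The two causes and all the numbers are hypothetical, chosen only for illustration.

```python
# Hypothetical toy numbers (not from the lecture).
prior = {"cat": 0.8, "dog": 0.2}        # P(X): prior over hidden causes
likelihood = {"cat": 0.1, "dog": 0.6}   # P(S = s | X) for the observed input s

# posterior is proportional to likelihood x prior ...
unnormalized = {x: likelihood[x] * prior[x] for x in prior}
# ... and the normalizer is the evidence P(S = s)
evidence = sum(unnormalized.values())
posterior = {x: u / evidence for x, u in unnormalized.items()}

print(posterior)  # "dog" ends up more probable than "cat" despite its lower prior
```

Here the likelihood outweighs the prior: the less probable cause a priori becomes the more probable one once the input is observed.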
digraph {
rankdir=TB;
X -> S[label=" P(S | X)"];
}
You can represent a whole bunch of problems with graphs, as above.
Graphical model: it’s a detailed representation of the joint probability:
\[P(X, S) = P(S \mid X) P(X)\]
Graphical models:
 Bayesian networks
 Markov Random Fields
 Factor graphs
Probabilistic Graphical Models
 Graphical model:

it’s a graph whose nodes represent random variables and whose edges represent statistical dependencies.
NB: you can represent any distribution as a graphical model.
 Conjugate prior:

a prior such that, when it is multiplied by the likelihood, the posterior is of the same “kind” as the prior (ex: a Gaussian prior combined with a Gaussian likelihood gives a Gaussian posterior).
NB: we’ll most often use Gaussian and Discrete random variables.
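To make conjugacy concrete, here is a small sketch of the Gaussian case: a Gaussian prior on an unknown mean, combined with Gaussian observations of known noise variance, yields a Gaussian posterior whose parameters have a closed form. The function name `gaussian_posterior` and the numbers are illustrative assumptions, not part of the lecture.

```python
def gaussian_posterior(prior_mean, prior_var, observations, noise_var):
    """Posterior over the mean of a Gaussian with known noise variance.

    Prior: mean ~ N(prior_mean, prior_var).
    Likelihood: each observation ~ N(mean, noise_var), independently.
    Returns the mean and variance of the (Gaussian) posterior.
    """
    n = len(observations)
    # Precisions (inverse variances) add up.
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / noise_var)
    return post_mean, post_var


# Hypothetical example: prior N(0, 1), one observation x = 2 with unit noise.
post_mean, post_var = gaussian_posterior(0.0, 1.0, [2.0], 1.0)
print(post_mean, post_var)
```

With equal prior and noise variances, the posterior mean lands halfway between the prior mean and the observation, and the posterior variance is halved.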
Why are they useful?
 for better visualization

properties of joint distribution/computations made easier
 computation: used wisely, graphical models can turn computations of exponential cost into linear ones
 biologically plausible solutions
Graphical models: Bayesian networks (BN)
Directed Acyclic Graphs (DAG) representing causality where
\[x_1 ⟶ x_2\]means that $x_1$ causes $x_2$
Warning: you mustn’t have directed cycles! (otherwise the causal argument would be circular)
Ex: used for generative models
Constructing a Bayesian Network:
\[P(a, b, c) = P(c \mid a, b) P(b \mid a) P(a)\]
digraph {
rankdir=LR;
a -> b, c;
b -> c;
}
NB:
 it’s indeed acyclic
 we could have used a different factorization
 interesting properties appear when we start removing links
Factorization
Given a BN:
\[p(\textbf{x}) = \prod\limits_{ k=1 }^K p(x_k \mid \underbrace{pa_k}_{\text{parents}})\]
The problem with fully connected graphs is that they have no interesting property: they encode no independence. If you remove some links:
 you restrict the class of distributions
 you reduce the number of parameters
Ex:
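In what follows, $x_i$ takes $K_i$ possible values; a distribution over $K$ values has $K - 1$ free parameters, since it must sum to 1.
digraph {
rankdir=LR;
x_1 -> x_2;
}
Factorization: \(\underbrace{P(x_2 \mid x_1)}_{K_1 (K_2 - 1)}\underbrace{P(x_1)}_{K_1 - 1} = P(x_1, x_2) ⟶ K_1 (K_2 - 1) + (K_1 - 1) = K_1 K_2 - 1 \text{ parameters}\)
digraph {
rankdir=LR;
x_1; x_2;
}
Factorization: \(\underbrace{P(x_1)}_{K_1 - 1}\underbrace{P(x_2)}_{K_2 - 1} ⟶ (K_1 - 1) + (K_2 - 1) \text{ parameters}\)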
Likewise:

Fully connected graph with $M$ variables (each with $K$ states): $K^M - 1$ parameters

Chain $x_1 ⟶ ⋯ ⟶ x_M$: $(K - 1) + (M - 1) K (K - 1)$ parameters, i.e. linear in $M$ instead of exponential
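The gap between these counts can be seen with a short sketch. It assumes $M$ discrete variables with $K$ states each, and counts free parameters as (number of parent configurations) × ($K - 1$) for each conditional table; the function names are illustrative.

```python
def n_params_full(K, M):
    """Unrestricted joint over M variables with K states each."""
    return K ** M - 1


def n_params_chain(K, M):
    """Chain x_1 -> ... -> x_M: one marginal plus (M - 1) conditional tables."""
    return (K - 1) + (M - 1) * K * (K - 1)


def n_params_independent(K, M):
    """No links at all: M independent marginals."""
    return M * (K - 1)


# Hypothetical sizes: 10 binary variables.
K, M = 2, 10
print(n_params_full(K, M), n_params_chain(K, M), n_params_independent(K, M))
```

For 10 binary variables, the fully connected graph needs 1023 parameters, the chain 19, and the graph without links 10.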
Conditional independence
Removing links introduces conditional independences:
EX1:
\[P(a, b \mid c) = P(a \mid c) P(b \mid c) ⟶ \text{ denoted by } a ⊥ b \mid c\]
digraph {
rankdir=TB;
c -> a, b;
}
\[P(a, b, c) = P(a \mid c) P(b \mid c) P(c)\]
Are $a$ and $b$ independent? Not in general.
But for a given $c$, they are conditionally independent: dividing $P(a, b\mid c) P(c) = P(a, b, c) = P(a \mid c) P(b \mid c) P(c)$ by $P(c)$ gives $P(a, b \mid c) = P(a \mid c) P(b \mid c)$
EX2:
digraph {
rankdir=LR;
a -> c -> b;
}
\[P(a, b, c) = P(a) P(c \mid a) P(b \mid c)\]
Are $a$ and $b$ independent? No:
\[P(a, b) = \sum\limits_{ c } P(a, b, c) = P(a) \sum\limits_{ c } P(c \mid a) P(b \mid c) = P(a) P(b \mid a)\]
which is not equal to $P(a) P(b)$ in general.
Is there independence for a fixed $c$? Yes, since $P(a) P(c \mid a) = P(a \mid c) P(c)$:
\[P(a, b \mid c) = \frac{P(a, b, c)}{P(c)} = \frac{P(a) P(c\mid a) P(b \mid c)}{P(c)} = P(a \mid c) P(b \mid c)\]
Ex: $a$ = tree, $c$ = leaf, $b$ = green
EX3:
digraph {
rankdir=LR;
a -> c;
b -> c;
}

$a$ and $b$ are independent

For a fixed $c$: $a$ and $b$ become dependent (observing the common effect $c$ couples its two causes)
⟹ D-separation theorem
Notion of Markov Blanket
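The three examples above (common cause, chain, head-to-head) can be checked numerically. The sketch below builds each joint distribution over binary $a, b, c$ from its factorization and tests $a ⊥ b$ and $a ⊥ b \mid c$ by brute force. The head-to-head table reuses the numbers from the exercise at the end of these notes; all other numbers, and the helper names `bern` and `is_independent`, are hypothetical.

```python
from itertools import product

VALS = (0, 1)


def bern(p1, v):
    """Probability that a binary variable equals v when P(v = 1) = p1."""
    return p1 if v == 1 else 1.0 - p1


def is_independent(joint, given_c=False, tol=1e-12):
    """joint[(a, b, c)] -> probability. Tests a indep. b, or a indep. b given c."""
    def p(a=None, b=None, c=None):
        # Marginal probability of the specified values (None = summed out).
        return sum(pr for (a_, b_, c_), pr in joint.items()
                   if (a is None or a_ == a)
                   and (b is None or b_ == b)
                   and (c is None or c_ == c))
    if given_c:
        # P(a, b | c) = P(a | c) P(b | c)  <=>  P(a, b, c) P(c) = P(a, c) P(b, c)
        return all(abs(p(a, b, c) * p(c=c) - p(a=a, c=c) * p(b=b, c=c)) < tol
                   for a, b, c in product(VALS, repeat=3))
    return all(abs(p(a=a, b=b) - p(a=a) * p(b=b)) < tol
               for a, b in product(VALS, repeat=2))


triples = list(product(VALS, repeat=3))

# EX1, common cause: P(c) P(a | c) P(b | c)
pa_c, pb_c = [0.2, 0.9], [0.5, 0.1]
common_cause = {(a, b, c): bern(0.3, c) * bern(pa_c[c], a) * bern(pb_c[c], b)
                for a, b, c in triples}

# EX2, chain: P(a) P(c | a) P(b | c)
pc_a, pb_c2 = [0.1, 0.8], [0.3, 0.7]
chain = {(a, b, c): bern(0.6, a) * bern(pc_a[a], c) * bern(pb_c2[c], b)
         for a, b, c in triples}

# EX3, head-to-head: P(a) P(b) P(c | a, b)
pc_ab = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.8}
collider = {(a, b, c): bern(0.1, a) * bern(0.7, b) * bern(pc_ab[(a, b)], c)
            for a, b, c in triples}

for name, joint in [("common cause", common_cause), ("chain", chain),
                    ("head-to-head", collider)]:
    print(name, is_independent(joint), is_independent(joint, given_c=True))
```

The first two structures come out dependent marginally and independent given $c$; the head-to-head structure is the other way round.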
Graphical models: Markov Random Fields
Undirected Graphs where you represent soft constraints:
\[x_1 - x_2\]
means that knowing $x_1$ imposes a constraint on $x_2$
We have theorems analogous to BN.
\[p(\textbf{x}) = \frac 1 Z \prod\limits_{ \text{maximal clique } C} \underbrace{ ψ_C(x_C)}_{\exp(-E(x_C))}\]
where $Z$ is the normalization constant.
Ex: in computer vision: image denoising
Your MRF is a graph where each hidden node is a pixel of the original image, linked to its neighbouring pixels and to the corresponding observed pixel of the noisy image
Link with BN
\[P(\textbf{x}) = P(x_1) P(x_2 \mid x_1) ⋯ P(x_N \mid x_{N-1})\]
becomes
\[P(\textbf{x}) = \frac 1 Z ψ_{1, 2}(x_1, x_2) ⋯ ψ_{N-1, N}(x_{N-1}, x_N)\]
Inference: message passing algorithms
Inference on a chain
Inference = marginalization (since the posterior is a marginal given an observation)
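\[P(\textbf{x}) = \frac 1 Z ψ_{1, 2}(x_1, x_2) ⋯ ψ_{N-1, N}(x_{N-1}, x_N)\]
⇓
\[P(x_n) = \sum\limits_{ x_1 } ⋯ \sum\limits_{ x_{n-1} } \sum\limits_{ x_{n+1} } ⋯ \sum\limits_{ x_N } P(\textbf{x})\]
⟹ computational nightmare if done naively: for $K$-state variables, the sum has $K^{N-1}$ terms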
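Because the joint on a chain is a product of pairwise potentials, the sums can be pushed inside the product and evaluated one variable at a time, which is the idea behind message passing. The sketch below is one possible implementation for discrete variables, compared against the brute-force sum; the function names, the 0-based indexing and the potential values are assumptions made for illustration.

```python
from itertools import product


def chain_marginal(psis, n, K):
    """Marginal of node n (0-indexed) by message passing, cost O(N K^2).

    psis[t][i][j] is the potential psi_{t,t+1}(x_t = i, x_{t+1} = j).
    """
    N = len(psis) + 1
    alpha = [1.0] * K                    # forward message arriving at node 0
    for t in range(n):                   # left to right, up to node n
        alpha = [sum(alpha[i] * psis[t][i][j] for i in range(K))
                 for j in range(K)]
    beta = [1.0] * K                     # backward message arriving at node N-1
    for t in range(N - 2, n - 1, -1):    # right to left, down to node n
        beta = [sum(psis[t][i][j] * beta[j] for j in range(K))
                for i in range(K)]
    unnorm = [alpha[k] * beta[k] for k in range(K)]
    Z = sum(unnorm)                      # normalization constant
    return [u / Z for u in unnorm]


def brute_force_marginal(psis, n, K):
    """Same marginal by summing over all K^N configurations."""
    N = len(psis) + 1
    unnorm = [0.0] * K
    for x in product(range(K), repeat=N):
        weight = 1.0
        for t in range(N - 1):
            weight *= psis[t][x[t]][x[t + 1]]
        unnorm[x[n]] += weight
    Z = sum(unnorm)
    return [u / Z for u in unnorm]


# Hypothetical potentials for a chain of N = 4 binary variables.
psis = [
    [[1.0, 0.5], [0.5, 2.0]],
    [[1.0, 3.0], [2.0, 1.0]],
    [[0.5, 1.0], [1.5, 1.0]],
]
for n in range(4):
    print(n, chain_marginal(psis, n, 2), brute_force_marginal(psis, n, 2))
```

Both functions return the same marginals, but the first one touches each potential table only once, so its cost grows linearly with the length of the chain.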
Exercise (cf. Exercise Sheet)
digraph {
rankdir=TB;
m, a -> r;
r -> i;
}
1. Factorize the BN
\[P(m, a, r, i) = P(m) P(a) P(r \mid m, a) P(i \mid r)\]
2. If none of the variables is observed, show that a mosquito bite is independent of an alien abduction. What happens if we observe an itching sensation?

No variable observed: head-to-head node $r$ ⟹ the path $m ⟶ r ⟵ a$ is blocked ⇒ independence

Itching sensation observed: $i$ is a descendant of the head-to-head node $r$, so the path is unblocked: $m$ and $a$ are not independent given $i$ anymore
3. Consider a particular instance of such a graph. A mosquito bite and an alien abduction might have happened or not ($\lbrace 1,0 \rbrace$), independently of each other, and with prior probabilities:
\[p(MB = 1) = 0.7\\ p(AA = 1) = 0.1\]
Given the state of the MB and AA, a red spot appears with probabilities given by
\[p(RS = 1 \mid MB = 1, AA = 1) = 0.8\\ p(RS = 1 \mid MB = 1, AA = 0) = 0.7\\ p(RS = 1 \mid MB = 0, AA = 1) = 0.4\\ p(RS = 1 \mid MB = 0, AA = 0) = 0.1\]
a. What is the probability that an alien abduction really happened, if we observe a red spot?
\[P(AA = 1 \mid RS = 1) = \frac{P(RS = 1 \mid AA = 1) P(AA = 1)}{\sum\limits_{ 0 ≤ i, j ≤ 1} P(RS = 1 \mid MB = i, AA = j) \underbrace{P(MB = i, AA = j)}_{= P(MB = i) P (AA = j)}} = \frac{0.68 \times 0.1}{0.536} \approx 0.127 > P(AA=1) = 0.1\]
it’s larger because now we have some evidence. If we also observe the mosquito bite, it partly explains the red spot away:
\[P(AA = 1 \mid RS = 1) > P(AA = 1 \mid RS = 1, MB=1) > P(AA=1)\]
Factor graph:
graph {
f_a[shape=box];
f_m[shape=box];
f_am[shape=box];
f_r[shape=box];
a -- f_a;
m -- f_m;
m, a -- f_am;
f_am -- r;
r -- f_r;
f_r -- i;
}
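The posteriors of question 3 can be checked by summing the factorized joint directly, using the probabilities given in the exercise. This is a brute-force sketch rather than a message passing implementation; the itching variable is left out because, being unobserved, it sums to 1. Variable and function names are illustrative.

```python
from itertools import product

# Priors and conditional table from the exercise.
p_mb = {1: 0.7, 0: 0.3}                     # P(MB)
p_aa = {1: 0.1, 0: 0.9}                     # P(AA)
p_rs1 = {(1, 1): 0.8, (1, 0): 0.7,          # P(RS = 1 | MB, AA), keyed by (MB, AA)
         (0, 1): 0.4, (0, 0): 0.1}


def joint(mb, aa, rs):
    """P(MB, AA, RS) = P(MB) P(AA) P(RS | MB, AA)."""
    p_rs = p_rs1[(mb, aa)] if rs == 1 else 1.0 - p_rs1[(mb, aa)]
    return p_mb[mb] * p_aa[aa] * p_rs


# Evidence: P(RS = 1), summing over both hidden causes.
p_rs = sum(joint(mb, aa, 1) for mb, aa in product((0, 1), repeat=2))

# P(AA = 1 | RS = 1): sum out the mosquito bite.
p_aa_given_rs = sum(joint(mb, 1, 1) for mb in (0, 1)) / p_rs

# P(AA = 1 | RS = 1, MB = 1): the bite is now observed too.
p_aa_given_rs_mb = joint(1, 1, 1) / sum(joint(1, aa, 1) for aa in (0, 1))

print(round(p_rs, 3), round(p_aa_given_rs, 3), round(p_aa_given_rs_mb, 3))
```

The values are about 0.536, 0.127 and 0.113: observing the red spot raises the probability of an abduction above its prior of 0.1, and additionally observing the mosquito bite lowers it again without bringing it back to the prior.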