Lecture 6: Graphical Models and Inference
Lecturer: Pantelis Leptourgos
Dominant theories for the brain:
- Bayesian brain hypothesis: built on these ideas (the same framework underlies hidden Markov models)
- sampling hypothesis: the brain uses samples to approximate the posterior
- predictive coding theory: the brain updates its beliefs based on the discrepancy between prediction and evidence (the prediction error)
Generative models - Inference
Graphical models are a very flexible tool: you can represent a wide range of problems with them. But first, let us recall the generative-model view of inference.
- Bayes theorem: posterior $\propto$ likelihood × prior
We only have some low-level evidence about the objects in the world (sensory data). Based on that, the brain tries to infer what could have caused this input by building an internal model ⇒ generative model (learnt by the brain). The brain then does inference, i.e. inverts this model.
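As a toy illustration of "posterior ∝ likelihood × prior" (all numbers made up): the posterior over two hypothetical world states given a single observation.

```python
import numpy as np

# Two hypothetical hidden states X and one observed sensory datum S.
prior = np.array([0.7, 0.3])        # P(X)
likelihood = np.array([0.2, 0.9])   # P(S = s_obs | X), for the observed s

unnormalized = likelihood * prior   # posterior ∝ likelihood × prior
posterior = unnormalized / unnormalized.sum()
print(posterior)                    # P(X | S = s_obs); the evidence shifts belief to state 2
```

The normalizing constant (the denominator in Bayes' theorem) is just the sum of the unnormalized values.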
Today, we’ll focus on the graphical representation of this generative model:
digraph {
rankdir=TB;
X -> S[label=" P(S | X)"];
}
You can represent a whole bunch of problems with graphs, as above.
Graphical model: it’s a detailed representation of the joint probability:
\[P(X, S) = P(S \mid X) P(X)\]
Graphical models:
- Bayesian networks
- Markov Random Fields
- Factor graphs
Probabilistic Graphical Models
- Graphical model: a graph whose nodes represent variables and whose edges represent statistical dependencies.
NB: you can represent any distribution as a graphical model.
- Conjugate prior: when multiplied by the likelihood, the resulting posterior is of the same family as the prior (ex: Gaussian distributions).
NB: we’ll most often use Gaussian and Discrete random variables.
Why are they useful?
- better visualization
- properties of the joint distribution / computations made easier
- computation: used wisely, graphical models can take you from exponential to linear cost
- biologically plausible solutions
Graphical models: Bayesian networks (BN)
Directed Acyclic Graphs (DAGs) representing causal relationships, where
\[x_1 ⟶ x_2\]
means that $x_1$ causes $x_2$
Warning: there must be no directed cycles! (otherwise: circular argument)
Ex: used for generative models
Constructing a Bayesian Network:
\[P(a, b, c) = P(c \mid a, b) P(b \mid a) P(a)\]
digraph {
rankdir=LR;
a -> b;
a -> c;
b -> c;
}
NB:
- it’s indeed acyclic
- we could have used a different factorization
- interesting properties appear when we start removing links
Factorization
Given a BN:
\[p(\textbf{x}) = \prod\limits_{ k=1 }^K p(x_k \mid \underbrace{pa_k}_{\text{parents}})\]
The problem with fully connected graphs is that they have no interesting properties. If you remove some links:
- you restrict the class of distributions
- you reduce the number of parameters
Ex:
digraph {
rankdir=LR;
x_1 -> x_2;
}
Factorization: \(\underbrace{P(x_1 \mid x_2)}_{(K_1 - 1) K_2}\underbrace{P(x_2)}_{K_2 - 1} = P(x_1, x_2) ⟶ (K_1 - 1) K_2 + K_2 - 1 = K_1 K_2 - 1 \text{ parameters}\)
digraph {
rankdir=LR;
x_1; x_2;
}
Factorization: \(\underbrace{P(x_1)}_{K_1 - 1}\underbrace{P(x_2)}_{K_2 - 1} ⟶ K_1 + K_2 - 2 \text{ parameters}\)
Likewise:
- Fully connected graph with $M$ variables of $K$ states each: $K^M - 1$ parameters
- Chain $x_1 ⟶ ⋯ ⟶ x_M$: $(K - 1) + (M-1)K(K-1)$ parameters, i.e. linear in $M$ instead of exponential
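These counts can be sanity-checked in a few lines (same "−1 per distribution for normalization" convention as above; $K$ and $M$ are arbitrary made-up values).

```python
# Parameter counts for M discrete variables with K states each.
def fully_connected(K, M):
    # One free parameter per joint configuration, minus one for normalization.
    return K**M - 1

def chain(K, M):
    # P(x_1) has K-1 free parameters; each of the M-1 conditional tables
    # P(x_k | x_{k-1}) has K columns of K-1 free parameters.
    return (K - 1) + (M - 1) * K * (K - 1)

K, M = 2, 10
print(fully_connected(K, M), chain(K, M))   # 1023 19
```

Exponential versus linear growth in $M$ is exactly why removing links pays off.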
Conditional independence
Removing links introduces conditional independencies:
EX1:
\[P(a, b \mid c) = P(a \mid c) P(b \mid c) ⟶ \text{ denoted by } a ⊥ b \mid c\]
digraph {
rankdir=TB;
c -> a;
c -> b;
}
\[P(a, b, c) = P(a \mid c) P(b \mid c) P(c)\]
Are $a$ and $b$ independent? Not in general.
But for a given $c$, they are conditionally independent: $P(a, b\mid c) P(c) = P(a, b, c) = P(a \mid c) P(b \mid c) P(c)$
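The factorization of EX1 can be verified numerically with arbitrary (made-up) conditional probability tables: $a ⊥ b \mid c$ holds by construction, yet $a$ and $b$ are not independent marginally.

```python
import numpy as np

# Common-cause graph c -> a, c -> b with made-up CPTs.
P_c = np.array([0.6, 0.4])
P_a_c = np.array([[0.9, 0.2],   # P(a | c); columns indexed by c, columns sum to 1
                  [0.1, 0.8]])
P_b_c = np.array([[0.3, 0.7],
                  [0.7, 0.3]])

# Joint: P(a, b, c) = P(a|c) P(b|c) P(c)
P_abc = np.einsum('ac,bc,c->abc', P_a_c, P_b_c, P_c)

# Conditional independence: P(a, b | c) = P(a|c) P(b|c) for every c
P_ab_given_c = P_abc / P_abc.sum(axis=(0, 1))
for c in range(2):
    assert np.allclose(P_ab_given_c[:, :, c],
                       np.outer(P_a_c[:, c], P_b_c[:, c]))

# Marginal dependence: P(a, b) != P(a) P(b) in general
P_ab = P_abc.sum(axis=2)
P_a, P_b = P_ab.sum(axis=1), P_ab.sum(axis=0)
print(np.allclose(P_ab, np.outer(P_a, P_b)))   # False
```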
EX2:
digraph {
rankdir=LR;
a -> c -> b;
}
\[P(a, b, c) = P(a) P(c \mid a) P(b \mid c)\]
Are $a$ and $b$ independent? No:
\[P(a, b) = \sum\limits_{ c } P(a, b, c) = P(a) \sum\limits_{ c } P(c \mid a) P(b \mid c) = P(a) P(b \mid a)\]
Is there independence for a fixed $c$? Yes:
\[P(a, b \mid c) = \frac{P(a, b, c)}{P(c)} = \frac{P(a) P(c\mid a) P(b \mid c)}{P(c)} = P(a \mid c) P(b \mid c)\]
Ex: $a$ = tree, $c$ = leaf, $b$ = green
EX3:
digraph {
rankdir=LR;
a -> c;
b -> c;
}
- $a$ and $b$ are independent
- For a fixed $c$: $a$ and $b$ become dependent (explaining away)
⟹ D-separation theorem
Notion of Markov Blanket
Graphical models: Markov Random Fields
Undirected graphs whose edges represent soft constraints:
\[x_1 - x_2\]
means that knowing $x_1$ imposes a constraint on $x_2$
We have theorems analogous to BN.
\[p(\textbf{x}) = \frac 1 Z \prod\limits_{ \text{maximal clique } C} \underbrace{ ψ_C(x_C)}_{\exp(-E(x_C))}\]
Ex: in computer vision, image denoising. Your MRF is a graph where each node is a pixel of the image to restore, coupled to the corresponding pixel of the observed noisy image.
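A minimal sketch of such a denoiser on a binary (±1) image, using iterated conditional modes (ICM) to greedily lower a pairwise energy. The image, noise level, and coupling weights are all made up for illustration.

```python
import numpy as np

# Tiny binary image: a -1 square on a +1 background (made-up data).
rng = np.random.default_rng(0)
clean = np.ones((8, 8))
clean[2:6, 2:6] = -1
# Observation: flip roughly 10% of the pixels.
y = clean * np.where(rng.random(clean.shape) < 0.1, -1, 1)

# Pairwise MRF energy: -beta * sum of x_i x_j over neighbours
#                      -eta  * sum of x_i y_i  (data term); assumed weights.
beta, eta = 1.0, 2.0
x = y.copy()
for _ in range(5):                       # ICM sweeps
    for i in range(8):
        for j in range(8):
            nb = sum(x[a, b] for a, b in [(i-1, j), (i+1, j), (i, j-1), (i, j+1)]
                     if 0 <= a < 8 and 0 <= b < 8)
            # Setting x[i,j] to the sign of the local field minimizes
            # the energy terms that involve this pixel.
            x[i, j] = 1 if beta * nb + eta * y[i, j] > 0 else -1
print((x == clean).mean())               # fraction of correctly restored pixels
```

ICM is the simplest (greedy) inference scheme for this model; message passing or sampling would do better on harder images.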
Link with BN
\[P(\textbf{x}) = P(x_1) P(x_2 \mid x_1) ⋯ P(x_N \mid x_{N-1})\]
becomes
\[P(\textbf{x}) = \frac 1 Z ψ_{1, 2}(x_1, x_2) ⋯ ψ_{N-1, N}(x_{N-1}, x_N)\]
Inference: message passing algorithms
Inference on a chain
Inference = marginalization (since the posterior is a marginal given an observation)
\[P(\textbf{x}) = \frac 1 Z ψ_{1, 2}(x_1, x_2) ⋯ ψ_{N-1, N}(x_{N-1}, x_N)\]
⇓
\[P(x_n) =\sum\limits_{ x_1, ⋯, x_{n-1}, x_{n+1}, ⋯, x_N } p(\textbf{x})\]
⟹ computational nightmare (the naive sum has $K^{N-1}$ terms)
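The contrast can be made concrete: the naive sum enumerates all $K^N$ states, while passing messages along the chain (matrix-vector products) costs only $O(NK^2)$. A sketch with random potentials and made-up sizes:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
K, N = 3, 6
psi = [rng.random((K, K)) for _ in range(N - 1)]   # pairwise potentials psi_{i,i+1}

# Naive marginal of x_1: sum over all K^N joint states.
p1_naive = np.zeros(K)
for x in product(range(K), repeat=N):
    w = np.prod([psi[i][x[i], x[i + 1]] for i in range(N - 1)])
    p1_naive[x[0]] += w
p1_naive /= p1_naive.sum()

# Message passing: mu_i(x_i) = sum_{x_{i+1}} psi_i(x_i, x_{i+1}) mu_{i+1}(x_{i+1}),
# computed right to left as matrix-vector products.
mu = np.ones(K)
for i in reversed(range(N - 1)):
    mu = psi[i] @ mu
p1_mp = mu / mu.sum()

print(np.allclose(p1_naive, p1_mp))   # True
```

Both give the same marginal; only the cost differs, which is the whole point of message passing.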
Exercise (cf. Exercise Sheet)
digraph {
rankdir=TB;
m -> r;
a -> r;
r -> i;
}
1. Factorize the BN
\[P(m, a, r, i) = P(m) P(a) P(r \mid m, a) P(i \mid r)\]
2. If none of the variables is observed, show that a mosquito bite is independent of an alien abduction. What happens if we observe an itching sensation?
- No variable observed: head-to-head link ⟹ the path $m ⟶ r ⟵ a$ is blocked ⇒ independence
- Itching sensation observed: $i$ is a descendant of the collider $r$, so $m$ and $a$ are no longer independent (explaining away)
3. Consider a particular instance of such a graph. A mosquito bite and an alien abduction might have happened or not ($\lbrace 1, 0 \rbrace$), independently of each other, and with prior probabilities:
\[p(MB = 1) = 0.7\\ p(AA = 1) = 0.1\]
Given the states of MB and AA, a red spot appears with probabilities given by
\[p(RS = 1|MB = 1, AA = 1) = 0.8\\ p(RS = 1|MB = 1, AA = 0) = 0.7\\ p(RS = 1|MB = 0, AA = 1) = 0.4\\ p(RS = 1|MB = 0, AA = 0) = 0.1\]
a. What is the probability that an alien abduction really happened, if we observe a red spot?
\[P(AA = 1 \mid RS = 1) = \frac{P(RS = 1 \mid AA = 1) P(AA = 1)}{\sum\limits_{ 0 ≤ i, j ≤ 1} P(RS = 1|MB = i, AA = j) \underbrace{P(MB = i, AA = j)}_{= P(MB = i) P (AA = j)}} > P(AA=1)\]
It is larger than the prior because now we have some evidence.
\[P(AA = 1 \mid RS = 1) > P(AA = 1 \mid RS = 1, MB=1) > P(AA=1)\]
Factor graph:
graph {
f_a[shape=box];
f_m[shape=box];
f_am[shape=box];
f_r[shape=box];
a -- f_a;
m -- f_m;
m -- f_am;
a -- f_am;
f_am -- r;
r -- f_r;
f_r -- i;
}
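The numbers from question 3 can be checked directly (plain Python, no libraries):

```python
# Priors and conditional table from the exercise.
p_mb, p_aa = 0.7, 0.1
p_rs = {(1, 1): 0.8, (1, 0): 0.7, (0, 1): 0.4, (0, 0): 0.1}  # P(RS=1 | MB, AA)

def pm(m): return p_mb if m else 1 - p_mb
def pa(a): return p_aa if a else 1 - p_aa

# Evidence: P(RS = 1), summing over the independent priors on MB and AA.
p_rs1 = sum(p_rs[(m, a)] * pm(m) * pa(a) for m in (0, 1) for a in (0, 1))

# Bayes: P(AA = 1 | RS = 1)
p_rs1_aa1 = sum(p_rs[(m, 1)] * pm(m) for m in (0, 1))
post_aa = p_rs1_aa1 * p_aa / p_rs1

# Explaining away: also observing MB = 1 lowers the belief in AA.
post_aa_mb = p_rs[(1, 1)] * p_aa / (p_rs[(1, 1)] * p_aa + p_rs[(1, 0)] * (1 - p_aa))

print(round(post_aa, 4), round(post_aa_mb, 4))   # 0.1269 0.1127
assert post_aa > post_aa_mb > p_aa               # the inequality chain from above
```

So the red spot raises the probability of an abduction from 0.1 to about 0.127, and learning that a mosquito bite also happened pulls it back down toward (but not below) the prior.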