Lecture 1: Bayesian perception
Lecturer: Pantelis Leptourgos
Machine Learning Introduction
Beginning goes back to Alan Turing ⟶ Turing Test (1950)
Can machines do what we can do, as thinking entities?
Supervised learning:
- Task: recognition
- Experience: comparing the predicted label and the true label
- Measure of efficiency: how often does it fail?
Variables:
- Input variables: $\mathbf{x} ∈ ℝ^N$
- Hidden variables: $\mathbf{h} ∈ ℝ^K$
- Output variables: $\mathbf{y} ∈ ℝ^K$
Learning paradigms:
- Supervised learning: $\mathbf{y}$ is given
  - Classification: discrete values
  - Regression: continuous ones
- Unsupervised learning: $\mathbf{y}$ is not known
  - Find patterns in data (ex: clustering, density estimation, dimensionality reduction, etc…)
- Reinforcement learning: $\mathbf{y}$ is actions ⟶ the system learns from rewards/punishments by acting
When do we not need ML?
⟶ when the relationship between $\mathbb{x}$ and $\mathbb{y}$ is already known/can be analytically solved
Steps of supervised ML:
- Pre-processing on the initial data (feature extraction, etc…)
- Learn the model parameters on the training set
- Test the model on the test set
- Make predictions with your model
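The four steps above can be sketched end-to-end on a toy problem. This is a minimal illustration, assuming NumPy and a hypothetical synthetic linear-regression dataset (all names and numbers are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y = 3x + 1 + Gaussian noise
x = rng.uniform(-1, 1, size=200)
y = 3 * x + 1 + rng.normal(0, 0.1, size=200)

# Step 1: pre-processing / feature extraction -> design matrix [1, x]
X = np.column_stack([np.ones_like(x), x])

# Step 2: learn the model parameters on the training set (least squares)
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Step 3: test the model on the held-out test set
test_mse = np.mean((X_test @ w - y_test) ** 2)

# Step 4: make predictions with the model
y_new = np.array([1.0, 0.5]) @ w  # prediction at x = 0.5
```

The train/test split is what lets us measure generalization rather than memorization.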
Challenges in ML and Cognition
Three very well-known problems:
- Playing Chess/Games
- Moving an arm
- Computer vision
⟹ Curse of dimensionality ⟶ to learn, we need a lot of data (often intractable)
⟹ Computer Vision: Going from 2D-images to 3D-representations (need to use some priors)
Ex: Polynomial fitting
- Minimize the Sum-of-Squares Error Function
- Beware of under/over-fitting ⟶ over-fitting: you end up fitting the noise (no generalization anymore)
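A minimal sketch of the over-fitting phenomenon, assuming NumPy and a hypothetical toy dataset (10 noisy samples of a sine curve): a degree-9 polynomial drives the training error to essentially zero by passing through every point, i.e. by fitting the noise, which is exactly the failure mode described above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: 10 noisy samples of a sine curve
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=10)

def train_mse(degree):
    """Fit a polynomial of the given degree and return its training error."""
    coeffs = np.polyfit(x_train, y_train, degree)
    residuals = np.polyval(coeffs, x_train) - y_train
    return np.mean(residuals ** 2)

# A degree-9 polynomial interpolates all 10 points: near-zero training
# error, but it has fit the noise and will generalize poorly.
print(train_mse(1), train_mse(9))
```

The low training error of the high-degree fit is misleading: only error on held-out data reveals the lack of generalization.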
How to get rid of over-fitting?
- Increase the training set
- Remove some outliers
- Increase the amount of data relative to the number of features
- Regularization ⟶ penalize large coefficients values
- Ridge/Lasso regression
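Ridge regression makes the "penalize large coefficients" idea concrete: minimize $\|Xw - y\|^2 + \lambda\|w\|^2$, which has a closed-form solution. A sketch assuming NumPy, with hypothetical polynomial features and an illustrative choice of $\lambda$:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: minimizes ||Xw - y||^2 + lam * ||w||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=10)

# Degree-7 polynomial features: prone to over-fitting with only 10 points
X = np.vander(x, 8, increasing=True)

w_unreg = ridge_fit(X, y, lam=0.0)    # ordinary least squares
w_ridge = ridge_fit(X, y, lam=1e-3)   # penalize large coefficients

# The penalty shrinks the coefficient vector toward zero
print(np.linalg.norm(w_unreg), np.linalg.norm(w_ridge))
```

Lasso replaces the squared penalty $\|w\|^2$ with $\|w\|_1$, which additionally drives some coefficients exactly to zero but has no closed form.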
How to choose the order of the polynomial?
- Cross-validation: a validation step between the training and test data
- $BIC$ score: the lower, the better
\[BIC = \ln(n)\underbrace{k}_{\text{number of parameters}} - 2 \ln(\underbrace{\hat{L}}_{\text{maximized likelihood}})\]
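The BIC formula can be applied directly to polynomial order selection. A sketch assuming Gaussian residuals (so the maximized log-likelihood reduces to a function of the residual sum of squares, up to constants shared by all candidate models); the data are synthetic and illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative data: 50 noisy samples of a sine curve
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=50)
n = len(x)

def bic(degree):
    """BIC = ln(n) * k - 2 ln(L^hat), assuming Gaussian residuals."""
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    k = degree + 1  # number of fitted coefficients
    # Maximized Gaussian log-likelihood, up to constants shared by all models
    log_lik = -0.5 * n * np.log(rss / n)
    return np.log(n) * k - 2 * log_lik

scores = {d: bic(d) for d in range(1, 10)}
best = min(scores, key=scores.get)  # the lower, the better
```

The $\ln(n)\,k$ term penalizes model complexity, so adding a degree only pays off if it improves the likelihood enough.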
Bayesian perception
Bayes Theorem: Indicates how to update our belief
- Sensation: is the detection of external stimulation
- Perception: is how we interpret/integrate these sensations
Why do we need perception? Sensation is not enough, there’s ambiguity everywhere:
- Uncertainty: Noise, Ambiguity
- Cue combination: combination of sensations
- Accumulation of evidence
- Latent variables: things are not directly observable
Core concepts of Bayesian perception:
- Priors: expectations, implicit knowledge
- Sensory data
- the world (what you want to know something about)
- Prediction ⟶ Decision
Goal: go from the sensory data to make predictions about the cause (inversion of the generative model)
3 steps:
- Define the generative model
- Bayesian inference
- Compute the observer's estimate distribution
digraph {
rankdir=TB;
X1[label="X"]
X1 -> S[label=" P(S | X)"];
"P(X)"[shape=none];
S -> "X"[label=" P(X | S)"];
}
Likelihood = noise distribution (why is it called a “likelihood” rather than a probability? As a function of $X$, it does not sum to $1$).
Bayesian inference to invert the generative model
Difference between Prior Distribution and the Real Distribution?
⟶ The prior is assumed, based on our beliefs
The Likelihood function is centered at the measurement.
Variance of the likelihood function = reliability of Sensory Evidence (the smaller, the more we trust our measurements).
- The posterior probability of the value of $X$ at the peak is called the confidence.
- Uncertainty of the posterior = the variance of the distribution
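For the common Gaussian case, the posterior has a closed form: precisions (inverse variances) add, and the posterior mean is a precision-weighted average of the prior mean and the measurement. A minimal sketch (the function name and numbers are illustrative):

```python
def gaussian_posterior(mu_prior, var_prior, s, var_like):
    """Conjugate update: Gaussian prior over X, Gaussian likelihood
    centered at the measurement s. Precisions (1/variances) add."""
    precision = 1 / var_prior + 1 / var_like
    var_post = 1 / precision
    mu_post = var_post * (mu_prior / var_prior + s / var_like)
    return mu_post, var_post

# Reliable evidence (small likelihood variance) pulls the posterior
# toward the measurement; unreliable evidence leaves it near the prior.
mu1, v1 = gaussian_posterior(0.0, 1.0, 2.0, 0.1)   # trust the senses
mu2, v2 = gaussian_posterior(0.0, 1.0, 2.0, 10.0)  # trust the prior
```

Note that the posterior variance is always smaller than the prior variance: combining the two sources of information reduces uncertainty.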
Decision Criteria
- Maximum Likelihood:
- \[\hat{X}_{ML} = \operatorname{argmax}_X (\underbrace{L(X)}_{≝ P(S \mid X)})\]
- Maximum a Posteriori:
- \[\hat{X}_{MAP} = \operatorname{argmax}_X (P(X \mid S))\]
Other: Softmax, etc…
- Distribution of MAP estimate:
- \[P(\hat{X}_{MAP} \mid X_{true})\]
NB: when the prior and the likelihood are Gaussian, this distribution is itself Gaussian (the MAP estimate is a linear function of the measurement $S$).
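The Gaussian form of the MAP-estimate distribution can be checked by simulation: with a Gaussian prior and likelihood, the MAP estimate is linear in the measurement $S$, so across repeated measurements of the same $X_{true}$ it is Gaussian, with a mean biased toward the prior mean. A sketch (the model parameters are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed Gaussian generative model: X ~ N(0, 1), S | X ~ N(X, 0.5)
mu_p, var_p, var_s = 0.0, 1.0, 0.5
x_true = 1.0

# Many repeated measurements of the same true stimulus
s = x_true + rng.normal(0, np.sqrt(var_s), size=100_000)

# MAP estimate for each measurement: a precision-weighted average of the
# prior mean and the measurement -- linear in s, hence Gaussian-distributed
w = (1 / var_s) / (1 / var_p + 1 / var_s)
x_map = w * s + (1 - w) * mu_p

# The distribution of x_map is Gaussian, biased toward the prior mean
print(x_map.mean(), x_map.std())
```

Here the mean of the MAP estimates is $w \cdot X_{true} \approx 0.67$, not $1$: the prior systematically pulls the observer's estimates toward its mean.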