Lecture 1: Bayesian perception
Lecturer: Pantelis Leptourgos
Machine Learning Introduction
Beginning goes back to Alan Turing ⟶ Turing Test (1950)
Can machines do what we can do, as thinking entities?
Supervised learning:
 Task: recognition
 Experience: comparing the predicted label and the true label
 Measure of efficiency: how often does it fail?
Variables:
 Input variables: $\mathbb{x} ∈ ℝ^N$
 Hidden variables: $\mathbb{h} ∈ ℝ^K$
 Output variables: $\mathbb{y} ∈ ℝ^K$
In

Supervised learning: $\mathbb{y}$ is given
 Classification: discrete values
 Regression: continuous ones

Unsupervised learning: $\mathbb{y}$ not known
 Find patterns in data (ex: clustering, density estimation, dimensionality reduction, etc…)

Reinforcement learning: $\mathbb{y}$ is a set of actions ⟶ the system learns by acting and receiving rewards/punishments
When do we not need ML?
⟶ when the relationship between $\mathbb{x}$ and $\mathbb{y}$ is already known/can be analytically solved
Steps of Supervised ML:

Preprocessing on Initial Data (feature extraction, etc…)

Learn model parameters on Training Set

Test the model on the Test Set

Make predictions with your model
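The four steps above can be sketched in a few lines of plain Python. This is only an illustrative toy, not the lecturer's example: the data (one feature, two Gaussian classes), the nearest-centroid model and every name in it (`preprocess`, `centroids`, `predict`) are assumptions made for the sketch.

```python
import random

random.seed(0)

# Hypothetical toy data: one feature x, binary label y (class means 0 and 3).
data = [(random.gauss(0.0, 1.0), 0) for _ in range(200)]
data += [(random.gauss(3.0, 1.0), 1) for _ in range(200)]
random.shuffle(data)
train, test = data[:300], data[300:]

# 1. Preprocessing: standardise the feature using training-set statistics only.
train_xs = [x for x, _ in train]
mean = sum(train_xs) / len(train_xs)
std = (sum((x - mean) ** 2 for x in train_xs) / len(train_xs)) ** 0.5

def preprocess(x):
    return (x - mean) / std

# 2. Learn the model parameters (one centroid per class) on the training set.
centroids = {}
for label in (0, 1):
    values = [preprocess(x) for x, y in train if y == label]
    centroids[label] = sum(values) / len(values)

def predict(x):
    """Assign x to the class whose centroid is nearest."""
    z = preprocess(x)
    return min(centroids, key=lambda label: abs(z - centroids[label]))

# 3. Test the model on the held-out test set: how often does it fail?
error_rate = sum(predict(x) != y for x, y in test) / len(test)

# 4. Make a prediction for a new, unlabelled input.
new_label = predict(2.5)
print(f"test error rate: {error_rate:.2f}, predicted label for 2.5: {new_label}")
```

The error rate on the test set plays the role of the "measure of efficiency": it is computed on data the model never saw during training.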
Challenges in ML and Cognition
Three very well-known problems:
 Playing Chess/Games
 Moving an arm
 Computer vision
⟹ Curse of dimensionality ⟶ to learn, we need a lot of data (often intractable)
⟹ Computer Vision: Going from 2D images to 3D representations (need to use some priors)
Ex: Polynomial fitting

Minimize the Sum-of-Squares Error Function

Beware of under/overfitting ⟶ overfitting: you end up fitting the noise (no generalization anymore)
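As a rough illustration of under/overfitting, the sketch below fits polynomials of order 1, 3 and 9 to ten noisy samples by minimizing the sum-of-squares error through the normal equations. The target function (a sine), the noise level, the chosen orders and all function names are assumptions for this example, not taken from the lecture.

```python
import math
import random

def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_polynomial(xs, ys, order):
    """Weights (lowest power first) minimising the sum-of-squares error."""
    m = order + 1
    powers = [[x ** p for p in range(m)] for x in xs]
    AtA = [[sum(row[i] * row[j] for row in powers) for j in range(m)] for i in range(m)]
    Aty = [sum(row[i] * y for row, y in zip(powers, ys)) for i in range(m)]
    return solve(AtA, Aty)

def predict(w, x):
    return sum(w_p * x ** p for p, w_p in enumerate(w))

def sum_of_squares_error(w, xs, ys):
    return 0.5 * sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys))

random.seed(1)

def noisy_target(x):
    # Assumed ground truth: a sine wave plus Gaussian noise.
    return math.sin(math.pi * x) + random.gauss(0.0, 0.2)

train_x = [-1 + 2 * i / 9 for i in range(10)]
train_y = [noisy_target(x) for x in train_x]
test_x = [-0.95 + 0.1 * i for i in range(20)]
test_y = [noisy_target(x) for x in test_x]

train_error, test_error = {}, {}
for order in (1, 3, 9):
    w = fit_polynomial(train_x, train_y, order)
    train_error[order] = sum_of_squares_error(w, train_x, train_y)
    test_error[order] = sum_of_squares_error(w, test_x, test_y)
    print(order, round(train_error[order], 4), round(test_error[order], 4))
```

With ten training points, the order-9 polynomial can pass through every point, so its training error drops to roughly zero. That is the "fitting the noise" regime: a low training error alone says nothing about generalization, which is why the held-out error is printed alongside it.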
How to get rid of overfitting?
 Increase the training set
 Remove some outliers
 Increase the data compared to the number of features
 Regularization ⟶ penalize large coefficient values
 Ridge/Lasso regression
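For reference, the two penalized error functions are usually written as below. The notation is an assumption of this note, not the lecture's: $\hat{y}(x_i, \mathbb{w})$ is the polynomial prediction with coefficients $\mathbb{w}$, $y_i$ the observed target, $n$ the number of data points and $\lambda \geq 0$ the regularization strength.

\[\tilde{E}_{\text{ridge}}(\mathbb{w}) = \frac{1}{2}\sum_{i=1}^{n} \big(\hat{y}(x_i, \mathbb{w}) - y_i\big)^2 + \frac{\lambda}{2} \sum_{j} w_j^2\]

\[\tilde{E}_{\text{lasso}}(\mathbb{w}) = \frac{1}{2}\sum_{i=1}^{n} \big(\hat{y}(x_i, \mathbb{w}) - y_i\big)^2 + \lambda \sum_{j} |w_j|\]

Setting $\lambda = 0$ recovers the plain sum-of-squares error. The lasso penalty tends to drive some coefficients exactly to zero, whereas ridge only shrinks them.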
How to choose the order of the polynomial?

Cross-validation: hold out a validation set, as a step between training and testing, and pick the order with the lowest validation error

$BIC$ score: the lower, the better
\[BIC = \ln(n)\underbrace{k}_{\text{number of features}} - 2 \ln(\underbrace{\hat{L}}_{\text{likelihood}})\]
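A small sketch of how the score could be computed, assuming i.i.d. Gaussian noise so that the maximized log-likelihood follows from the residuals. The residual values and the parameter counts below are made up for illustration, and conventions differ on whether $k$ also counts the noise variance.

```python
import math

def gaussian_max_log_likelihood(residuals):
    """Maximised log-likelihood under i.i.d. Gaussian noise (variance fitted by ML)."""
    n = len(residuals)
    sigma2 = sum(r ** 2 for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

def bic(n, k, log_likelihood):
    """BIC = ln(n) * k - 2 * ln(L_hat); the lower, the better."""
    return math.log(n) * k - 2 * log_likelihood

# Hypothetical residuals of two models fitted to the same 6 data points.
residuals_simple = [0.3, -0.2, 0.25, -0.3, 0.2, -0.25]    # model with k = 2
residuals_complex = [0.28, -0.2, 0.24, -0.3, 0.2, -0.24]  # model with k = 5

bic_simple = bic(6, 2, gaussian_max_log_likelihood(residuals_simple))
bic_complex = bic(6, 5, gaussian_max_log_likelihood(residuals_complex))
print(bic_simple, bic_complex)
```

Here the more complex model fits only marginally better, so the $\ln(n)\,k$ penalty outweighs its gain in likelihood and the simpler model gets the lower score.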
Bayesian perception
Bayes Theorem: Indicates how to update our belief
 Sensation: is the detection of external stimulation
 Perception: is how we interpret/integrate these sensations
Why do we need perception? Sensation is not enough, there’s ambiguity everywhere:
 Uncertainty: Noise, Ambiguity
 Cue combination: combination of sensations
 Accumulation of evidence
 Latent variables: things are not directly observable
Core concepts of Bayesian perception:
 Priors: expectations, implicit knowledge
 Sensory data
 the world (what you want to know something about)
 Prediction ⟶ Decision
Goal: go from the sensory data to make predictions about the cause (inversion of the generative model)
3 steps:
 Define generative model
 Bayesian inference
 Compute the observer’s estimate distribution
digraph {
rankdir=TB;
X1[label="X"]
X1 -> S[label=" P(S | X)"];
"P(X)"[shape=none];
S -> "X"[label=" P(X | S)"];
}
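The forward (generative) direction of this graph can be simulated directly: draw a world state $X$ from the prior $P(X)$, then a sensation $S$ from $P(S \mid X)$. The sketch assumes both distributions are Gaussian, and the numerical values are arbitrary choices for illustration.

```python
import random

random.seed(0)

# Assumed values, for illustration only.
PRIOR_MEAN, PRIOR_SD = 0.0, 2.0  # P(X): prior over the world state
NOISE_SD = 1.0                   # P(S | X): sensory noise around the true X

def sample_world():
    """Draw a world state X from the prior P(X)."""
    return random.gauss(PRIOR_MEAN, PRIOR_SD)

def sample_sensation(x):
    """Draw a noisy measurement S from P(S | X = x)."""
    return random.gauss(x, NOISE_SD)

pairs = []
for _ in range(10000):
    x = sample_world()
    pairs.append((x, sample_sensation(x)))

s_values = [s for _, s in pairs]
s_mean = sum(s_values) / len(s_values)
s_var = sum((s - s_mean) ** 2 for s in s_values) / len(s_values)
print(s_mean, s_var)
```

Under these assumptions the sensations are spread more widely than the world states, since their variance is the prior variance plus the noise variance. Perception has to run this process backwards, from $S$ to $X$.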
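Likelihood = Noise distribution $P(S \mid X)$. Why is it called “likelihood” rather than a probability? For a fixed measurement $S$ it is read as a function of $X$, and as such it does not have to sum (or integrate) to $1$.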
Bayesian inference to invert the generative model
Difference between Prior Distribution and the Real Distribution?
⟶ The prior is assumed, based on our beliefs
The Likelihood function is centered at the measurement.
Variance of the likelihood function = reliability of Sensory Evidence (the smaller, the more we trust our measurements).
 The probability of the value of $X$ at the peak is called the confidence.
 Uncertainty of the posterior = the variance of the distribution
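When both the prior and the likelihood are assumed Gaussian, Bayes' rule $P(X \mid S) \propto P(S \mid X)\,P(X)$ has a closed form: the posterior mean is a reliability-weighted average of the measurement and the prior mean. The function and parameter names below are hypothetical, and the numbers are arbitrary.

```python
import math

def gaussian_posterior(s, noise_var, prior_mean, prior_var):
    """Posterior mean and variance for a Gaussian prior and Gaussian likelihood."""
    w = prior_var / (prior_var + noise_var)  # weight given to the measurement
    post_mean = w * s + (1 - w) * prior_mean
    post_var = prior_var * noise_var / (prior_var + noise_var)
    return post_mean, post_var

def peak_density(var):
    """Height of a Gaussian density at its peak."""
    return 1.0 / math.sqrt(2 * math.pi * var)

# Same measurement s = 2, once with reliable and once with unreliable senses.
reliable = gaussian_posterior(s=2.0, noise_var=0.1, prior_mean=0.0, prior_var=1.0)
unreliable = gaussian_posterior(s=2.0, noise_var=10.0, prior_mean=0.0, prior_var=1.0)
print(reliable, peak_density(reliable[1]))
print(unreliable, peak_density(unreliable[1]))
```

A small likelihood variance pulls the posterior towards the measurement and makes it narrow and tall; a large one leaves it close to the prior. The posterior variance is never larger than either the prior variance or the noise variance.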
Decision Criteria
 Maximum Likelihood:
 \[\hat{X}_{ML} = argmax_X (\underbrace{L(X)}_{≝ P(S \mid X)})\]
 Maximum a Posteriori:
 \[\hat{X}_{MAP} = argmax_X (P(X \mid S))\]
Other: Softmax, etc…
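The two criteria can be compared numerically by evaluating the likelihood and the (unnormalized) posterior on a grid of candidate values of $X$ and taking the argmax of each. The Gaussian shapes, the grid and all parameter values are assumptions of this sketch.

```python
import math

def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Assumed values, for illustration only.
s, noise_sd = 2.0, 1.0           # measurement and sensory noise
prior_mean, prior_sd = 0.0, 1.0  # prior P(X)

grid = [-5 + 0.01 * i for i in range(1001)]  # candidate X from -5 to 5

# L(X) = P(S | X), evaluated at the fixed measurement s.
likelihood = [gaussian_pdf(s, x, noise_sd) for x in grid]

# P(X | S) is proportional to P(S | X) P(X); the normaliser P(S) does not
# depend on X, so it can be dropped without moving the argmax.
unnormalised_posterior = [
    lik * gaussian_pdf(x, prior_mean, prior_sd) for lik, x in zip(likelihood, grid)
]

x_ml = grid[max(range(len(grid)), key=likelihood.__getitem__)]
x_map = grid[max(range(len(grid)), key=unnormalised_posterior.__getitem__)]
print(x_ml, x_map)
```

The maximum-likelihood estimate sits at the measurement itself, while the MAP estimate is pulled towards the prior mean. With a flat prior the two coincide.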
 Distribution of MAP estimate:
 \[P(\hat{X}_{MAP} \mid X_{true})\]
NB: this distribution is Gaussian when the prior and the likelihood are both Gaussian
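One way to look at $P(\hat{X}_{MAP} \mid X_{true})$ is by simulation: hold $X_{true}$ fixed, draw many noisy measurements, and compute the MAP estimate for each. The sketch assumes a Gaussian prior and Gaussian noise, in which case the MAP estimate is a linear function of the measurement; the numerical values are arbitrary.

```python
import random
import statistics

random.seed(0)

# Assumed values, for illustration only.
x_true = 3.0
noise_var = 1.0
prior_mean, prior_var = 0.0, 4.0
w = prior_var / (prior_var + noise_var)  # weight given to the measurement

def map_estimate(s):
    """MAP estimate for a Gaussian prior and likelihood (equals the posterior mean)."""
    return w * s + (1 - w) * prior_mean

# Repeat the same trial many times: same X_true, fresh sensory noise each time.
estimates = [
    map_estimate(random.gauss(x_true, noise_var ** 0.5)) for _ in range(20000)
]

empirical_mean = statistics.mean(estimates)
empirical_var = statistics.pvariance(estimates)

# Because the estimate is linear in S and S is Gaussian around x_true:
predicted_mean = w * x_true + (1 - w) * prior_mean
predicted_var = w ** 2 * noise_var
print(empirical_mean, predicted_mean, empirical_var, predicted_var)
```

Under these assumptions the estimates are centred between the prior mean and $X_{true}$ rather than on $X_{true}$ itself: the prior biases the observer, but it also makes the estimates less variable than the raw measurements.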