Lecture 1: Bayesian perception
Lecturer: Pantelis Leptourgos
Machine Learning Introduction
Beginning goes back to Alan Turing ⟶ Turing Test (1950)
Can machines do what we can do, as thinking entities?
Supervised learning:
- Task: recognition
- Experience: comparing the predicted label and the true label
- Measure of efficiency: how often does it fail?
Variables:
- Input variables: $\mathbf{x} ∈ ℝ^N$
- Hidden variables: $\mathbf{h} ∈ ℝ^K$
- Output variables: $\mathbf{y} ∈ ℝ^K$
Learning paradigms:
- Supervised learning: $\mathbf{y}$ is given
  - Classification: discrete values
  - Regression: continuous ones
- Unsupervised learning: $\mathbf{y}$ is not known
  - Find patterns in data (ex: clustering, density estimation, dimensionality reduction, etc…)
- Reinforcement learning: $\mathbf{y}$ is actions ⟶ the system learns from rewards/punishments by acting
When do we not need ML?
⟶ when the relationship between $\mathbb{x}$ and $\mathbb{y}$ is already known/can be analytically solved
Steps of supervised ML:
- Pre-processing on the initial data (feature extraction, etc…)
- Learn the model parameters on the training set
- Test the model on the test set
- Make predictions with your model
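The four steps above can be sketched end-to-end on a toy problem. This is a minimal illustration, assuming NumPy and a hypothetical synthetic linear-regression dataset (all names and numbers are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y = 3x + 1 + Gaussian noise
x = rng.uniform(-1, 1, size=200)
y = 3 * x + 1 + rng.normal(0, 0.1, size=200)

# Step 1: pre-processing / feature extraction -> design matrix [1, x]
X = np.column_stack([np.ones_like(x), x])

# Step 2: learn the model parameters on the training set (least squares)
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Step 3: test the model on the held-out test set
test_mse = np.mean((X_test @ w - y_test) ** 2)

# Step 4: make predictions with the model
y_new = np.array([1.0, 0.5]) @ w  # prediction at x = 0.5
```

The train/test split is what lets us measure generalization rather than memorization.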
Challenges in ML and Cognition
Three very well-known problems:
- Playing Chess/Games
- Moving an arm
- Computer vision
⟹ Curse of dimensionality ⟶ to learn, we need a lot of data (often intractable)
⟹ Computer Vision: Going from 2D-images to 3D-representations (need to use some priors)
Ex: Polynomial fitting
- Minimize the Sum-of-Squares Error Function
- Beware of under/over-fitting ⟶ over-fitting: you end up fitting the noise (no generalization anymore)
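A minimal sketch of the over-fitting phenomenon, assuming NumPy and a hypothetical toy dataset (10 noisy samples of a sine curve): a degree-9 polynomial drives the training error to essentially zero by passing through every point, i.e. by fitting the noise, which is exactly the failure mode described above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: 10 noisy samples of a sine curve
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=10)

def train_mse(degree):
    """Fit a polynomial of the given degree and return its training error."""
    coeffs = np.polyfit(x_train, y_train, degree)
    residuals = np.polyval(coeffs, x_train) - y_train
    return np.mean(residuals ** 2)

# A degree-9 polynomial interpolates all 10 points: near-zero training
# error, but it has fit the noise and will generalize poorly.
print(train_mse(1), train_mse(9))
```

The low training error of the high-degree fit is misleading: only error on held-out data reveals the lack of generalization.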
How to get rid of over-fitting?
- Increase the training set
- Remove some outliers
- Increase the amount of data relative to the number of features
- Regularization ⟶ penalize large coefficients values
- Ridge/Lasso regression
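Ridge regression makes the "penalize large coefficients" idea concrete: minimize $\|Xw - y\|^2 + \lambda\|w\|^2$, which has a closed-form solution. A sketch assuming NumPy, with hypothetical polynomial features and an illustrative choice of $\lambda$:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: minimizes ||Xw - y||^2 + lam * ||w||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=10)

# Degree-7 polynomial features: prone to over-fitting with only 10 points
X = np.vander(x, 8, increasing=True)

w_unreg = ridge_fit(X, y, lam=0.0)    # ordinary least squares
w_ridge = ridge_fit(X, y, lam=1e-3)   # penalize large coefficients

# The penalty shrinks the coefficient vector toward zero
print(np.linalg.norm(w_unreg), np.linalg.norm(w_ridge))
```

Lasso replaces the squared penalty $\|w\|^2$ with $\|w\|_1$, which additionally drives some coefficients exactly to zero but has no closed form.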
How to choose the order of the polynomial?
- Cross-validation: a validation step between the training and test data
- $BIC$ score: the lower, the better
\[BIC = \ln(n)\underbrace{k}_{\text{number of parameters}} - 2 \ln(\underbrace{\hat{L}}_{\text{maximized likelihood}})\]
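The BIC formula can be applied directly to polynomial order selection. A sketch assuming Gaussian residuals (so the maximized log-likelihood reduces to a function of the residual sum of squares, up to constants shared by all candidate models); the data are synthetic and illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative data: 50 noisy samples of a sine curve
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=50)
n = len(x)

def bic(degree):
    """BIC = ln(n) * k - 2 ln(L^hat), assuming Gaussian residuals."""
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    k = degree + 1  # number of fitted coefficients
    # Maximized Gaussian log-likelihood, up to constants shared by all models
    log_lik = -0.5 * n * np.log(rss / n)
    return np.log(n) * k - 2 * log_lik

scores = {d: bic(d) for d in range(1, 10)}
best = min(scores, key=scores.get)  # the lower, the better
```

The $\ln(n)\,k$ term penalizes model complexity, so adding a degree only pays off if it improves the likelihood enough.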
Bayesian perception
Bayes Theorem: Indicates how to update our belief
- Sensation: is the detection of external stimulation
- Perception: is how we interpret/integrate these sensations
Why do we need perception? Sensation is not enough, there’s ambiguity everywhere:
- Uncertainty: Noise, Ambiguity
- Cue combination: combination of sensations
- Accumulation of evidence
- Latent variables: things are not directly observable
Core concepts of Bayesian perception:
- Priors: expectations, implicit knowledge
- Sensory data
- the world (what you want to know something about)
- Prediction ⟶ Decision
Goal: go from the sensory data to make predictions about the cause (inversion of the generative model)
3 steps:
- Define the generative model
- Bayesian inference
- Compute the observer's estimate distribution
digraph {
rankdir=TB;
X1[label="X"]
X1 -> S[label=" P(S | X)"];
"P(X)"[shape=none];
S -> "X"[label=" P(X | S)"];
}
Likelihood = noise distribution (why is it called a “likelihood” rather than a probability? As a function of $X$, it does not sum to $1$).
Bayesian inference to invert the generative model
Difference between Prior Distribution and the Real Distribution?
⟶ The prior is assumed, based on our beliefs
The Likelihood function is centered at the measurement.
Variance of the likelihood function = reliability of Sensory Evidence (the smaller, the more we trust our measurements).
- The posterior probability of the value of $X$ at the peak is called the confidence.
- Uncertainty of the posterior = the variance of the distribution
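For the common Gaussian case, the posterior has a closed form: precisions (inverse variances) add, and the posterior mean is a precision-weighted average of the prior mean and the measurement. A minimal sketch (the function name and numbers are illustrative):

```python
def gaussian_posterior(mu_prior, var_prior, s, var_like):
    """Conjugate update: Gaussian prior over X, Gaussian likelihood
    centered at the measurement s. Precisions (1/variances) add."""
    precision = 1 / var_prior + 1 / var_like
    var_post = 1 / precision
    mu_post = var_post * (mu_prior / var_prior + s / var_like)
    return mu_post, var_post

# Reliable evidence (small likelihood variance) pulls the posterior
# toward the measurement; unreliable evidence leaves it near the prior.
mu1, v1 = gaussian_posterior(0.0, 1.0, 2.0, 0.1)   # trust the senses
mu2, v2 = gaussian_posterior(0.0, 1.0, 2.0, 10.0)  # trust the prior
```

Note that the posterior variance is always smaller than the prior variance: combining the two sources of information reduces uncertainty.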
Decision Criteria
- Maximum Likelihood:
- \[\hat{X}_{ML} = \operatorname{argmax}_X (\underbrace{L(X)}_{≝ P(S \mid X)})\]
- Maximum a Posteriori:
- \[\hat{X}_{MAP} = \operatorname{argmax}_X (P(X \mid S))\]
Other: Softmax, etc…
- Distribution of MAP estimate:
- \[P(\hat{X}_{MAP} \mid X_{true})\]
NB: when the prior and the likelihood are Gaussian, this distribution is itself Gaussian (the MAP estimate is a linear function of the measurement $S$).
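The Gaussian form of the MAP-estimate distribution can be checked by simulation: with a Gaussian prior and likelihood, the MAP estimate is linear in the measurement $S$, so across repeated measurements of the same $X_{true}$ it is Gaussian, with a mean biased toward the prior mean. A sketch (the model parameters are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed Gaussian generative model: X ~ N(0, 1), S | X ~ N(X, 0.5)
mu_p, var_p, var_s = 0.0, 1.0, 0.5
x_true = 1.0

# Many repeated measurements of the same true stimulus
s = x_true + rng.normal(0, np.sqrt(var_s), size=100_000)

# MAP estimate for each measurement: a precision-weighted average of the
# prior mean and the measurement -- linear in s, hence Gaussian-distributed
w = (1 / var_s) / (1 / var_p + 1 / var_s)
x_map = w * s + (1 - w) * mu_p

# The distribution of x_map is Gaussian, biased toward the prior mean
print(x_map.mean(), x_map.std())
```

Here the mean of the MAP estimates is $w \cdot X_{true} \approx 0.67$, not $1$: the prior systematically pulls the observer's estimates toward its mean.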