# Lecture 1: Bayesian perception

Lecturer: Pantelis Leptourgos

# Machine Learning Introduction

The field goes back to Alan Turing ⟶ Turing Test (1950)

Can machines do what we can do, as thinking entities?

Supervised learning:

• Experience: comparing the predicted label with the true label
• Performance measure: how often does it fail?

Variables:

• Input variables: $\mathbb{x} ∈ ℝ^N$
• Hidden variables: $\mathbb{h} ∈ ℝ^K$
• Output variables: $\mathbb{y} ∈ ℝ^K$

Types of learning:

• Supervised learning: $\mathbb{y}$ is given

  • Classification: discrete values
  • Regression: continuous values

• Unsupervised learning: $\mathbb{y}$ is not known

  • Find patterns in the data (e.g. clustering, density estimation, dimensionality reduction)

• Reinforcement learning: $\mathbb{y}$ is actions ⟶ the system learns from rewards/punishments by acting

When do we not need ML?

⟶ when the relationship between $\mathbb{x}$ and $\mathbb{y}$ is already known/can be analytically solved

Steps of supervised ML:

1. Pre-process the initial data (feature extraction, etc.)

2. Learn the model parameters on the training set

3. Test the model on the test set

4. Make predictions with your model
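The four steps can be sketched in Python with NumPy. This is a minimal sketch under assumptions that are not in the lecture: a synthetic 1-D regression dataset, standardization as the pre-processing step, a random 80/20 train/test split, and ordinary least-squares linear regression as the model. The helper `features` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y = 3x + 1 + Gaussian noise
x = rng.uniform(0.0, 10.0, size=100)
y = 3.0 * x + 1.0 + rng.normal(0.0, 1.0, size=100)

# 1. Pre-processing: shuffle, split, standardize the input feature
idx = rng.permutation(len(x))
train, test = idx[:80], idx[80:]
mu, sd = x[train].mean(), x[train].std()  # statistics from the training set only

def features(x_raw):
    """Standardized input plus a constant column for the intercept."""
    z = (x_raw - mu) / sd
    return np.column_stack([np.ones_like(z), z])

# 2. Learn the model parameters on the training set (least squares)
w, *_ = np.linalg.lstsq(features(x[train]), y[train], rcond=None)

# 3. Test the model on the held-out test set
test_mse = np.mean((features(x[test]) @ w - y[test]) ** 2)

# 4. Make predictions for new inputs
y_new = features(np.array([2.0, 5.0])) @ w
print(f"test MSE = {test_mse:.3f}, predictions = {y_new}")
```

The standardization statistics are computed on the training set only, so no information from the test set leaks into the learned model.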

## Challenges in ML and Cognition

Three very well-known problems:

1. Playing Chess/Games
2. Moving an arm
3. Computer vision

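⟹ Curse of dimensionality ⟶ to learn, we need a lot of data: the amount needed grows exponentially with the number of input dimensions (often intractable)

⟹ Computer vision: going from 2D images to 3D representations is ill-posed (many 3D scenes produce the same 2D image), so we need to use some priors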
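To make the data requirement concrete: if each input dimension is cut into a fixed number of bins (an illustrative choice, not from the lecture), the number of cells to populate grows as bins^d, so a fixed budget of samples covers the space ever more sparsely. A minimal sketch:

```python
def samples_per_cell(n_samples: int, bins_per_axis: int, dim: int) -> float:
    """Average number of samples per cell when [0, 1]^dim is cut into a regular grid."""
    n_cells = bins_per_axis ** dim  # grows exponentially with the dimension
    return n_samples / n_cells

# One million samples, 10 bins per axis, increasing dimension
for d in (1, 3, 6, 10):
    print(d, samples_per_cell(1_000_000, 10, d))
```

With 10 bins per axis, a million samples leave on average one sample per cell at d = 6 and 10⁻⁴ samples per cell at d = 10.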

Ex: Polynomial fitting

• Minimize the Sum-of-Squares Error Function

• Beware of under/over-fitting ⟶ over-fitting: you end up fitting the noise (the model no longer generalizes)
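A sketch of polynomial fitting by minimizing the sum-of-squares error E(w) = ½ Σₙ (y(xₙ, w) − tₙ)², assuming the classic toy setup of 10 noisy samples of sin(2πx) (our choice of data, not the lecture's). The training error can only go down as the order M grows; at M = 9 the polynomial passes through all 10 points, i.e. it fits the noise. Compare the printed test error across orders to see the loss of generalization.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: noisy samples of sin(2*pi*x)
x_train = np.linspace(0.0, 1.0, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.3, size=10)
x_test = np.linspace(0.0, 1.0, 100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(0.0, 0.3, size=100)

def design_matrix(x, order):
    """Columns x^0, x^1, ..., x^order."""
    return np.vander(x, order + 1, increasing=True)

def fit_polynomial(x, t, order):
    """Least-squares coefficients: minimize the sum of squared errors."""
    w, *_ = np.linalg.lstsq(design_matrix(x, order), t, rcond=None)
    return w

def rms_error(x, t, w):
    """Root-mean-square error of the polynomial with coefficients w."""
    order = len(w) - 1
    return np.sqrt(np.mean((design_matrix(x, order) @ w - t) ** 2))

train_rms, test_rms = [], []
for order in range(10):
    w = fit_polynomial(x_train, t_train, order)
    train_rms.append(rms_error(x_train, t_train, w))
    test_rms.append(rms_error(x_test, t_test, w))
    print(f"M={order}: train RMS={train_rms[-1]:.3f}, test RMS={test_rms[-1]:.3f}")
```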

How to reduce over-fitting?

• Increase the size of the training set
• Remove some outliers
• Increase the amount of data relative to the number of features
• Regularization ⟶ penalize large coefficient values (e.g. ridge/lasso regression)
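Regularization can be sketched as ridge regression on a toy 9th-order polynomial fit (data and λ values are illustrative assumptions). Ridge adds λ‖w‖² to the sum-of-squares error, which gives the closed form w = (λI + ΦᵀΦ)⁻¹Φᵀt, where Φ is the design matrix; larger λ shrinks the coefficients. Lasso uses an ℓ₁ penalty instead and has no closed form, so it is not shown here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy data: 10 noisy samples of sin(2*pi*x), fitted with a 9th-order polynomial
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=10)
Phi = np.vander(x, 10, increasing=True)  # design matrix, columns x^0 .. x^9

def ridge_fit(Phi, t, lam):
    """Minimize sum of squared errors + lam * ||w||^2 (the bias term is penalized too, for simplicity)."""
    n_features = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(n_features) + Phi.T @ Phi, Phi.T @ t)

lambdas = (1e-6, 1e-3, 1.0)
norms = []
for lam in lambdas:
    w = ridge_fit(Phi, t, lam)
    norms.append(np.linalg.norm(w))
    print(f"lambda={lam:g}: ||w|| = {norms[-1]:.2f}")
```

Using `np.linalg.solve` on the regularized normal equations avoids forming an explicit inverse; the added λI also makes the badly conditioned ΦᵀΦ safely invertible.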

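How to choose the order of the polynomial?

• Cross-validation: hold out part of the training data as a validation set (or rotate through $k$ folds), pick the order with the lowest validation error, and keep the test set for the final evaluation only

• $BIC$ score (Bayesian Information Criterion): the lower, the better

$$BIC = \ln(\underbrace{n}_{\text{number of data points}})\,\underbrace{k}_{\text{number of parameters}} - 2 \ln(\underbrace{\hat{L}}_{\text{maximized likelihood}})$$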
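Both criteria can be sketched for the polynomial example, assuming toy data (noisy samples of sin(2πx)) and Gaussian noise. Under that assumption, with the noise variance set to its maximum-likelihood value RSS/n, −2 ln L̂ equals n ln(RSS/n) plus a constant shared by all orders, so the BIC can be computed from the residual sum of squares. Here k counts the M + 1 polynomial coefficients; the noise variance adds one more parameter to every model, which does not change the ranking. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical toy data: 30 noisy samples of sin(2*pi*x)
x = rng.uniform(0.0, 1.0, size=30)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=30)

def fit(x, t, order):
    """Least-squares polynomial coefficients of the given order."""
    w, *_ = np.linalg.lstsq(np.vander(x, order + 1, increasing=True), t, rcond=None)
    return w

def rss(x, t, w):
    """Residual sum of squares of polynomial w on (x, t)."""
    return np.sum((np.vander(x, len(w), increasing=True) @ w - t) ** 2)

def cv_error(x, t, order, n_folds=5):
    """Mean validation MSE over n_folds folds (each fold is held out once)."""
    folds = np.array_split(np.arange(len(x)), n_folds)
    errors = []
    for val in folds:
        train = np.setdiff1d(np.arange(len(x)), val)
        w = fit(x[train], t[train], order)
        errors.append(rss(x[val], t[val], w) / len(val))
    return float(np.mean(errors))

def bic_gaussian(rss_value, n, k):
    """k ln(n) - 2 ln(L_hat) for Gaussian noise with ML variance RSS/n,
    dropping an additive constant that is the same for every model."""
    return k * np.log(n) + n * np.log(rss_value / n)

orders = range(8)
cv = [cv_error(x, t, m) for m in orders]
bic = [bic_gaussian(rss(x, t, fit(x, t, m)), len(x), m + 1) for m in orders]
print("order chosen by 5-fold CV:", int(np.argmin(cv)))
print("order chosen by BIC:      ", int(np.argmin(bic)))
```

Since x is drawn at random, splitting by index already gives random folds; with sorted inputs you would shuffle the indices first.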

## Bayesian perception

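Bayes' theorem tells us how to update our beliefs about a hidden variable $X$ after observing sensory data $S$:

$$P(X \mid S) = \frac{P(S \mid X)\,P(X)}{P(S)}$$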

• Sensation: is the detection of external stimulation
• Perception: is how we interpret/integrate these sensations

Why do we need perception? Sensation alone is not enough, because there is ambiguity everywhere:

• Uncertainty: Noise, Ambiguity
• Cue combination: combination of sensations
• Accumulation of evidence
• Latent variables: things are not directly observable

Core concepts of Bayesian perception:

• Priors: expectations, implicit knowledge
• Sensory data
• the world (what you want to know something about)
• Prediction ⟶ Decision

Goal: use the sensory data to make predictions about their cause (i.e. invert the generative model)

3 steps:

1. Define the generative model
2. Bayesian inference
3. Compute the observer's estimate distribution
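
```dot
digraph {
  rankdir=TB;
  // Generative model: the world state X causes the sensory data S
  X1[label="X"];
  X1 -> S[label="  P(S | X)"];
  // Prior over world states (drawn as a free-floating label)
  "P(X)"[shape=none];
  // Inference: invert the model to get the posterior over X given S
  S -> "X"[label="  P(X | S)"];
}
```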
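Step 1 (define the generative model) can be written as a sampler. This is a minimal sketch assuming a Gaussian prior P(X) = N(μ_p, σ_p²) and Gaussian sensory noise P(S | X) = N(X, σ_l²); the Gaussian choices and the parameter names are assumptions for illustration.

```python
import numpy as np

def sample_generative_model(n_trials, mu_prior, sigma_prior, sigma_noise, rng):
    """Draw world states X ~ P(X), then measurements S ~ P(S | X)."""
    x = rng.normal(mu_prior, sigma_prior, size=n_trials)  # prior over world states
    s = rng.normal(x, sigma_noise)                        # noisy measurement of each X
    return x, s

rng = np.random.default_rng(0)
x, s = sample_generative_model(100_000, mu_prior=0.0, sigma_prior=1.0, sigma_noise=0.5, rng=rng)
print(np.var(s))  # close to sigma_prior**2 + sigma_noise**2 = 1.25
```

Running the model "forward" like this is what Bayesian inference later inverts: given one observed S, it asks which X most plausibly produced it.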


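Likelihood = noise distribution $P(S \mid X)$, read as a function of $X$ for the observed $S$. Why is it called “likelihood” rather than probability? As a function of $X$ it is not a probability distribution: it need not sum (integrate) to $1$.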

Bayesian inference to invert the generative model

Difference between the prior distribution and the real distribution?

⟶ The prior is assumed, based on our beliefs; it need not match the true distribution of $X$ in the world

The likelihood function is centered at the measurement.

The variance of the likelihood function reflects the reliability of the sensory evidence: the smaller it is, the more we trust our measurements.

• The posterior probability (density) at its peak, i.e. at the most probable value of $X$, is called confidence.
• Uncertainty of the posterior = the variance of the distribution
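For the Gaussian case (Gaussian prior and a Gaussian likelihood centered at the measurement, which is an assumption here; the lecture does not fix the distributions), the posterior is Gaussian in closed form: its precision (inverse variance) is the sum of the prior and likelihood precisions, and its mean is a precision-weighted average of the prior mean and the measurement. The sketch also returns the posterior density at its peak, i.e. the confidence defined above.

```python
import math

def gaussian_posterior(mu_prior, var_prior, s, var_like):
    """Posterior N(mu_post, var_post) for prior N(mu_prior, var_prior) and likelihood N(s, var_like)."""
    var_post = 1.0 / (1.0 / var_prior + 1.0 / var_like)       # precisions add
    w = var_prior / (var_prior + var_like)                    # weight given to the measurement
    mu_post = w * s + (1.0 - w) * mu_prior                    # precision-weighted average
    peak_density = 1.0 / math.sqrt(2.0 * math.pi * var_post)  # posterior density at its mode
    return mu_post, var_post, peak_density

print(gaussian_posterior(0.0, 1.0, 2.0, 1.0))
```

With equal prior and likelihood variances, the posterior mean falls halfway between the prior mean and the measurement and the posterior variance is halved. A less reliable measurement (larger `var_like`) pulls the estimate toward the prior and lowers the peak.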

### Decision Criteria

Maximum Likelihood:

$$\hat{X}_{ML} = \operatorname*{arg\,max}_X \underbrace{L(X)}_{≝\, P(S \mid X)}$$

Maximum a Posteriori:

$$\hat{X}_{MAP} = \operatorname*{arg\,max}_X P(X \mid S)$$

Other: Softmax, etc.
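Both decision rules can be computed numerically by evaluating the likelihood and the unnormalized posterior on a grid of candidate values of X and taking the argmax; dividing by P(S) does not move the argmax, so it is skipped. A sketch assuming a Gaussian prior N(0, 1) and Gaussian sensory noise with σ = 1; the grid range and resolution are arbitrary choices.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Normal density N(mu, sigma^2) evaluated at x (broadcasts over arrays)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x_grid = np.linspace(-5.0, 5.0, 10_001)     # candidate values of X, step 0.001
s = 2.0                                      # observed measurement
likelihood = gaussian_pdf(s, x_grid, 1.0)    # L(X) = P(S = s | X) for each candidate X
prior = gaussian_pdf(x_grid, 0.0, 1.0)       # P(X)
unnormalized_posterior = likelihood * prior  # proportional to P(X | S = s)

x_ml = x_grid[np.argmax(likelihood)]
x_map = x_grid[np.argmax(unnormalized_posterior)]
print(f"ML estimate: {x_ml:.3f}, MAP estimate: {x_map:.3f}")
```

The ML estimate sits at the measurement, while the MAP estimate is pulled toward the prior mean.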

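Distribution of the MAP estimate across trials, for a fixed true stimulus:

$$P(\hat{X}_{MAP} \mid X_{true})$$

NB: with a Gaussian prior and Gaussian noise it is Gaussian, because $\hat{X}_{MAP}$ is then a linear function of the measurement $S$.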
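Step 3 (the observer's estimate distribution) can be checked by simulation: fix X_true, redraw the noisy measurement on many trials, and apply the MAP rule on each. Under Gaussian assumptions, X̂_MAP = w S + (1 − w) μ_p with w = σ_p²/(σ_p² + σ_l²), so its distribution is Gaussian with mean w X_true + (1 − w) μ_p and variance w² σ_l². The parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_p, sigma_p = 0.0, 1.0  # prior N(mu_p, sigma_p^2)
sigma_l = 1.0             # sensory noise: S ~ N(X_true, sigma_l^2)
x_true = 2.0
n_trials = 200_000

w = sigma_p**2 / (sigma_p**2 + sigma_l**2)      # weight on the measurement
s = rng.normal(x_true, sigma_l, size=n_trials)  # one noisy measurement per trial
x_map = w * s + (1.0 - w) * mu_p                # MAP estimate on each trial

print("simulated mean/var:", x_map.mean(), x_map.var())
print("predicted mean/var:", w * x_true + (1.0 - w) * mu_p, w**2 * sigma_l**2)
```

Note the bias: the estimates are centered on 1.0 rather than on X_true = 2, because the prior pulls them toward μ_p. In exchange, their variance (0.25) is smaller than that of the raw measurement (1.0).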
