Lecture 2: Regression
Linear regression and logistic regression: particular cases of empirical risk minimization
I. Linear regression
Example:
- EDF: understand the relation between electricity consumption and the weather (the colder it is, the more electricity is consumed)
Goal: find $\hat f$ minimizing the empirical risk $\frac{1}{n}\sum_{i=1}^n (Y_i - f(X_i))^2$.
Beware of over-fitting!
⟹ restrict the minimization to a simple class of functions, e.g. linear ones.
Linear model: $Y_i = X_i^\top \beta^* + \varepsilon_i$, $i = 1, \dots, n$, i.e. $Y = X\beta^* + \varepsilon$ with $X \in \mathbb{R}^{n \times d}$,
where the noise $\varepsilon_1, \dots, \varepsilon_n$ is i.i.d., with $\mathbb{E}[\varepsilon_i] = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2$.
Assumption: $X^\top X$ is invertible
(equivalently, $\mathrm{rank}(X) = d$, which requires $n \ge d$).
- Ordinary least squares estimator:
- $\hat\beta = \arg\min_{\beta \in \mathbb{R}^d} \|Y - X\beta\|^2 = (X^\top X)^{-1} X^\top Y$
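As a minimal sketch (on synthetic data with illustrative names), the closed form above can be checked numerically; solving the normal equations is preferred over explicitly inverting $X^\top X$:

```python
import numpy as np

# Sketch: OLS closed form on synthetic data (all names/values illustrative).
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
beta_star = np.array([1.0, -2.0, 0.5])
sigma = 0.1
Y = X @ beta_star + sigma * rng.normal(size=n)

# beta_hat = (X^T X)^{-1} X^T Y, computed by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # close to beta_star
```

The same estimator is returned by `np.linalg.lstsq`, which is the more robust routine in practice.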
Proposition: $\hat\beta = (X^\top X)^{-1} X^\top Y$, and $\mathbb{E}[\hat\beta] = \beta^*$ ($\hat\beta$ is unbiased).
Proof: $\beta \mapsto \|Y - X\beta\|^2$ is convex, so its minimizer cancels the gradient: $\nabla_\beta \|Y - X\beta\|^2 = -2 X^\top (Y - X\beta) = 0$.
So $X^\top X \hat\beta = X^\top Y$.
But as $X^\top X$ is invertible, $\hat\beta = (X^\top X)^{-1} X^\top Y$.
We found the best approximation of $Y$ by an element of the column space of $X$: $X\hat\beta$ is the orthogonal projection of $Y$ onto it.
Moreover:
Is $\hat\beta$ unbiased?
To simplify calculations, we assume that $X$ is deterministic.
So $\mathbb{E}[\hat\beta] = (X^\top X)^{-1} X^\top \mathbb{E}[Y] = (X^\top X)^{-1} X^\top X \beta^* = \beta^*$. ∎
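Unbiasedness can be illustrated by a small Monte-Carlo experiment (a sketch with illustrative parameters): keeping $X$ fixed and resampling the noise, the average of $\hat\beta$ over many repetitions approaches $\beta^*$.

```python
import numpy as np

# Monte-Carlo check that E[beta_hat] = beta_star (X fixed, noise resampled).
rng = np.random.default_rng(1)
n, d, sigma = 50, 2, 1.0
X = rng.normal(size=(n, d))
beta_star = np.array([2.0, -1.0])
XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)  # (X^T X)^{-1} X^T, reused across repetitions

estimates = []
for _ in range(5000):
    eps = sigma * rng.normal(size=n)
    Y = X @ beta_star + eps
    estimates.append(XtX_inv_Xt @ Y)
mean_estimate = np.mean(estimates, axis=0)
print(mean_estimate)  # close to beta_star
```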
What is the prediction risk of $\hat\beta$? For a new point with $Y_{\mathrm{new}} = x_{\mathrm{new}}^\top \beta^* + \varepsilon_{\mathrm{new}}$:
$\mathbb{E}\big[(Y_{\mathrm{new}} - x_{\mathrm{new}}^\top \hat\beta)^2\big] = \mathbb{E}\big[(\varepsilon_{\mathrm{new}} + x_{\mathrm{new}}^\top(\beta^* - \hat\beta))^2\big]$
But the cross term vanishes: $\mathbb{E}\big[\varepsilon_{\mathrm{new}} \, x_{\mathrm{new}}^\top(\beta^* - \hat\beta)\big] = 0$.
Independence:
- $\varepsilon_{\mathrm{new}}$ and $\hat\beta$ are independent
- $X$ and $x_{\mathrm{new}}$ are deterministic
- $\hat\beta$ is independent of $\varepsilon_{\mathrm{new}}$ ($\hat\beta$ only depends on $\varepsilon_1, \dots, \varepsilon_n$)
So $\mathbb{E}\big[(Y_{\mathrm{new}} - x_{\mathrm{new}}^\top \hat\beta)^2\big] = \sigma^2 + x_{\mathrm{new}}^\top \mathrm{Cov}(\hat\beta)\, x_{\mathrm{new}}$.
As $\mathrm{Cov}(\hat\beta) = \sigma^2 (X^\top X)^{-1}$, the prediction risk is $\sigma^2 \big(1 + x_{\mathrm{new}}^\top (X^\top X)^{-1} x_{\mathrm{new}}\big)$.
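A quick simulation (a sketch, with illustrative parameters) can check that the empirical squared prediction error matches the formula $\sigma^2\big(1 + x_{\mathrm{new}}^\top (X^\top X)^{-1} x_{\mathrm{new}}\big)$:

```python
import numpy as np

# Empirical check of the prediction risk sigma^2 (1 + x_new^T (X^T X)^{-1} x_new).
rng = np.random.default_rng(2)
n, d, sigma = 100, 3, 1.0
X = rng.normal(size=(n, d))
beta_star = rng.normal(size=d)
x_new = rng.normal(size=d)
XtX_inv = np.linalg.inv(X.T @ X)
theoretical_risk = sigma**2 * (1 + x_new @ XtX_inv @ x_new)

errors = []
for _ in range(20000):
    Y = X @ beta_star + sigma * rng.normal(size=n)  # resample the training noise
    beta_hat = XtX_inv @ (X.T @ Y)
    y_new = x_new @ beta_star + sigma * rng.normal()  # independent new observation
    errors.append((y_new - x_new @ beta_hat) ** 2)
print(np.mean(errors), theoretical_risk)  # the two should be close
```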
Th (Gauss-Markov):
$\hat\beta$ is optimal in the sense that its variance is minimal among all linear unbiased estimators of $\beta^*$.
Can we estimate $\sigma^2$? A natural candidate is $\frac{1}{n}\|Y - X\hat\beta\|^2$.
But $\mathbb{E}\big[\|Y - X\hat\beta\|^2\big] = (n - d)\,\sigma^2$.
So this estimator is biased: $\mathbb{E}\big[\frac{1}{n}\|Y - X\hat\beta\|^2\big] = \frac{n-d}{n}\,\sigma^2 \ne \sigma^2$.
So we define the following unbiased estimator: $\hat\sigma^2 = \frac{1}{n - d}\,\|Y - X\hat\beta\|^2$.
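The bias of the naive estimator and the correction by $n - d$ can be seen in a small simulation (a sketch with illustrative parameters):

```python
import numpy as np

# Check that ||Y - X beta_hat||^2 / (n - d) is unbiased for sigma^2,
# while dividing by n systematically underestimates it.
rng = np.random.default_rng(3)
n, d, sigma = 30, 5, 2.0
X = rng.normal(size=(n, d))
beta_star = rng.normal(size=d)

biased, unbiased = [], []
for _ in range(10000):
    Y = X @ beta_star + sigma * rng.normal(size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    rss = np.sum((Y - X @ beta_hat) ** 2)  # residual sum of squares
    biased.append(rss / n)
    unbiased.append(rss / (n - d))
print(np.mean(biased), np.mean(unbiased), sigma**2)
```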
The case of Gaussian noise: $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ i.i.d.
NB: this assumption is legitimate because of the central limit theorem: if each noise term is the sum of many small independent effects, it is approximately Gaussian.
The maximum likelihood estimator of $\beta^*$ coincides with the ordinary least squares estimator $\hat\beta$.
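The coincidence of the two estimators follows from writing out the Gaussian likelihood (a standard one-line derivation):

```latex
\mathcal{L}(\beta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\Big(-\frac{(Y_i - X_i^\top \beta)^2}{2\sigma^2}\Big)
\quad\Longrightarrow\quad
\log \mathcal{L}(\beta) = -\frac{n}{2}\log(2\pi\sigma^2)
  - \frac{1}{2\sigma^2}\,\|Y - X\beta\|^2 .
```

Maximizing $\log \mathcal{L}$ over $\beta$ is therefore the same as minimizing $\|Y - X\beta\|^2$, hence the MLE equals $\hat\beta$.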
What if the relationship is not linear?
We can fit any polynomial by adding transformations of the coordinates: map $x$ to $\phi(x) = (1, x, x^2, \dots, x^k)$ and run linear regression on $\phi(x)$.
⟶ Spline regression: fit low-degree polynomials piecewise.
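A minimal sketch of this trick (a quadratic relationship recovered by linear regression on $\phi(x) = (1, x, x^2)$; data and coefficients are illustrative):

```python
import numpy as np

# Fit a nonlinear (quadratic) relationship by OLS on transformed coordinates.
rng = np.random.default_rng(4)
n = 100
x = rng.uniform(-1, 1, size=n)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.05 * rng.normal(size=n)

Phi = np.vstack([np.ones(n), x, x**2]).T  # design matrix of polynomial features
coeffs = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(coeffs)  # close to (1, 2, -3)
```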
What if $X^\top X$ is not invertible (e.g. $d > n$, or correlated features)? One can use a pseudo-inverse.
But in practice, we rather use:
- Regularisation (Ridge):
- $\hat\beta_\lambda = \arg\min_{\beta} \|Y - X\beta\|^2 + \lambda \|\beta\|_2^2 = (X^\top X + \lambda I_d)^{-1} X^\top Y$
For each $\lambda > 0$, $X^\top X + \lambda I_d$ is invertible, so $\hat\beta_\lambda$ is always well defined.
- Lasso:
- $\hat\beta_\lambda = \arg\min_{\beta} \|Y - X\beta\|^2 + \lambda \|\beta\|_1$ (no closed form; favours sparse $\hat\beta_\lambda$)
The choice of $\lambda$ is important (in practice, it is chosen by cross-validation).
QR decomposition
Inverting $X^\top X$ directly is numerically unstable.
⟶ QR decomposition: write $X = QR$ with $Q^\top Q = I_d$ and $R$ upper triangular,
so that: $X^\top X \hat\beta = X^\top Y \iff R\hat\beta = Q^\top Y$, which is solved by back-substitution.
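A minimal sketch of this route (synthetic data; `np.linalg.qr` returns the thin factorization $Q \in \mathbb{R}^{n \times d}$, $R \in \mathbb{R}^{d \times d}$):

```python
import numpy as np

# Solve the least squares problem via QR: X = QR gives R beta_hat = Q^T Y.
rng = np.random.default_rng(6)
n, d = 50, 4
X = rng.normal(size=(n, d))
Y = X @ np.array([1.0, 2.0, 3.0, 4.0]) + 0.01 * rng.normal(size=n)

Q, R = np.linalg.qr(X)  # Q: orthonormal columns, R: upper triangular
beta_qr = np.linalg.solve(R, Q.T @ Y)
beta_normal = np.linalg.solve(X.T @ X, X.T @ Y)  # normal equations, for comparison
print(np.max(np.abs(beta_qr - beta_normal)))  # the two solutions agree
```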
Gradient Descent
$\beta_{t+1} = \beta_t - \eta \nabla F(\beta_t)$, with $F(\beta) = \|Y - X\beta\|^2$ and $\nabla F(\beta) = -2 X^\top (Y - X\beta)$.
If you choose the step size $\eta$ small enough, $\beta_t$ converges to $\hat\beta$.
Stochastic gradient descent: if you don't want to compute the gradient entirely, use at each step the gradient of the loss on a single, randomly chosen sample.
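The iteration above can be sketched as follows (the step size `eta = 0.1` is an illustrative choice, assumed small enough for this data, not a tuned value):

```python
import numpy as np

# Gradient descent on F(beta) = ||Y - X beta||^2 / n.
rng = np.random.default_rng(7)
n, d = 100, 3
X = rng.normal(size=(n, d))
beta_star = np.array([1.0, -1.0, 0.5])
Y = X @ beta_star + 0.1 * rng.normal(size=n)

beta = np.zeros(d)
eta = 0.1  # fixed step size (assumed small enough)
for _ in range(500):
    grad = -2.0 / n * X.T @ (Y - X @ beta)  # gradient of the mean squared error
    beta -= eta * grad
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)  # closed form, for comparison
print(np.max(np.abs(beta - beta_hat)))  # gradient descent reaches beta_hat
```

For SGD, one would replace `grad` by the gradient of the loss on a single sample `i` drawn at random, i.e. `-2 * X[i] * (Y[i] - X[i] @ beta)`, usually with a decreasing step size.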
Logistic regression
- Binary classification: $Y_i \in \{-1, +1\}$, where we look for a linear classifier $x \mapsto \mathrm{sign}(x^\top \beta)$.
⟶ the square loss is designed for real-valued outputs ⟹ not good here; the natural loss is the 0-1 loss $\mathbf{1}\{Y_i \ne \mathrm{sign}(X_i^\top \beta)\}$.
Issue: it's not convex, so the minimization problem is way too hard (NP-hard)!
The Heaviside (0-1) loss, which is not convex, is replaced by the logistic loss function, which is convex.
- Logistic loss: $\ell(y, x^\top \beta) = \log\big(1 + e^{-y\, x^\top \beta}\big)$
- $\hat\beta = \arg\min_{\beta} \frac{1}{n} \sum_{i=1}^n \log\big(1 + e^{-Y_i X_i^\top \beta}\big)$
Then we predict $\hat Y = \mathrm{sign}(x^\top \hat\beta)$.
Same trick as before if the data can't be separated by a linear function ⟶ transformation of the coordinates into polynomial features.
Nice probabilistic interpretation of logistic regression:
$\frac{\mathbb{P}(Y = 1 \mid X = x)}{\mathbb{P}(Y = -1 \mid X = x)}$ must be the number of times you're more likely to be in one category than in the other (the odds).
So we want the log-odds to be linear: $\log \frac{\mathbb{P}(Y = 1 \mid X = x)}{\mathbb{P}(Y = -1 \mid X = x)} = x^\top \beta$, which gives $\mathbb{P}(Y = 1 \mid X = x) = \frac{1}{1 + e^{-x^\top \beta}}$ (the sigmoid function).
With logistic regression, we cannot compute $\hat\beta$ in closed form: the minimization is done numerically, e.g. by gradient descent (the logistic loss is convex).
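A minimal sketch of logistic regression fitted by gradient descent, on data generated from the logistic model itself (labels in $\{-1, +1\}$; step size and iteration count are illustrative choices):

```python
import numpy as np

# Logistic regression via gradient descent on the average logistic loss.
rng = np.random.default_rng(8)
n, d = 400, 2
X = rng.normal(size=(n, d))
beta_star = np.array([3.0, -2.0])
# Labels drawn from the logistic model P(Y=1|x) = 1 / (1 + exp(-x^T beta*))
Y = np.where(rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ beta_star)), 1.0, -1.0)

beta = np.zeros(d)
eta = 0.5  # fixed step size (illustrative)
for _ in range(2000):
    margins = Y * (X @ beta)
    # gradient of (1/n) sum_i log(1 + exp(-Y_i X_i^T beta))
    grad = -(X.T @ (Y / (1.0 + np.exp(margins)))) / n
    beta -= eta * grad

pred = np.sign(X @ beta)  # predict with sign(x^T beta_hat)
accuracy = np.mean(pred == Y)
print(beta, accuracy)
```

The training accuracy stays below 1 even at the optimum, since the logistic model itself flips labels near the decision boundary.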