In the general case:
$$ℛ(f) = 𝔼_{X,Y} \big[ \ell(f(X), Y) \big]$$
For a quadratic loss:
$$ℛ(f) = 𝔼_{X,Y} \big[ (f(X) - Y)^2 \big]$$
Goal of learning: find the best predictor in the class $F$:
$$f^\ast = {\rm argmin}_{f ∈ F} ℛ(f)$$
Model: the "true" regression function $g$ generates the data:
$$Y = g(X) + ε$$
$$ℛ(f) = 𝔼_{X,Y} \Big[ (f(X) - Y)^2 \Big] \\ = 𝔼_{X,Y} \Big[ f(X)^2 + g(X)^2 + ε^2 + 2 g(X) ε - 2 f(X) g(X) - 2 f(X) ε \Big] \\ = σ^2 + 𝔼_{X} \Big[ (f(X) - g(X))^2 \Big]$$
since the Gaussian noise $ε$ is centered ($𝔼[ε] = 0$, $𝔼[ε^2] = σ^2$) and independent of $X$, so the cross terms $𝔼[g(X) ε]$ and $𝔼[f(X) ε]$ vanish.
So $f^\ast = g$ as soon as $g ∈ F$, which holds e.g. for $F = 𝒞^∞([0,1])$, since $g = \exp(3\,\cdot)$ is smooth on $[0,1]$.
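As a quick numerical sanity check (a sketch, assuming $σ = 1$ as in the simulation below): the risk of $g$ itself should be close to $σ^2 = 1$.

import numpy as np

# Monte Carlo estimate of R(g): should be close to sigma^2 = 1 since f* = g
rng = np.random.default_rng(0)
X = rng.random(100_000)
Y = np.exp(3 * X) + rng.standard_normal(100_000)  # Y = g(X) + eps, eps ~ N(0, 1)
print(np.mean((np.exp(3 * X) - Y) ** 2))          # ≈ 1.0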
import plotly
import plotly.graph_objs as go

# %pylab pulls numpy and matplotlib (exp, rand, randn, scatter, plot, ...) into the namespace
%pylab inline
plotly.offline.init_notebook_mode()  # enable offline plotly rendering in the notebook
Model: $Y = g(X) + ε$
g = lambda x: exp(3 * x)  # true regression function

# draw 40 noisy observations of the model Y = g(X) + eps
eps = randn(40)
x = rand(40)
y = g(x) + eps
scatter(x, y)

# overlay the true function g on a fine grid (without shadowing the sample x)
x_grid = linspace(0, 1, 100)
plot(x_grid, g(x_grid))
nb_pts = 40
train_ind = range(0, int(0.5 * nb_pts))      # first half: training set
test_ind = range(int(0.5 * nb_pts), nb_pts)  # second half: test set
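For instance, the split can be materialized on the sample `x`, `y` generated above (the names `x_train`, `y_train`, etc. are illustrative):

# slice the i.i.d. sample into two halves; valid because the points are exchangeable
x_train, y_train = x[list(train_ind)], y[list(train_ind)]
x_test, y_test = x[list(test_ind)], y[list(test_ind)]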
We would like to solve
$$\min_{f∈S} ℛ(f) = \min_{f∈S} 𝔼_{X,Y} \big[ (f(X) - Y)^2 \big]$$
where $S$ is the set of affine functions $x ↦ θ_0 + θ_1 x$. But we do not know the distributions of $X$ and $Y$, so we minimize the empirical risk on the sample instead:
$$\widehat{ℛ}_n(f) = \frac 1 n \sum_{i=1}^n (f(x_i) - y_i)^2$$
$$\min_{f∈S} \widehat{ℛ}_n(f) = \min_{f∈S} \frac 1 n \sum_{i=1}^n (f(x_i) - y_i)^2 ⇔ \min_{(θ_0, θ_1)∈ℝ^2} \frac 1 n \sum_{i=1}^n (θ_0 + θ_1 x_i - y_i)^2$$
$\widehat{ℛ}_n$ is convex, so it attains its minimum at $(θ_0, θ_1)$ iff $\nabla \widehat{ℛ}_n(θ_0, θ_1) = 0$,
i.e. iff
$$\begin{cases} n θ_0 + θ_1 \sum x_i - \sum y_i = 0 \\ θ_0 \sum x_i + θ_1 \sum x_i^2 - \sum x_i y_i = 0 \end{cases} ⇔ (X^T X)θ = X^T Y$$
where
$$X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \quad Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad θ = \begin{pmatrix} θ_0 \\ θ_1 \end{pmatrix}$$
Indeed, in matrix form:
$$\widehat{ℛ}_n(θ_0, θ_1) = \frac 1 n \sum (θ_0 + θ_1 x_i - y_i)^2 \\ = \frac 1 n \Vert X θ - Y \Vert^2$$
$$\nabla \widehat{ℛ}_n = \frac 2 n X^T (X θ - Y)$$
hence
$$\nabla \widehat{ℛ}_n = 0 ⇔ X^T X θ = X^T Y$$
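In code, a minimal sketch on the training sample (`x_train`, `y_train`, `x_grid` come from the cells above; `numpy.linalg.lstsq` solves the same least-squares problem more stably than forming $X^T X$ explicitly):

import numpy as np

# design matrix with an intercept column: row i is (1, x_i)
X_design = np.column_stack([np.ones_like(x_train), x_train])

# least-squares solution of X theta ≈ Y, i.e. of the normal equations X^T X theta = X^T Y
theta, *_ = np.linalg.lstsq(X_design, y_train, rcond=None)
plot(x_grid, theta[0] + theta[1] * x_grid)  # fitted affine predictor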
For each degree $k$, let $X_k$ be the design matrix with rows $(1, x_i, x_i^2, \ldots, x_i^k)$ and let $θ_k ∈ ℝ^{k+1}$. We therefore solve
$$X_k^T X_k θ_k = X_k^T Y$$
From the solution $\widehat{θ}_k$, plot the fitted curve $y = X_k \widehat{θ}_k$.
Then, for each $k$, compute the empirical risk $\widehat{ℛ}_n(f_k)$ on the training set and on the test set; a sketch of this loop follows.
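A possible implementation (a sketch: `np.vander(..., increasing=True)` builds $X_k$ with columns $1, x, \ldots, x^k$; the range of degrees is illustrative):

import numpy as np

degrees = range(1, 11)
risk_train, risk_test = [], []
for k in degrees:
    # X_k: design matrix with columns 1, x, ..., x^k
    Xk_train = np.vander(x_train, k + 1, increasing=True)
    Xk_test = np.vander(x_test, k + 1, increasing=True)
    # hat(theta)_k solves X_k^T X_k theta_k = X_k^T Y on the training set
    theta_k, *_ = np.linalg.lstsq(Xk_train, y_train, rcond=None)
    # empirical risks on train / test
    risk_train.append(np.mean((Xk_train @ theta_k - y_train) ** 2))
    risk_test.append(np.mean((Xk_test @ theta_k - y_test) ** 2))

plot(degrees, risk_train, label="train")
plot(degrees, risk_test, label="test")
legend()

The training risk keeps decreasing as $k$ grows, while the test risk eventually increases: the usual overfitting picture.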