In the general case:
$$ℛ(f) = 𝔼_{X,Y} \big[ \ell(f(X), Y) \big]$$
For a quadratic loss:
$$ℛ(f) = 𝔼_{X,Y} \big[ (f(X) - Y)^2 \big]$$
Goal of learning: find the best predictor in the class $F$:
$$f^\ast = {\rm argmin}_{f ∈ F} ℛ(f)$$
Model: the "true" regression function $g$ generates the data:
$$Y = g(X) + ε$$
$$ℛ(f) = 𝔼_{X,Y} \Big[ (f(X) - Y)^2 \Big] \\ = 𝔼_{X,Y} \Big[ f(X)^2 + g(X)^2 + ε^2 + 2 g(X) ε - 2 f(X) g(X) - 2 f(X) ε \Big] \\ = σ^2 + 𝔼_{X} \Big[ (f(X) - g(X))^2 \Big]$$
since the Gaussian noise $ε$ is centered ($𝔼[ε] = 0$, $𝔼[ε^2] = σ^2$) and independent of $X$, so the cross terms $𝔼[g(X) ε]$ and $𝔼[f(X) ε]$ vanish.
So $f^\ast = g$ as soon as $g ∈ F$, which holds e.g. for $F = 𝒞^∞([0,1])$, since $g = \exp(3\,\cdot)$ is smooth on $[0,1]$.
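As a quick numerical sanity check (a sketch, assuming $σ = 1$ as in the simulation below): the risk of $g$ itself should be close to $σ^2 = 1$.

import numpy as np

# Monte Carlo estimate of R(g): should be close to sigma^2 = 1 since f* = g
rng = np.random.default_rng(0)
X = rng.random(100_000)
Y = np.exp(3 * X) + rng.standard_normal(100_000)  # Y = g(X) + eps, eps ~ N(0, 1)
print(np.mean((np.exp(3 * X) - Y) ** 2))          # ≈ 1.0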
import plotly
import plotly.graph_objs as go

# %pylab pulls numpy and matplotlib (exp, rand, randn, scatter, plot, ...) into the namespace
%pylab inline
plotly.offline.init_notebook_mode()  # enable offline plotly rendering in the notebook
Model: $Y = g(X) + ε$
g = lambda x: exp(3 * x)  # true regression function

# draw 40 noisy observations of the model Y = g(X) + eps
eps = randn(40)
x = rand(40)
y = g(x) + eps
scatter(x, y)

# overlay the true function g on a fine grid (without shadowing the sample x)
x_grid = linspace(0, 1, 100)
plot(x_grid, g(x_grid))
nb_pts = 40
train_ind = range(0, int(0.5 * nb_pts))      # first half: training set
test_ind = range(int(0.5 * nb_pts), nb_pts)  # second half: test set
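For instance, the split can be materialized on the sample `x`, `y` generated above (the names `x_train`, `y_train`, etc. are illustrative):

# slice the i.i.d. sample into two halves; valid because the points are exchangeable
x_train, y_train = x[list(train_ind)], y[list(train_ind)]
x_test, y_test = x[list(test_ind)], y[list(test_ind)]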
We would like to solve
$$\min_{f∈S} ℛ(f) = \min_{f∈S} 𝔼_{X,Y} \big[ (f(X) - Y)^2 \big]$$
where $S$ is the set of affine functions $x ↦ θ_0 + θ_1 x$. But we do not know the distributions of $X$ and $Y$, so we minimize the empirical risk on the sample instead:
$$\widehat{ℛ}_n(f) = \frac 1 n \sum_{i=1}^n (f(x_i) - y_i)^2$$
$$\min_{f∈S} \widehat{ℛ}_n(f) = \min_{f∈S} \frac 1 n \sum_{i=1}^n (f(x_i) - y_i)^2 ⇔ \min_{(θ_0, θ_1)∈ℝ^2} \frac 1 n \sum_{i=1}^n (θ_0 + θ_1 x_i - y_i)^2$$
$\widehat{ℛ}_n$ is convex, so it attains its minimum at $(θ_0, θ_1)$ iff $\nabla \widehat{ℛ}_n(θ_0, θ_1) = 0$,
i.e. iff
$$\begin{cases} n θ_0 + θ_1 \sum x_i - \sum y_i = 0 \\ θ_0 \sum x_i + θ_1 \sum x_i^2 - \sum x_i y_i = 0 \end{cases} ⇔ (X^T X)θ = X^T Y$$
where
$$X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \quad Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad θ = \begin{pmatrix} θ_0 \\ θ_1 \end{pmatrix}$$
Indeed, in matrix form:
$$\widehat{ℛ}_n(θ_0, θ_1) = \frac 1 n \sum (θ_0 + θ_1 x_i - y_i)^2 \\ = \frac 1 n \Vert X θ - Y \Vert^2$$
$$\nabla \widehat{ℛ}_n = \frac 2 n X^T (X θ - Y)$$
hence
$$\nabla \widehat{ℛ}_n = 0 ⇔ X^T X θ = X^T Y$$
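In code, a minimal sketch on the training sample (`x_train`, `y_train`, `x_grid` come from the cells above; `numpy.linalg.lstsq` solves the same least-squares problem more stably than forming $X^T X$ explicitly):

import numpy as np

# design matrix with an intercept column: row i is (1, x_i)
X_design = np.column_stack([np.ones_like(x_train), x_train])

# least-squares solution of X theta ≈ Y, i.e. of the normal equations X^T X theta = X^T Y
theta, *_ = np.linalg.lstsq(X_design, y_train, rcond=None)
plot(x_grid, theta[0] + theta[1] * x_grid)  # fitted affine predictor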
For each degree $k$, let $X_k$ be the design matrix with rows $(1, x_i, x_i^2, \ldots, x_i^k)$ and let $θ_k ∈ ℝ^{k+1}$. We therefore solve
$$X_k^T X_k θ_k = X_k^T Y$$
From the solution $\widehat{θ}_k$, plot the fitted curve $y = X_k \widehat{θ}_k$.
Then, for each $k$, compute the empirical risk $\widehat{ℛ}_n(f_k)$ on the training set and on the test set; a sketch of this loop follows.
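A possible implementation (a sketch: `np.vander(..., increasing=True)` builds $X_k$ with columns $1, x, \ldots, x^k$; the range of degrees is illustrative):

import numpy as np

degrees = range(1, 11)
risk_train, risk_test = [], []
for k in degrees:
    # X_k: design matrix with columns 1, x, ..., x^k
    Xk_train = np.vander(x_train, k + 1, increasing=True)
    Xk_test = np.vander(x_test, k + 1, increasing=True)
    # hat(theta)_k solves X_k^T X_k theta_k = X_k^T Y on the training set
    theta_k, *_ = np.linalg.lstsq(Xk_train, y_train, rcond=None)
    # empirical risks on train / test
    risk_train.append(np.mean((Xk_train @ theta_k - y_train) ** 2))
    risk_test.append(np.mean((Xk_test @ theta_k - y_test) ** 2))

plot(degrees, risk_train, label="train")
plot(degrees, risk_test, label="test")
legend()

The training risk keeps decreasing as $k$ grows, while the test risk eventually increases: the usual overfitting picture.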