Practical session 10: LASSO Regression
\[\min_{β∈ ℝ^d} \frac 1 2 \sum\limits_{ i } (y_i - β^T x_i)^2 + λ \Vert β \Vert_1\]

or equivalently, in matrix form:

\[\min_{β∈ ℝ^d} \frac 1 2 \Vert Y - Xβ \Vert^2 + λ \Vert β \Vert_1\]
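As a running helper, here is a minimal NumPy sketch (the function name `lasso_objective` is mine) that evaluates this objective; it can be used to sanity-check the solutions derived below:

```python
import numpy as np

def lasso_objective(X, y, beta, lam):
    """F(beta) = 1/2 * ||y - X beta||^2 + lam * ||beta||_1."""
    resid = y - X @ beta
    return 0.5 * resid @ resid + lam * np.abs(beta).sum()
```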
1. Orthonormal case

Assume that $X^T X = I$.
NB: $\overline{β}$ is the result of the OLS regression:
\[\overline{β} = (X^T X)^{-1} X^T Y = X^T Y\]

Moreover, since $\Vert Xβ \Vert^2 = β^T X^T X β = β^T β$,

\[\begin{align*} \Vert Y - X β \Vert^2 & = \Vert Y \Vert^2 + \Vert Xβ \Vert^2 - 2 (Xβ)^T Y \\ & = \Vert Y \Vert^2 + β^T β - 2 β^T \overline{β} \end{align*}\]

So
\[\begin{align*} \min_{β∈ ℝ^d} \frac 1 2 \Vert Y - Xβ \Vert^2 + λ \Vert β \Vert_1 & ⟺ \min_{β∈ ℝ^d} \frac 1 2 β^T β - β^T \overline{β} + λ \Vert β \Vert_1 \\ & ⟺ \min_{β∈ ℝ^d} \frac 1 2 \sum\limits_{ j=1 }^d β_j^2 - \sum\limits_{ j=1 }^d β_j \overline{β}_j + λ \sum\limits_{ j=1 }^d \vert β_j \vert \\ & ⟺ \min_{β∈ ℝ^d} \sum\limits_{ j=1 }^d \underbrace{β_j\left(\frac 1 2 β_j - \overline{β}_j\right) + λ \vert β_j \vert}_{≝ f_j(β_j)} \end{align*}\]

(the constant $\frac 1 2 \Vert Y \Vert^2$ does not affect the minimizer). The objective is separable, so it suffices to minimize each $f_j$ independently.
Problem: $x ⟼ \vert x \vert$ is not differentiable at $x = 0$.

Trick: note that $β_j^\ast$ has the same sign as $\overline{β}_j$.
Indeed, one wants to minimize $\frac 1 2 β_j^2 - \overline{β}_j β_j + λ \vert β_j \vert$; the first and last terms depend only on $\vert β_j \vert$, so flipping the sign of $β_j$ changes only the middle term, and at a minimum $\overline{β}_j β_j$ has to be non-negative.

So each $f_j$ can be studied separately on $ℝ^\ast_+$ and $ℝ^\ast_-$, on which it is differentiable.
Case 1: $\overline{β}_j, β^\ast_j > 0$
\[f_j'(β_j) = β_j - \overline{β}_j + λ\]

so that

\[β^\ast_j = \max\left(\overline{β}_j - λ, 0\right) = \left(\overline{β}_j - λ\right)_+\]

Case 2: $\overline{β}_j, β^\ast_j < 0$

\[f_j'(β_j) = β_j - \overline{β}_j - λ\]

so that

\[β^\ast_j = \min\left(\overline{β}_j + λ, 0\right) = -\max\left(-\overline{β}_j - λ, 0\right) = -\left(\vert \overline{β}_j \vert - λ \right)_+\]

In either case,
\[β^\ast_j = \operatorname{sign}(\overline{β}_j) \left(\vert \overline{β}_j \vert - λ \right)_+\]

(in particular, $β^\ast_j = 0$ as soon as $\vert \overline{β}_j \vert ≤ λ$).
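As a sanity check, a minimal NumPy sketch of this closed form (the function name is mine; it assumes the columns of $X$ are orthonormal):

```python
import numpy as np

def lasso_orthonormal(X, y, lam):
    """Closed-form LASSO solution when X^T X = I: soft-threshold the OLS fit."""
    beta_bar = X.T @ y  # OLS solution, since (X^T X)^{-1} = I
    return np.sign(beta_bar) * np.maximum(np.abs(beta_bar) - lam, 0.0)

# Example: Q from a (reduced) QR decomposition has orthonormal columns
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(50, 5)))
y = rng.normal(size=50)
print(lasso_orthonormal(Q, y, lam=0.1))
```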
2. General case
\[\min_{β∈ ℝ^d} \frac 1 2 \sum\limits_{ i } \underbrace{(y_i - β^T x_i)^2}_{\text{Residual Sum of Squares (RSS)}} + \underbrace{λ \Vert β \Vert_1}_{\text{reg}}\]

\[\nabla \text{RSS} = - X^T (Y - X β)\]

\[\begin{align*} \frac{\partial \text{RSS}}{\partial β_j} & = j\text{-th coordinate of } \nabla\text{RSS}\\ & = - \sum\limits_{ i=1 }^n x_{ij} (y_i - β^T x_i) \\ & = \underbrace{-\sum\limits_{ i=1 }^n x_{ij} y_i + \sum\limits_{ i=1 }^n x_{ij} \left(\sum\limits_{ k≠j } β_k x_{ik}\right)}_{≝ \; ρ_j} + \underbrace{\sum\limits_{ i=1 }^n x_{ij}^2}_{≝ \; r_j} β_j \end{align*}\]

Let us denote by $F$ the function to minimize (RSS + reg).
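In code, $ρ_j$ and $r_j$ follow directly from these definitions; a sketch (the function name is mine):

```python
import numpy as np

def rho_r(X, y, beta, j):
    """rho_j and r_j, so that dRSS/dbeta_j = rho_j + r_j * beta_j."""
    partial = X @ beta - X[:, j] * beta[j]  # sum_{k != j} beta_k x_ik, for each i
    rho_j = -X[:, j] @ y + X[:, j] @ partial
    r_j = X[:, j] @ X[:, j]  # sum_i x_ij^2
    return rho_j, r_j
```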
One cannot directly solve $\nabla F = 0$: $F$ is not differentiable wherever some $β_j = 0$.
NB: if

\[f(x) = \vert x \vert\]

then one sets:

\[\partial f(x) ≝ \begin{cases} -1 &&\text{ if } x<0 \\ [-1, 1] &&\text{ if } x=0\\ 1 &&\text{ if } x >0 \end{cases}\]

so that

\[\begin{align*} \frac{\partial F}{\partial β_j}(β_j) & = ρ_j + r_j β_j + \begin{cases} -λ &&\text{ if } β_j<0 \\ [-λ, λ] &&\text{ if } β_j=0\\ λ &&\text{ if } β_j >0 \end{cases} \\ & = \begin{cases} ρ_j + r_j β_j -λ &&\text{ if } β_j<0 \\ [ρ_j-λ, ρ_j+λ] &&\text{ if } β_j=0\\ ρ_j + r_j β_j +λ &&\text{ if } β_j >0 \end{cases} \end{align*}\]
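As a small illustration (the function name is mine), this subdifferential can be represented as an interval, which reduces to a singleton whenever $β_j ≠ 0$:

```python
def subdiff_F_j(rho_j, r_j, beta_j, lam):
    """Endpoints of the subdifferential dF/dbeta_j at beta_j."""
    g = rho_j + r_j * beta_j
    if beta_j < 0:
        return (g - lam, g - lam)  # singleton {rho_j + r_j*beta_j - lam}
    if beta_j > 0:
        return (g + lam, g + lam)  # singleton {rho_j + r_j*beta_j + lam}
    return (g - lam, g + lam)      # beta_j = 0: [rho_j - lam, rho_j + lam]
```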
3. Case analysis
Case 1: $β_j < 0$
\[\text{FOC (First-Order Condition): } 0 ∈ \frac{\partial F}{\partial β_j} ⟹ β_j = \frac{-ρ_j + λ}{r_j}\]

But $β_j < 0$, so \(ρ_j > λ\).
Case 2: $β_j > 0$
\[\text{FOC} ⟹ β_j = -\frac{ρ_j + λ}{r_j}\]

But $β_j > 0$, so \(ρ_j < -λ\).
Case 3: $β_j = 0$
\[\text{FOC} ⟹ 0 ∈ [ρ_j-λ, ρ_j+λ]\]

So \(ρ_j ∈ [-λ, λ]\).
\[β_j = \begin{cases} -\frac{ρ_j + λ}{r_j} &&\text{ if } ρ_j<-λ \\ 0 &&\text{ if } -λ ≤ ρ_j ≤ λ\\ \frac{-ρ_j + λ}{r_j} &&\text{ if } ρ_j > λ \end{cases} = \operatorname{st}\left(-\frac{ρ_j}{r_j}, \; \frac{λ}{r_j}\right)\]

where $\operatorname{st}(z, γ) ≝ \operatorname{sign}(z)\left(\vert z \vert - γ\right)_+$ is the soft-thresholding operator.
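In NumPy, the soft-thresholding operator and the resulting coordinate update are one-liners (a sketch; names are mine):

```python
import numpy as np

def soft_threshold(z, gamma):
    """st(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def update_beta_j(rho_j, r_j, lam):
    """The case analysis above: beta_j = st(-rho_j/r_j, lam/r_j)."""
    return soft_threshold(-rho_j / r_j, lam / r_j)
```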
4. Algorithm
One iteratively minimizes over each coordinate: for each $j$, one fixes $(β_k)_{k≠j}$ and minimizes $F(β)$ with respect to the $j$-th coordinate (a runnable version is sketched after the notes below).

Stopping criterion: $\Vert β - β_{new} \Vert < ε$
```
compute r_j for all j
while not converged:
    for j = 1, ..., d:
        compute ρ_j
        β_j ← st(-ρ_j/r_j, λ/r_j)
```
- Since \(\frac{\partial \text{RSS}}{\partial β_j} = - \sum\limits_{ i } y_i x_{ij} + \sum\limits_{ i } x_{ij} \Big(\sum\limits_{ k } β_k x_{ik}\Big) = ρ_j + r_j β_j\), taking $λ = 0$ recovers the coordinate-wise OLS update: \(β^{OLS}_j = -\frac{ρ_j}{r_j}\)
- \[r_j ≝ \sum\limits_{ i } x_{ij}^2\]
- \[ρ_j ≝ -\sum\limits_{ i=1 }^n x_{i,j} y_i + \sum\limits_{ i=1 }^n x_{i,j} \left(\sum\limits_{ k≠j } β_k x_{ik}\right)\]
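Putting everything together, a minimal, self-contained NumPy implementation of this coordinate descent (a sketch under the conventions above; all names are mine):

```python
import numpy as np

def soft_threshold(z, gamma):
    """st(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, eps=1e-8, max_iter=1000):
    """Coordinate descent for 1/2 ||y - X beta||^2 + lam * ||beta||_1."""
    n, d = X.shape
    beta = np.zeros(d)
    r = (X ** 2).sum(axis=0)  # r_j = sum_i x_ij^2, fixed throughout
    for _ in range(max_iter):
        beta_old = beta.copy()
        for j in range(d):
            partial = X @ beta - X[:, j] * beta[j]  # sum_{k != j} beta_k x_ik
            rho_j = -X[:, j] @ y + X[:, j] @ partial
            beta[j] = soft_threshold(-rho_j / r[j], lam / r[j])
        if np.linalg.norm(beta - beta_old) < eps:  # stopping criterion
            return beta
    return beta

# Usage: a sparse ground truth should be (approximately) recovered
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ np.array([3.0, -2.0] + [0.0] * 8) + 0.1 * rng.normal(size=100)
print(lasso_cd(X, y, lam=10.0).round(2))
```

Up to tolerance, the result should agree with `sklearn.linear_model.Lasso(alpha=lam/n, fit_intercept=False)`, since scikit-learn scales the squared-error term by $1/(2n)$.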