Practical session 10: LASSO Regression

Link

TP 10

\min_{β∈ ℝ^d} \frac 1 2 \sum\limits_{ i } (y_i - β^T x_i)^2 + λ \Vert β \Vert_1
\min_{β∈ ℝ^d} \frac 1 2 \Vert Y - Xβ \Vert^2 + λ \Vert β \Vert_1

1.

Assume that $X^T X = I$

NB: $\overline{β}$ result of the OLS regression:

\overline{β} = (X^T X)^{-1} X^T Y \\ = X^T Y
\begin{align*} \Vert Y - X β \Vert^2 & = \Vert Y \Vert^2 + \Vert Xβ \Vert^2 - 2 (Xβ)^T Y \\ & = \Vert Y \Vert^2 + β^T β - 2 β^T \overline{β} \end{align*}

So

\begin{align*} minimize_{β∈ ℝ^d} \frac 1 2 \Vert Y - Xβ \Vert^2 + λ \Vert β \Vert_1 & ⟺ minimize_{β∈ ℝ^d} \frac 1 2 β^T β - β^T \overline{β} + λ \Vert β \Vert_1 \\ & ⟺ minimize_{β∈ ℝ^d} \frac 1 2 \sum\limits_{ j=1 }^d β_j^2 - \sum\limits_{ j=1 }^d β_j \overline{β}_j + λ \sum\limits_{ j=1 }^d \vert β_j \vert \\ & ⟺ minimize_{β∈ ℝ^d} \sum\limits_{ j=1 }^d \underbrace{β_j\left(\frac 1 2 β_j - \overline{β}_j\right) + λ \vert β_j \vert}_{≝ f_j(β_j)} \\ \end{align*}

And it suffices to minimize $f_j$ for all $j$

Problem: $x ⟼ \vert x \vert$ is not differentiable at $x = 0$

Trick: note that $β_j^\ast$ has the same sign as $\overline{β}_j$

Indeed: one wants to minimize $\frac 1 2 β_j^2 - \overline{β}_j β_j + λ \vert β_j \vert$, so $\overline{β}_j β_j$ has to be non-negative

So the $f_j$ are defined on $ℝ^\ast_+$ and $ℝ^\ast_-$, on which they are differentiable.

Case 1: $\overline{β}_j, β^\ast_j > 0$

f_j'(β_j) = β_j - \overline{β}_j + λ

so that

β^\ast_j = \max\left(\overline{β}_j - λ, 0\right) = \left(\overline{β}_j - λ\right)_+

Case 2: $\overline{β}_j, β^\ast_j < 0$

f_j'(β_j) = β_j - \overline{β}_j - λ
β^\ast_j = \min\left(\overline{β}_j + λ, 0\right) = -\max\left(-\overline{β}_j - λ, 0\right)\\ = -\left(\vert \overline{β}_j \vert - λ \right)_+ \\

In either case,

β^\ast_j = sign(\overline{β}_j) \left(\vert \overline{β}_j \vert - λ \right)_+

2. General case

\min_{β∈ ℝ^d} \frac 1 2 \sum\limits_{ i } \underbrace{(y_i - β^T x_i)^2}_{\text{Residual Sum of Squared}} + \underbrace{λ \Vert β \Vert_1}_{\text{reg}}
\nabla \text{RSS} = - X^T (Y - X β)
\begin{align*} \frac{\partial \text{RSS}}{\partial β_j} & = j\text{-th coord of } \nabla\text{RSS}\\ & = - \sum\limits_{ i=1 }^n x_{i, j} (y_i - β^T x_i) \\ & = \underbrace{-\sum\limits_{ i=1 }^n x_{i,j} y_i + \sum\limits_{ i=1 }^n x_{i,j} \left(\sum\limits_{ k≠j } β_k x_{ik}\right)}_{≝ \; ρ_j} + \underbrace{\sum\limits_{ i=1 }^n x_{ij}^2}_{≝ \; r_j} β_j \end{align*}

Let us denote by $F$ the function to minimize (RSS+reg).

One cannot analytically resolve $\nabla F = 0$.


NB: If

f(x) = \vert x \vert

Then one sets:

\partial f(x) ≝ \begin{cases} -1 &&\text{ if } x<0 \\ [-1, 1] &&\text{ if } x=0\\ 1 &&\text{ else if } x >0 \end{cases}

\frac{\partial F}{\partial β_j}(β_j) = ρ_j + r_j β_j + \begin{cases} -λ &&\text{ if } β_j<0 \\ [-λ, λ] &&\text{ if } β_j=0\\ λ &&\text{ else if } β_j >0 \end{cases} \\ = \begin{cases} ρ_j + r_j β_j -λ &&\text{ if } β_j<0 \\ [ρ_j-λ, ρ_j+λ] &&\text{ if } β_j=0\\ ρ_j + r_j β_j +λ &&\text{ else if } β_j >0 \end{cases}

3.

Case 1: $β_j < 0$

FOC \text{ (First Order Condition)} ⟹ β_j = \frac{-ρ_j + λ}{r_j}

But $β_j < 0$, so ρ_j > λ

Case 2: $β_j > 0$

FOC ⟹ β_j = -\frac{ρ_j + λ}{r_j}

But $β_j > 0$, so ρ_j < -λ

Case 3: $β_j = 0$

FOC ⟹ 0 ∈ [ρ_j-λ, ρ_j+λ]

So ρ_j ∈ [-λ, λ]


β_j = \underbrace{\begin{cases} -\frac{ρ_j + λ}{r_j} &&\text{ if } ρ_j<-λ \\ 0 &&\text{ if } -λ ≤ ρ_j ≤ λ\\ \frac{-ρ_j + λ}{r_j} &&\text{ else if } ρ_j > λ \end{cases}}_{≝ \underbrace{st}_{\rlap{\text{soft thresholding}}}\left(\frac{ρ_j}{r_j}, \; λ\right)}

4. Algorithm

One iteratively minimizes on each coordinate for each $j$, one sets $(β_k)_{k≠j}$ and minimizes $F(β)$ on the j-th coordinate.

Stopping criterion: $β - β_{new} < ε$

compute r_j for all j
while "not converged":
    for j=1, ..., d:
        compute ρ_j
        β_j = st(ρ_j/r_j, λ)
  • As \frac{\partial \text{RSS}}{\partial β_j} = - \sum\limits_{ i } y_i x_{ij} + \sum\limits_{ i } x_{ij} \Big(\sum\limits_{ k } β_k x_{ik}\Big) for $β^{OLS}$: β^{OLS}_j = -\frac{ρ_j}{r_j}
  • r_j ≝ \sum\limits_{ i } x_{ij}^2
  • ρ_j ≝ -\sum\limits_{ i=1 }^n x_{i,j} y_i + \sum\limits_{ i=1 }^n x_{i,j} \left(\sum\limits_{ k≠j } β_k x_{ik}\right)

Leave a comment