# Convex optimization

## Unconstrained

1. Toolbox ⟹ CVX

• CVX: a package to get the solution to problems of the form $\begin{cases} \sup c^T x \\ Ax = b \\ x ≥ 0 \end{cases}$
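CVX itself is a MATLAB modeling toolbox (CVXPY in Python); as a minimal stand-in sketch, SciPy's `linprog` solves the same LP form. The data below is illustrative, not from the notes, and `linprog` minimizes, so we negate $c$:

```python
import numpy as np
from scipy.optimize import linprog

# sup c^T x  s.t.  Ax = b, x >= 0   (illustrative data)
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# linprog minimizes, so pass -c; the default variable bounds are already x >= 0
res = linprog(-c, A_eq=A, b_eq=b)
print(res.x)  # optimal point
```

With this data, the constraint $x_1 + x_2 = 1$, $x ≥ 0$ and objective $\sup x_1 + 2x_2$ put all the mass on $x_2$.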
2. Ellipsoid

• in 1D: dichotomy
• in higher dimension: $E_k ≝ \lbrace (x - x_k)^T P_k^{-1} (x - x_k) ≤ 1 \rbrace$ where $P_k$ is positive def.
• ex: if $P_k = σ^2 I_n$ ⟶ ball of radius $σ$
Algorithm:

• make sure that $x^\ast ∈ E_k$
• reduce the “size” of $E_k$: by convexity, $f(x) ≥ f(x_k) + f'(x_k)^T (x - x_k)$, so $x^\ast$ lies in the half-space $\lbrace (x - x_k)^T f'(x_k) ≤ 0 \rbrace$
• find $E_{k+1}$ as the minimum-volume ellipsoid containing $E_k ∩ \lbrace (x - x_k)^T f'(x_k) ≤ 0 \rbrace$


$d(x_k, x^\ast) = O(\exp(- \frac{k}{12 n^2}))$

NB: contrary to gradient descent, the ellipsoid method cannot “get lucky”, in that the $O(⋯)$ bound is close to an equality.
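The steps above can be sketched with the standard minimum-volume-ellipsoid update formulas (valid for $n ≥ 2$); the test function, starting point and starting ellipsoid below are illustrative assumptions, not from the notes:

```python
import numpy as np

def ellipsoid_minimize(f, grad, x0, P0, steps=200):
    """Ellipsoid method (n >= 2): cut E_k with the half-space
    {(x - x_k)^T f'(x_k) <= 0}, then take the minimum-volume ellipsoid
    containing the intersection."""
    n = len(x0)
    x, P = np.array(x0, float), np.array(P0, float)
    best = x.copy()
    for _ in range(steps):
        g = grad(x)
        gn = g / np.sqrt(g @ P @ g)        # gradient normalized in the P-metric
        Pg = P @ gn
        x = x - Pg / (n + 1)               # new center x_{k+1}
        P = n**2 / (n**2 - 1) * (P - 2 / (n + 1) * np.outer(Pg, Pg))  # new P_{k+1}
        if f(x) < f(best):                 # keep the best center seen so far
            best = x.copy()
    return best

# Illustrative example: f(x) = ||x - a||^2, E_0 = ball of radius 5 around 0
a = np.array([1.0, -2.0])
x_best = ellipsoid_minimize(lambda x: np.sum((x - a) ** 2),
                            lambda x: 2 * (x - a),
                            x0=np.zeros(2), P0=25 * np.eye(2))
```

Tracking the best center is needed because, unlike gradient descent, the centers $x_k$ are not monotone in $f$.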

3. Gradient descent

Algorithm:

• $x_{k+1} = x_k - γ f'(x_k)$
• Choice of $γ$:
• constant
• line-search
• exact: $\inf_{γ≥0} f(x_k - γ f'(x_k))$
• inexact

Proposition: Assume

• $f$ cvx and $C^2$
• all eigenvalues of $f''(x)$ are in $[μ, L]$ for all $x$ (where $μ$ is the smallest eigenvalue and $L$ the largest one)

NB:

• always $μ ≥ 0$
• $μ > 0 ⟺ f \text{ strongly convex}$

Algorithm:

• $x_{k+1} = x_k - γ f'(x_k)$
• if $γ = \frac 1 L$, then $\begin{cases} \Vert x_k - x_\ast \Vert^2 ≤ (1 - \frac μ L)^k \Vert x_0 - x_\ast \Vert^2 \\ f(x_k) - f(x_\ast) ≤ \frac L k \Vert x_0 - x_\ast \Vert^2 \end{cases}$

Summary:

• If $f$ is strongly convex, gradient descent is linearly/geometrically convergent (warning: “linearly” here means the error is multiplied by $\exp(- c k)$ for some constant $c$, i.e. the number of correct digits grows linearly)

• $O(n)$ per iteration
• if $f$ is not convex, then there is convergence only to a stationary point
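The geometric rate with $γ = \frac 1 L$ can be checked numerically. A minimal sketch on an illustrative strongly convex quadratic $f(x) = \frac 1 2 x^T Q x$ (the matrix $Q$ and starting point are assumptions, not from the notes):

```python
import numpy as np

# f(x) = 1/2 x^T Q x, so f'(x) = Qx and the minimizer is x* = 0.
# mu and L are the extreme eigenvalues of the Hessian Q.
Q = np.diag([1.0, 4.0, 10.0])      # mu = 1, L = 10
mu, L = 1.0, 10.0
x = np.ones(3)                      # x_0
for k in range(50):
    x = x - (1.0 / L) * (Q @ x)     # x_{k+1} = x_k - gamma f'(x_k), gamma = 1/L
# geometric convergence: ||x_k - x*|| <= (1 - mu/L)^k ||x_0 - x*||
print(np.linalg.norm(x))
```

With $μ/L = 0.1$, the error is multiplied by at most $0.9$ per step, matching the $(1 - \frac μ L)^k$ bound.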

## Newton’s method

Idea: optimize local quadratic Taylor expansion

• no parameter
• quadratically convergent: $c \Vert x_{k+1} - x_\ast \Vert ≤ (c \Vert x_k - x_\ast \Vert)^2$

Disadvantages:

• Unstable far from $x^\ast$
• $O(n^3)$ per iteration: each step is very expensive
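A minimal 1D sketch of the fast local convergence, on the illustrative strictly convex function $f(x) = \cosh(x - 1)$ (an assumption, not from the notes); in $n$ dimensions each step instead solves an $n × n$ linear system with the Hessian, hence the $O(n^3)$ cost:

```python
import math

# Newton's method on f(x) = cosh(x - 1), strictly convex with minimizer x* = 1.
# Newton step: x <- x - f'(x)/f''(x) = x - sinh(x-1)/cosh(x-1) = x - tanh(x-1).
x = 3.0
errors = []
for _ in range(6):
    x = x - math.tanh(x - 1.0)
    errors.append(abs(x - 1.0))
print(errors)  # the error collapses to machine precision in a few steps
```

Starting far from $x^\ast$ the first steps make slow progress; once close, the number of correct digits at least doubles per step.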

Ridge regression estimator $\hat{w}$:
$\hat{w} ≝ \mathop{\mathrm{argmin}}_w \frac{1}{2n} \Vert y - Xw \Vert^2_2 + \frac λ 2 \Vert w \Vert^2_2$

In the practical session (TP), we showed that if $λ = 0$:

$\hat{w}_0 = (X^T X)^{-1} X^T y$

Solve first order condition ($\nabla = 0$)

$F$ is convex, so $w^\ast$ satisfies $\nabla F(w^\ast) = 0$.

\begin{align*} 0 = \nabla F (w) & = \frac{-1}{n} X^T (y - Xw) + λ w \\ ⟹ & \hat{w} = (λn I + X^T X)^{-1} X^T y\\ \end{align*}
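The closed form can be checked directly against the first-order condition. A minimal sketch with random illustrative data (the sizes and $λ$ below are assumptions, not from the notes):

```python
import numpy as np

# Closed-form ridge solution: w_hat = (lambda n I + X^T X)^{-1} X^T y
rng = np.random.default_rng(0)
n, d, lam = 50, 3, 0.1
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=n)

w_hat = np.linalg.solve(lam * n * np.eye(d) + X.T @ X, X.T @ y)

# optimality check: grad F(w_hat) = -X^T (y - X w_hat)/n + lam * w_hat = 0
grad = -X.T @ (y - X @ w_hat) / n + lam * w_hat
print(np.linalg.norm(grad))
```

Note that for $λ > 0$ the matrix $λ n I + X^T X$ is always invertible, unlike $X^T X$ in the $λ = 0$ case.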
$F$ is $λ$-strongly cvx:
$F - \frac λ 2 \Vert \bullet \Vert^2 \text{ is convex }$

Stopping criteria:

1. $\Vert w_n - w_{n+1} \Vert ≤ ε$
2. $\Vert \nabla F(w_n) \Vert ≤ ε$
3. $\Vert w_n - w^\ast \Vert ≤ ε$ (kind of cheating: requires already knowing $w^\ast$)
$\nabla F (w) = \frac{-1}{n} X^T (y - Xw) + λ w$

$w_{n+1} = w_n - γ \nabla F (w_n)$
\begin{align*} γ^\ast_n & = \mathop{\mathrm{argmin}}_γ F(w_{n+1}) \\ & = \mathop{\mathrm{argmin}}_γ F(w_n - γ \nabla F(w_n)) \\ & = \mathop{\mathrm{argmin}}_γ \frac{1}{2n} \Vert y - X(w_n - γ \nabla F(w_n)) \Vert^2_2 + \frac λ 2 \Vert w_n - γ \nabla F(w_n) \Vert^2_2 \\ & = \mathop{\mathrm{argmin}}_γ \underbrace{\frac{1}{2n} \Vert y - X(w_n + \frac{γ}{n} X^T (y - Xw_n) - γ λ w_n) \Vert^2_2 + \frac λ 2 \Vert w_n + \frac{γ}{n} X^T (y - Xw_n) - γ λ w_n \Vert^2_2}_{= g(γ)} \\ \end{align*}
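Since $F$ is quadratic with Hessian $H = \frac{1}{n} X^T X + λ I$, the function $g(γ)$ is a parabola and the exact line search has a closed form: $γ^\ast_n = \frac{\Vert \nabla F(w_n) \Vert^2}{\nabla F(w_n)^T H \, \nabla F(w_n)}$. A minimal sketch with random illustrative data (an assumption, not from the notes), checked against the closed-form ridge solution:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 40, 3, 0.1
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
H = X.T @ X / n + lam * np.eye(d)   # Hessian of F (constant, since F is quadratic)

w = np.zeros(d)
for _ in range(100):
    g = -X.T @ (y - X @ w) / n + lam * w   # grad F(w_n)
    if np.linalg.norm(g) < 1e-12:          # converged
        break
    gamma = (g @ g) / (g @ H @ g)          # exact line search: argmin_gamma g(gamma)
    w = w - gamma * g                      # w_{n+1} = w_n - gamma* grad F(w_n)

# compare with the closed-form solution (lambda n I + X^T X)^{-1} X^T y
w_closed = np.linalg.solve(lam * n * np.eye(d) + X.T @ X, X.T @ y)
```

Each iteration costs one extra Hessian-vector product compared to a constant step, but no step-size tuning is needed.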