# Convex optimization

## Unconstrained

1. Toolbox ⟹ CVX

• CVX: a package to get the solution to problems such as the linear program $\begin{cases} \sup c^T x \\ Ax = b \\ x ≥ 0 \end{cases}$ (see the sketch below)
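
A minimal sketch using CVXPY (the Python incarnation of CVX); the data `A`, `b`, `c` below are made up for illustration and chosen so the LP is feasible and bounded:

```python
import cvxpy as cp
import numpy as np

# Hypothetical data, chosen so that the LP is feasible and bounded above.
rng = np.random.default_rng(0)
m, n = 3, 5
A = rng.standard_normal((m, n))
b = A @ np.abs(rng.standard_normal(n))   # a nonnegative feasible point exists
c = -rng.random(n)                        # c ≤ 0, so sup c^T x over x ≥ 0 is finite

x = cp.Variable(n)
prob = cp.Problem(cp.Maximize(c @ x), [A @ x == b, x >= 0])
prob.solve()
print(prob.status, prob.value, x.value)
```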
2. Ellipsoid

• in 1D: dichotomy
• in higher dimension: $E_k ≝ \lbrace (x - x_k)^T P_k^{-1} (x - x_k) ≤ 1 \rbrace$ where $P_k$ is positive def.
• ex: if $P_k = σ^2 I_n$ ⟶ ball of radius $σ$
Algorithm:

• maintain the invariant $x^\ast ∈ E_k$
• reduce the “size” of $E_k$ at each step
• by convexity, $f(x) ≥ f(x_k) + f'(x_k)^T (x - x_k)$, so the minimizer lies in the half-space $\lbrace x \mid f'(x_k)^T (x - x_k) ≤ 0 \rbrace$
• take $E_{k+1}$ to be the minimum-volume ellipsoid containing $E_k ∩ \lbrace x \mid f'(x_k)^T (x - x_k) ≤ 0 \rbrace$ (see the sketch below)


$d(x_k, x^\ast) = O(\exp(- \frac{k}{12 n^2}))$

NB: contrary to gradient descent, the ellipsoid method cannot “get lucky”: the $O(⋯)$ bound above is essentially an equality.
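
A minimal NumPy sketch of one central-cut update (using the standard closed-form formula for the minimum-volume ellipsoid containing the half-ellipsoid; assumed here, valid for $n ≥ 2$, 1D falling back to dichotomy):

```python
import numpy as np

def ellipsoid_step(x_k, P_k, g):
    """One central-cut update: returns (x_{k+1}, P_{k+1}) describing the
    minimum-volume ellipsoid containing E_k ∩ {x : g^T (x - x_k) ≤ 0},
    where g = f'(x_k). Requires n ≥ 2."""
    n = x_k.size
    g_tilde = g / np.sqrt(g @ P_k @ g)               # normalized cut direction
    Pg = P_k @ g_tilde
    x_next = x_k - Pg / (n + 1)                      # new center
    P_next = n**2 / (n**2 - 1) * (P_k - 2 / (n + 1) * np.outer(Pg, Pg))
    return x_next, P_next
```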

3. Gradient descent

Algorithm:

• $x_{k+1} = x_k - γ f'(x_k)$
• Choice of $γ$:
• constant
• line-search
• exact: $\inf_{γ≥0} f(x_k - γ f'(x_k))$
• inexact (e.g. backtracking; see the sketch after this list)
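
One common inexact rule is Armijo backtracking; a minimal sketch (the defaults `gamma0`, `alpha`, `beta` are hypothetical values for illustration):

```python
def backtracking_step(f, grad_f, x, gamma0=1.0, alpha=0.5, beta=0.5):
    """One gradient step with Armijo backtracking: shrink γ until the
    sufficient-decrease test f(x - γ g) ≤ f(x) - α γ ‖g‖² holds, then move.
    x is assumed to be a NumPy array."""
    g = grad_f(x)
    gamma = gamma0
    while f(x - gamma * g) > f(x) - alpha * gamma * (g @ g):
        gamma *= beta
    return x - gamma * g
```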

Proposition: Assume

• $f$ cvx and $C^2$
• all eigenvalues of $f''(x)$ are in $[μ, L]$ for all $x$ (so $μ$ lower-bounds and $L$ upper-bounds the eigenvalues of the Hessian)

NB:

• always $μ ≥ 0$
• $μ > 0 ⟺ f \text{ strongly convex}$

Algorithm:

• $x_{k+1} = x_k - γ f'(x_k)$
• if $γ = \frac 1 L$, then $\begin{cases} \Vert x_k - x_\ast \Vert^2 ≤ (1 - \frac μ L)^k \Vert x_0 - x_\ast \Vert^2 \\ f(x_k) - f(x_\ast) ≤ \frac L k \Vert x_0 - x_\ast \Vert^2 \end{cases}$ (see the sketch below)
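
A minimal sketch on a strongly convex quadratic (the data below are made up; $μ$ and $L$ are read off the eigenvalues of the Hessian):

```python
import numpy as np

# Hypothetical strongly convex quadratic f(x) = 1/2 x^T Q x - b^T x, with Q ≻ 0.
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
Q = M @ M.T + np.eye(5)                   # positive definite Hessian
b = rng.standard_normal(5)

mu, L = np.linalg.eigvalsh(Q)[[0, -1]]    # smallest and largest eigenvalue
x_star = np.linalg.solve(Q, b)            # exact minimizer, for reference

x = np.zeros(5)
for k in range(200):
    x = x - (1 / L) * (Q @ x - b)         # gradient step with γ = 1/L

# ‖x_k - x*‖² shrinks by at least a factor (1 - μ/L) per iteration
print(np.linalg.norm(x - x_star) ** 2)
```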

Summary:

• If $f$ is strongly convex, gradient descent is linearly/geometrically convergent (warning: “linearly” here means the error shrinks like $\exp(-c k)$ for some constant $c > 0$, i.e. the number of correct digits grows linearly with $k$)

• $O(n)$ per iteration
• if $f$ is not convex, then there’s convergence to a stationary point only

## Newton’s method

Idea: optimize local quadratic Taylor expansion

• no parameter
• quadratically convergent: $c \Vert x_{k+1} - x_\ast \Vert ≤ (c \Vert x_k - x_\ast \Vert)^2$

Disadvantages:

• Unstable far from $x^\ast$
• $O(n^3)$ per iteration: each step is very expensive
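
A minimal sketch of one Newton step (assuming `grad_f` and `hess_f` are callables returning the gradient vector and Hessian matrix):

```python
import numpy as np

def newton_step(grad_f, hess_f, x):
    """One Newton step: minimize the local quadratic Taylor model, i.e.
    x⁺ = x - f''(x)^{-1} f'(x). Solving the linear system costs O(n^3)."""
    return x - np.linalg.solve(hess_f(x), grad_f(x))
```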

Ridge regression estimator $\hat{w}$:
$$\hat{w} ≝ \text{argmin}_w \; \frac{1}{2n} \Vert y - Xw \Vert^2_2 + \frac λ 2 \Vert w \Vert^2_2$$

Write $F(w)$ for this objective.

In the practical session (TP), we showed that if $λ = 0$ (and $X^T X$ is invertible):

$$\hat{w}_0 = (X^T X)^{-1} X^T y$$

Solve the first-order condition ($\nabla F = 0$):

Since $F$ is convex, $w^\ast$ is a minimizer if and only if $\nabla F(w^\ast) = 0$.

$$\begin{align*} 0 = \nabla F(\hat w) & = \frac{-1}{n} X^T (y - X\hat w) + λ \hat w \\ ⟹ \hat{w} & = (λ n I + X^T X)^{-1} X^T y \end{align*}$$
$F$ is $λ$-strongly convex:

$$F - \frac λ 2 \Vert \bullet \Vert^2_2 \text{ is convex}$$
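
A minimal sketch of the closed-form solution derived above (the names `X`, `y`, `lam` are placeholders):

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Ridge estimator ŵ = (λ n I + XᵀX)^{-1} Xᵀ y from the first-order condition."""
    n, d = X.shape
    return np.linalg.solve(lam * n * np.eye(d) + X.T @ X, X.T @ y)
```

With `lam = 0` (and $X^T X$ invertible) this reduces to the OLS estimator $\hat{w}_0$.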

Halting conditions for gradient descent:

1. $\Vert w_n - w_{n+1} \Vert ≤ ε$
2. $\Vert \nabla F\Vert ≤ ε$
3. $\Vert w_n - w^\ast \Vert ≤ ε$ (kind of cheating: we need to already know $w^\ast$)
In our case, $\nabla F (w) = \frac{-1}{n} X^T (y - Xw) + λ w$ (see the sketch below).
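
A minimal sketch of gradient descent on $F$ using halting condition 2, $\Vert \nabla F(w) \Vert ≤ ε$ (the defaults `eps` and `max_iter` are hypothetical; the step size is $1/L$ with $L = λ_{\max}(X^T X)/n + λ$):

```python
import numpy as np

def ridge_gd(X, y, lam, eps=1e-8, max_iter=10_000):
    """Gradient descent on F(w) = 1/(2n)‖y - Xw‖² + λ/2 ‖w‖²,
    stopped when ‖∇F(w)‖ ≤ ε."""
    n, d = X.shape
    L = np.linalg.eigvalsh(X.T @ X / n)[-1] + lam   # largest Hessian eigenvalue
    w = np.zeros(d)
    for _ in range(max_iter):
        grad = -X.T @ (y - X @ w) / n + lam * w
        if np.linalg.norm(grad) <= eps:             # halting condition 2
            break
        w = w - grad / L                            # step with γ = 1/L
    return w
```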