# Maximum Likelihood and Linear Discriminant Analysis

## 1. Gaussian in one variable:

p(y) = \frac{1}{\sqrt{2π} σ} \exp\left(- \frac 1 2 \frac{(y - μ)^2}{σ^2}\right)

$(x_1, ⋯, x_n)$ iid sample from the model $\sim 𝒩(μ, σ^2)$

\begin{align*} - \log L(x_1, ⋯, x_n, μ, σ^2) &= - \sum\limits_{ i=1 }^n \log \frac{1}{\sqrt{2π σ^2}} \exp \left(- \frac 1 2 \frac{(x_i - μ)^2}{σ^2}\right) \\ &= \underbrace{n \, \log (\sqrt{2π}) + n \, \log σ + \frac 1 2 \sum\limits_{ i }\frac{(x_i - μ)^2}{σ^2}}_{ ≝ \, l(μ, σ)} \end{align*}

#### Minimize $l$ with respect to $μ$:

\begin{align*} &\frac{\partial l}{\partial μ}(\hat{μ}) = 0 \\ ⟹ & \sum\limits_{ i } (x_i - \hat{μ}) = 0 \\ ⟹ & \hat{μ} = \frac 1 n \sum\limits_{ i } x_i \end{align*}

#### Minimize $l$ wrt $λ ≝ \frac{1}{σ^2}$:

The function is convex wrt $λ$:

\begin{align*} &\frac{\partial l}{\partial λ}(\hat{λ}) = 0 \\ ⟹ & -\frac{n}{2λ} + \frac 1 2 \sum\limits_{ i } (x_i - μ)^2 = 0 \\ ⟹ & \hat{λ}^{-1} = \hat{σ}^2 = \frac 1 n \sum\limits_{ i } (x_i - μ)^2 \end{align*}
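As a numerical sanity check (a minimal NumPy sketch; the true parameters and sample size are made up), the closed-form estimators $\hat{μ}$ and $\hat{σ}^2$ derived above do minimize $l$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)  # iid sample from N(mu, sigma^2)

def neg_log_lik(mu, sigma, x):
    # l(mu, sigma) = n log sqrt(2 pi) + n log sigma + (1/2) sum_i (x_i - mu)^2 / sigma^2
    n = len(x)
    return (n * np.log(np.sqrt(2 * np.pi)) + n * np.log(sigma)
            + 0.5 * np.sum((x - mu) ** 2) / sigma ** 2)

# closed-form MLEs derived above
mu_hat = x.mean()                                 # (1/n) sum_i x_i
sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))   # note: 1/n, not 1/(n-1)

# l is strictly larger at nearby parameter values
best = neg_log_lik(mu_hat, sigma_hat, x)
assert best < neg_log_lik(mu_hat + 0.1, sigma_hat, x)
assert best < neg_log_lik(mu_hat, sigma_hat * 1.1, x)
```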

## BONUS: Regression:

y = β^T x + ε

where

• $ε \sim 𝒩(0, σ^2)$
• $y \mid x \sim 𝒩(β^T x, σ^2)$

Same as before, but replace $μ$ by $β^Tx$:

l(β, σ^2) = -n \log (\sqrt{2π σ^2}) - \frac{1}{2σ^2} \sum\limits_{ i=1 }^n (y_i - β^T x_i)^2

#### Optimize over $β$:

\max - \frac{1}{2 σ^2} \sum\limits_{ i }(y_i - β^T x_i)^2 \\ ⟺ \min \sum\limits_{ i } (y_i - β^T x_i)^2

⟶ Least Squares problem
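A quick numerical check of the normal-equation solution (a sketch on synthetic data; the dimensions, $β$, and the noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)  # y_i = beta^T x_i + eps_i

# closed-form least-squares estimator: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# agrees with the library's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)

# MLE of the noise variance: sigma_hat^2 = (1/n) sum_i (y_i - beta_hat^T x_i)^2
sigma2_hat = np.mean((y - X @ beta_hat) ** 2)
```

In practice one uses `lstsq` (or a QR/SVD factorization) rather than forming $X^TX$ explicitly, for numerical stability.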

$\hat{β} = (X^T X)^{-1} X^T y$

#### Optimize over $σ^2$:

\hat{σ}^2 = \frac 1 n \sum\limits_{ i=1 }^n (y_i - \hat{β}^T x_i)^2

(same computation as before)

## BONUS: Discrete model

Two random variables $z$ and $x$:

• $z ∈ \lbrace 1, ⋯, M \rbrace$
• $x ∈ \lbrace 1, ⋯, K \rbrace$

p(z = m) = π_m \\ p(x = k \mid z=m) = θ_{k,m}

Samples $(z_i, x_i)_{1 ≤ i ≤ n}$

Instead, consider the one-hot encodings:

(z_i, x_i) = (z_i^m, x_i^k)_{1 ≤ m ≤ M, \, 1 ≤ k ≤ K}

where

• $z_i = (0, ⋯, 1, ⋯, 0) ∈ \lbrace 0, 1 \rbrace^M$ (the $1$ in position $m$)
• $x_i = (0, ⋯, 1, ⋯, 0) ∈ \lbrace 0, 1 \rbrace^K$ (the $1$ in position $k$)

Basically:

• $z_i^m = \begin{cases} 1 \text{ if } z_i = m \\ 0 \text{ else} \end{cases}$
• $x_i^k = \begin{cases} 1 \text{ if } x_i = k \\ 0 \text{ else} \end{cases}$

Likelihood:

l(π, θ) = \log\left(\prod_{i=1}^n p(x_i, z_i \mid π, θ)\right)

with

p(x_i, z_i \mid π, θ) = \prod_{k=1}^K \prod_{m=1}^M π_m^{z_i^m} \, θ_{k,m}^{z_i^m x_i^k}

where

π_m^{z_i^m} = \begin{cases} π_m &&\text{ if } z_i^m = 1 ⟺ z_i = m \\ 1 &&\text{ else (i.e. } z_i^m = 0 \text{)} \end{cases}

and similarly for $θ_{k,m}^{z_i^m x_i^k}$, since $p(x_i = k, z_i = m \mid π, θ) = π_m θ_{k, m}$

So:

\begin{align*} l(π, θ) &= \log\left(\prod_{i=1}^n p(x_i, z_i \mid π, θ)\right) \\ &= \log\left(\prod_{i=1}^n \prod_{k=1}^K \prod_{m=1}^M π_m^{z_i^m} θ_{k,m}^{z_i^m x_i^k}\right) \\ &= \sum\limits_{ i=1 }^n \sum\limits_{ m=1 }^M z_i^m \log π_m + \sum\limits_{ i=1 }^n \sum\limits_{ m=1 }^M \sum\limits_{ k=1 }^K z_i^m x_i^k \log θ_{k,m} \\ &= \sum\limits_{ m =1 }^M N_m \log π_m + \sum\limits_{ m =1 }^M \sum\limits_{ k =1 }^K N_{m, k} \log θ_{k,m} \end{align*}

where

• $N_m$ is the number of samples such that $z_i = m$
• $N_{m, k}$ is the number of samples such that $z_i = m$ and $x_i = k$

Therefore:

l(π, θ) = \underbrace{\sum\limits_{ m =1 }^M N_m \log π_m}_{\text{only depends on } π} + \underbrace{\sum\limits_{ m =1 }^M \sum\limits_{ k =1 }^K N_{m, k} \log θ_{k,m}}_{\text{only depends on } θ}

Maximize $l(π, θ)$ subject to the constraints:

• $\sum\limits_{ m=1 }^M π_m = 1$
• $\sum\limits_{ k=1 }^K θ_{k, m} = 1$ for all $m$

This splits into two independent problems:

1. \max_π \sum\limits_{ m =1 }^M N_m \log π_m \quad \text{s.t. } \sum\limits_{ m=1 }^M π_m = 1
2. \max_θ \sum\limits_{ m =1 }^M \sum\limits_{ k =1 }^K N_{m, k} \log θ_{k,m} \quad \text{s.t. } \sum\limits_{ k=1 }^K θ_{k, m} = 1 \text{ for all } 1 ≤ m ≤ M

#### 1. Lagrangian:

ℒ(π, λ) = \sum\limits_{ m =1 }^M N_m \log π_m - λ \left(\sum\limits_{ m=1 }^M π_m - 1\right)

First-order conditions:

• $\frac{\partial ℒ}{\partial π_m} = 0 ⟹ \frac{N_m}{π_m} - λ = 0 ⟹ π_m = \frac{N_m}{λ}$
• $\frac{\partial ℒ}{\partial λ} = 0 ⟹ \sum\limits_{ m=1 }^M π_m = 1$
• $1 = \sum\limits_{ m } π_m = \sum\limits_{ m } \frac{N_m}{λ} ⟹ λ = n$

$π_m = \frac{N_m}{n}$
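A count-based sketch (synthetic categorical data; $M$, $n$, and the true probabilities are made up) confirming that the empirical frequencies $\hat{π}_m = N_m / n$ are a valid distribution and beat, e.g., the uniform one on the constrained objective:

```python
import numpy as np

rng = np.random.default_rng(0)
M, n = 4, 50_000
pi_true = np.array([0.1, 0.2, 0.3, 0.4])
z = rng.choice(M, size=n, p=pi_true)  # z_i in {0, ..., M-1}

# N_m = number of samples with z_i = m; the MLE is the empirical frequency
N = np.bincount(z, minlength=M)
pi_hat = N / n
assert np.isclose(pi_hat.sum(), 1.0)  # satisfies the simplex constraint

# pi_hat maximizes sum_m N_m log pi_m over the simplex,
# so it is at least as good as any other feasible point, e.g. uniform
def log_lik(pi):
    return np.sum(N * np.log(pi))

uniform = np.full(M, 1 / M)
assert log_lik(pi_hat) >= log_lik(uniform)
```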
