Lab: K-means and expectation maximization algorithms

Lecturer: Lyudmila Kushnir

Problem 1. K-means clustering

Link to the associated Jupyter notebook

K-means minimizes the distortion measure

J = \sum\limits_{ n=1 }^N \sum\limits_{ k=1 }^K r_{nk} \Vert x_n - μ_k \Vert^2

where $r_{nk} ∈ \lbrace 0, 1 \rbrace$ indicates whether $x_n$ is assigned to cluster $k$. It alternates two steps:
  • E-step:

    assign each data point $x_n$ to the closest cluster

  • M-step:

    \frac{∂J}{∂μ_k} = \frac{∂}{∂μ_k}\left( \sum\limits_{ n=1 }^N r_{nk} \Vert x_n - μ_k \Vert^2\right) = -2 \sum\limits_{ n=1 }^N r_{nk} (x_n - μ_k)

    So:

    \begin{align*} & \frac{∂J}{∂μ_k} = 0 \\ ⟺ \quad & -2 \sum\limits_{ n=1 }^N r_{nk} (x_n - μ_k) = 0 \\ ⟺ \quad & μ_k = \frac{\sum\limits_{ n=1 }^N r_{nk} x_n}{\sum\limits_{ n=1 }^N r_{nk}}\\ \end{align*}
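Putting the two steps together, here is a minimal NumPy sketch of the algorithm (the function name `kmeans` and the initialization scheme are our own choices, not prescribed by the notebook; empty clusters are not handled):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-means: alternate the E- and M-steps derived above."""
    rng = np.random.default_rng(seed)
    # Initialize the means at K distinct data points (one common choice).
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # E-step: hard-assign each x_n to its closest mean (r_nk in the notes).
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # M-step: each mean becomes the centroid of its assigned points.
        new_mu = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_mu, mu):  # assignments stabilized: J can no longer decrease
            break
        mu = new_mu
    return mu, labels
```

Each iteration can only decrease $J$ (the E-step by the definition of "closest", the M-step by the derivative computation above), so the algorithm converges to a local minimum.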

Problem 2. Expectation Maximization for Gaussian mixtures

For instance, a one-dimensional Gaussian mixture:

P(X) = \sum\limits_{ k=1 }^K π_k \frac{1}{\sqrt{2π σ_k^2}} \; \exp\left(-\frac{(X-μ_k)^2}{2σ_k^2}\right)
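To make this concrete, a small sketch evaluating such a 1-D mixture density (the function name `mixture_pdf` and the example parameters are illustrative, not from the notebook):

```python
import numpy as np

def mixture_pdf(x, pis, mus, sigmas):
    """P(x) = sum_k pi_k * N(x | mu_k, sigma_k^2), for scalar or array x."""
    x = np.asarray(x, dtype=float)[..., None]      # broadcast over the K components
    comp = np.exp(-(x - mus) ** 2 / (2 * sigmas ** 2)) / np.sqrt(2 * np.pi * sigmas ** 2)
    return (pis * comp).sum(axis=-1)

# Example: a two-component mixture with weights 0.3 and 0.7.
pis, mus, sigmas = np.array([0.3, 0.7]), np.array([-1.0, 2.0]), np.array([1.0, 0.5])
print(mixture_pdf([0.0, 2.0], pis, mus, sigmas))   # density at x = 0 and x = 2
```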

Latent variable:

\textbf{Z} = \begin{pmatrix} 0 \\ \vdots \\ 1\\ \vdots\\ 0 \end{pmatrix}

and $\textbf{Z}_k = 1$ with probability $π_k$

E-step:

γ_k(x_n) = P(z=k \mid x_n) = \frac{P(z=k) P(x_n \mid k)}{P(x_n)} = \frac{π_k 𝒩(x_n \mid μ_k, \textbf{Σ}_k)}{\sum\limits_{ j } π_j 𝒩(x_n \mid μ_j, \textbf{Σ}_j) }
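A direct translation of this E-step into NumPy/SciPy (the names `e_step`, `pis`, `mus`, `Sigmas` are our own; `X` is an `(N, D)` data array):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pis, mus, Sigmas):
    """Responsibilities gamma[n, k] = pi_k N(x_n | mu_k, Sigma_k) / sum_j pi_j N(x_n | mu_j, Sigma_j)."""
    N, K = len(X), len(pis)
    gamma = np.empty((N, K))
    for k in range(K):
        # Numerator: prior weight times the Gaussian likelihood of each point.
        gamma[:, k] = pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
    gamma /= gamma.sum(axis=1, keepdims=True)   # normalize: the denominator P(x_n)
    return gamma
```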

and then the M-step:

\begin{align*} μ_k^{new} &= \frac{\sum\limits_{ n=1 }^N γ_k(x_n) x_n }{\sum\limits_{ n=1 }^N γ_k(x_n)}\\ \textbf{Σ}_k^{new} &= \frac {\sum\limits_{ n=1 }^N γ_k(x_n) (x_n - μ_k^{new}) (x_n - μ_k^{new})^T}{\sum\limits_{ n=1 }^N γ_k(x_n)}\\ π_k^{new} &= \frac{\sum\limits_{ n=1 }^N γ_k(x_n)}{N} \end{align*}
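And the corresponding M-step in NumPy (again a sketch under the same naming assumptions; `gamma` is the `(N, K)` responsibility matrix returned by the E-step):

```python
import numpy as np

def m_step(X, gamma):
    """Re-estimate (pi_k, mu_k, Sigma_k) from the responsibilities."""
    N, K = gamma.shape
    Nk = gamma.sum(axis=0)                    # effective number of points in each cluster
    pis = Nk / N                              # new mixing weights
    mus = (gamma.T @ X) / Nk[:, None]         # responsibility-weighted means
    Sigmas = np.empty((K, X.shape[1], X.shape[1]))
    for k in range(K):
        d = X - mus[k]                        # deviations (x_n - mu_k), shape (N, D)
        Sigmas[k] = (gamma[:, k, None] * d).T @ d / Nk[k]
    return pis, mus, Sigmas
```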

These updates come from maximizing the likelihood of the data:

\begin{align*} P(X \mid π, μ, \textbf{Σ}) &= \prod\limits_{ n=1 }^N P(x_n \mid π, μ, \textbf{Σ})\\ &= \prod\limits_{ n=1 }^N \Bigg(\sum\limits_{ k=1 }^K \underbrace{P(z_k)}_{≝ \; π_k} \; \underbrace{P(x_n \mid z_k, π, μ, \textbf{Σ})}_{≝ \; 𝒩(x_n \mid μ_k, \textbf{Σ}_k)}\Bigg) \end{align*}
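In practice one monitors the log of this likelihood to decide when to stop iterating; a sketch (the function name `log_likelihood` is ours):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, pis, mus, Sigmas):
    """sum_n log( sum_k pi_k N(x_n | mu_k, Sigma_k) ) -- non-decreasing across EM iterations."""
    weighted = np.column_stack([
        pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
        for k in range(len(pis))
    ])
    return np.log(weighted.sum(axis=1)).sum()
```

Alternating `e_step` and `m_step` until this quantity stops increasing gives the full EM loop.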
