# Lab: K-means and expectation maximization algorithms

Lecturer: Lyudmila Kushnir

# Problem 1. K-means clustering

Link to the associated Jupyter notebook

$$J = \sum\limits_{ n=1 }^N \sum\limits_{ k=1 }^K r_{nk} \Vert x_n - μ_k \Vert^2$$
• E-step:

assign each data point $x_n$ to the closest cluster centroid, i.e. set $r_{nk} = 1$ for the nearest $μ_k$ and $r_{nk} = 0$ otherwise

• M-step:

$$\frac{∂J}{∂μ_k} = \frac{∂}{∂μ_k}\left( \sum\limits_{ n=1 }^N r_{nk} \Vert x_n - μ_k \Vert^2\right) = -2 \sum\limits_{ n=1 }^N r_{nk} (x_n - μ_k)$$

So:

$$\begin{align*} & \frac{∂J}{∂μ_k} = 0 \\ ⟺ \quad & -2 \sum\limits_{ n=1 }^N r_{nk} (x_n - μ_k) = 0 \\ ⟺ \quad & μ_k = \frac{\sum\limits_{ n=1 }^N r_{nk} x_n}{\sum\limits_{ n=1 }^N r_{nk}}\\ \end{align*}$$
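The two steps above can be sketched in NumPy (a minimal illustration: the function name, the deterministic initialization, and the convergence check are choices made here, not prescribed by the lab):

```python
import numpy as np

def kmeans(X, K, n_iters=100):
    """Minimal K-means sketch: alternate the E-step (assignment) and the
    M-step (centroid update) derived above until the centroids stop moving."""
    # Naive deterministic init: K points spread across the dataset.
    # A production implementation would use k-means++ and random restarts.
    idx = np.linspace(0, len(X) - 1, K).astype(int)
    mu = X[idx].astype(float)
    for _ in range(n_iters):
        # E-step: r_nk = 1 for the closest centroid (stored as index r[n])
        d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=-1)
        r = d.argmin(axis=1)
        # M-step: mu_k = sum_n r_nk x_n / sum_n r_nk (mean of assigned points)
        new_mu = np.array([X[r == k].mean(axis=0) if np.any(r == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, r
```

Each iteration can only decrease $J$ (the E-step by definition of the closest centroid, the M-step by the stationarity condition above), so the loop terminates at a local minimum.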

# Problem 2. Expectation Maximization for Gaussian mixtures

For instance, a one-dimensional Gaussian mixture:

$$P(X) = \sum\limits_{ k=1 }^K π_k \frac{1}{\sqrt{2π σ_k^2}} \; \exp\left(-\frac{(X-μ_k)^2}{2σ_k^2}\right)$$
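This density is straightforward to evaluate directly (a sketch; the function and argument names are illustrative, not from the notes):

```python
import numpy as np

def mixture_pdf(x, pi, mu, sigma2):
    """Evaluate the 1-D Gaussian-mixture density P(x) above at point(s) x,
    given mixing weights pi, means mu, and variances sigma2."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    pi, mu, sigma2 = map(np.asarray, (pi, mu, sigma2))
    # One Gaussian density per component, shape (len(x), K)
    comp = np.exp(-((x[:, None] - mu) ** 2) / (2 * sigma2)) \
           / np.sqrt(2 * np.pi * sigma2)
    # Weighted sum over components
    return (pi * comp).sum(axis=1)
```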

Latent variable:

$$\textbf{Z} = \begin{pmatrix} 0 \\ \vdots \\ 1\\ \vdots\\ 0 \end{pmatrix}$$

where the $k$-th entry satisfies $\textbf{Z}_k = 1$ with probability $π_k$ (a one-hot encoding of the component that generated the point)

E-step:

$$γ_k(x_n) = P(z=k \mid x_n) = \frac{P(z=k) P(x_n \mid k)}{P(x_n)} = \frac{π_k 𝒩(x_n \mid μ_k, \textbf{Σ}_k)}{\sum\limits_{ j } π_j 𝒩(x_n \mid μ_j, \textbf{Σ}_j) }$$

and then M-step:

$$\begin{align*} μ_k^{new} &= \frac{\sum\limits_{ n=1 }^N γ_k(x_n) x_n }{\sum\limits_{ n=1 }^N γ_k(x_n)} \\ \textbf{Σ}_k^{new} &= \frac {\sum\limits_{ n=1 }^N γ_k(x_n) (x_n - μ_k^{new}) (x_n - μ_k^{new})^T}{\sum\limits_{ n=1 }^N γ_k(x_n)} \\ π_k^{new} &= \frac{\sum\limits_{ n=1 }^N γ_k(x_n)}{N} \end{align*}$$

Each EM iteration increases the likelihood of the data under the current parameters:

$$\begin{align*} P(X \mid π, μ, \textbf{Σ}) &= \prod\limits_{ n=1 }^N P(x_n \mid π, μ, \textbf{Σ}) \\ &= \prod\limits_{ n=1 }^N \Bigg(\sum\limits_{ k=1 }^K \underbrace{P(z_k)}_{≝ \; π_k} \; \underbrace{P(x_n \mid z_k, π, μ, \textbf{Σ})}_{≝ \; 𝒩(x_n \mid μ_k, \textbf{Σ}_k)}\Bigg) \end{align*}$$
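The full EM loop for the one-dimensional case can be sketched as follows (a minimal illustration: the function name, the quantile initialization, and the fixed iteration count are choices made here; in 1-D the variances $σ_k^2$ play the role of $\textbf{Σ}_k$):

```python
import numpy as np

def em_gmm_1d(X, K, n_iters=200):
    """EM sketch for a 1-D Gaussian mixture, implementing the E-step
    (responsibilities) and M-step (parameter updates) equations above."""
    pi = np.full(K, 1.0 / K)
    # Deterministic quantile initialization of the means (arbitrary choice)
    mu = np.quantile(X, (np.arange(K) + 0.5) / K)
    sigma2 = np.full(K, X.var())
    for _ in range(n_iters):
        # E-step: gamma[n, k] = pi_k N(x_n | mu_k, sigma2_k) / sum_j ...
        dens = np.exp(-((X[:, None] - mu) ** 2) / (2 * sigma2)) \
               / np.sqrt(2 * np.pi * sigma2)
        gamma = pi * dens
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities
        Nk = gamma.sum(axis=0)
        mu = (gamma * X[:, None]).sum(axis=0) / Nk
        sigma2 = (gamma * (X[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / len(X)
    return pi, mu, sigma2
```

Note that, exactly as in the update equations, the variance update uses the freshly computed means, and the weights $π_k$ are the average responsibilities.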
