high-dimensional data$\qquad \rightsquigarrow \qquad \underbrace{\textit{lower-dimensional data}}_{\text{easier to visualize}}$
Manifold hypothesis: real-world high-dimensional data vectors lie on (or near) a lower-dimensional manifold embedded in the high-dimensional space.
Goal: Preserve the pairwise distances between points ⟹ preserve the geometry of the data
Gradient descent to minimize a cost penalizing the mismatch between original and mapped pairwise distances
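A standard choice is the classical (metric) MDS stress (the exact normalization in the source may differ), where $d(x_i, x_j)$ denotes the distance in the original space and $d^\ast(x_i, x_j)$ the distance between the corresponding map points:

$\displaystyle \sum\limits_{x_i ≠ x_j \text{ data points}} \Big(d^\ast(x_i, x_j) - d(x_i, x_j)\Big)^2$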
The smaller the original distance between two points, the more strongly it is preserved
Cost function to minimize:
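A common form is Sammon's mapping stress, in which each squared mismatch is weighted by the inverse of the original distance (variants differ in how strongly small distances are emphasized):

$\displaystyle \sum\limits_{x_i ≠ x_j \text{ data points}} \frac{\Big(d^\ast(x_i, x_j) - d(x_i, x_j)\Big)^2}{d(x_i, x_j)}$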
Build the graph in which each point is connected to its $k$ nearest neighbors in the original space, then:
| Graph    | Physical analogy            |
|----------|-----------------------------|
| vertices | repelling charged particles |
| edges    | springs                     |
As with electric potential energy and spring energy, minimize the potential energy function:
$\displaystyle \underbrace{\sum\limits_{x_i ≠ x_j \text{ data points}} \frac{1}{d^\ast(x_i, x_j)}}_{\text{repulsion between all pairs of map points}} \;+\; \underbrace{\sum\limits_{(x_i, x_j) \text{ edges of the graph}} \frac 1 2 \Big(d^\ast(x_i, x_j) - d(x_i, x_j)\Big)^2}_{\text{spring energy on the edges}}$
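A minimal numerical sketch of this procedure (not the source's implementation): the data `X`, the neighbor count `k`, the learning rate, and the step count are illustrative choices, scikit-learn is used only to find the nearest neighbors, and constant factors in the gradient are absorbed into the learning rate.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def graph_layout(X, k=5, n_steps=500, lr=0.01, seed=0):
    """Lay out the k-NN graph of X in 2-D by gradient descent on the energy
    sum_{i != j} 1/d*(x_i, x_j)  +  sum_{edges} 1/2 (d*(x_i, x_j) - d(x_i, x_j))^2."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Edges of the k-nearest-neighbor graph, with the original distances as rest lengths.
    dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    edges = [(i, j, dist[i, c]) for i in range(n)
             for c, j in enumerate(idx[i]) if j != i]

    Y = rng.normal(scale=1e-2, size=(n, 2))       # random initial map points y_i

    for _ in range(n_steps):
        # Repulsion term: every pair of map points repels like charged particles.
        diff = Y[:, None, :] - Y[None, :, :]      # y_i - y_j, shape (n, n, 2)
        d_map = np.linalg.norm(diff, axis=-1) + 1e-12
        np.fill_diagonal(d_map, np.inf)           # ignore i = j
        grad = -np.sum(diff / d_map[..., None] ** 3, axis=1)

        # Spring term: each edge pulls its endpoints toward the original distance.
        for i, j, d_orig in edges:
            delta = Y[i] - Y[j]
            d_ij = np.linalg.norm(delta) + 1e-12
            g = (d_ij - d_orig) * delta / d_ij
            grad[i] += g
            grad[j] -= g

        Y -= lr * grad                            # gradient-descent step
    return Y
```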
Goal: Find orthogonal axes onto which the variance of the data points under projection is maximal, i.e. find the best possible “angles” from which the data points are the most spread out.
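A hedged sketch of this idea via the SVD of the centered data (the function name `pca_2d` and the input `X` are illustrative; scikit-learn's `PCA` does the same job):

```python
import numpy as np

def pca_2d(X):
    """Project data X (n_samples x n_features) onto the two orthogonal
    directions of maximal variance (the top-2 principal axes)."""
    Xc = X - X.mean(axis=0)                 # center the data
    # Right singular vectors of the centered data = eigenvectors of its
    # covariance matrix, ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                    # coordinates along the top-2 axes
```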
Goal: MDS, but with the curvature of the data manifold taken into account: distances are measured along the manifold (as shortest paths in a nearest-neighbor graph) rather than as straight lines in the ambient space.
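This is the approach taken by Isomap; a minimal usage sketch with scikit-learn, where the data `X` and the parameter values are illustrative:

```python
from sklearn.manifold import Isomap

# Geodesic (along-the-manifold) distances via a k-NN graph, then an
# MDS-like embedding computed from these distances.
Y = Isomap(n_neighbors=10, n_components=2).fit_transform(X)  # X: hypothetical data
```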
Goal: Preserve the relationship between neighbors.
Find the weight matrix $(W_{i,j})_{i,j}$ - whose rows sum to $1$ - that minimizes
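In the standard LLE formulation (assuming, as is usual, that $W_{i,j} = 0$ whenever $x_j$ is not among the $k$ nearest neighbors of $x_i$), the minimized quantity is the error made when reconstructing each point from its neighbors:

$\displaystyle \sum\limits_{x_i \text{ data point}} \bigg\| x_i - \sum\limits_{x_j \text{ data point}} W_{i,j}\, x_j \bigg\|^2$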
Map each data point $x_i$ to a point $y_i$ in the visualization, s.t. the $y_k$'s minimize
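With the weights $W_{i,j}$ then held fixed, the corresponding standard LLE embedding cost is the same reconstruction error, written for the map points:

$\displaystyle \sum\limits_{x_i \text{ data point}} \bigg\| y_i - \sum\limits_{x_j \text{ data point}} W_{i,j}\, y_j \bigg\|^2$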
Map points: the points $y_i$ of the low-dimensional visualization (image courtesy of statquest.org).
Similarities between original data points: probability that $x_i$ has $x_j$ as its neighbor if neighbors were chosen according to a Gaussian distribution centered at $x_i$
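In the standard t-SNE formulation (the bandwidths $\sigma_i$ are set via a user-chosen perplexity and $n$ is the number of data points; the source may write these differently), these similarities are:

$\displaystyle p_{j\mid i} ≝ \frac{\exp\big(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\big)}{\sum\limits_{k ≠ i} \exp\big(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\big)}, \qquad p_{ij} ≝ \frac{p_{j\mid i} + p_{i\mid j}}{2n}$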
$y_i$'s initialized at random
Similarities between visualization points:
⟶ computed using a Student-$t$ distribution (with one degree of freedom, i.e. heavier tails than a Gaussian)
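In the standard t-SNE formulation, the similarity between map points is:

$\displaystyle q_{ij} ≝ \frac{\big(1 + \lVert y_i - y_j\rVert^2\big)^{-1}}{\sum\limits_{k ≠ l} \big(1 + \lVert y_k - y_l\rVert^2\big)^{-1}}$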
Minimize the Kullback–Leibler divergence: $C ≝ \sum\limits_{i ≠ j} p_{ij}\log \frac{p_{ij}}{q_{ij}}$, by modifying the $y_i$'s with gradient descent
Recompute the $q_{ij}$'s at each step (until convergence is reached)
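A minimal usage sketch with scikit-learn's implementation of the above procedure; the data `X` and the perplexity value are illustrative:

```python
from sklearn.manifold import TSNE

# 2-D t-SNE map of hypothetical high-dimensional data X; the perplexity sets
# the bandwidths sigma_i of the Gaussian similarities (roughly, the effective
# number of neighbors considered for each point).
Y = TSNE(n_components=2, perplexity=30).fit_transform(X)
```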
In a neural network:
the representations produced by the successive layers are high-dimensional ⟹ DR methods are needed to visualize them
Build, for each representation, the matrix of pairwise distances between the represented data points: flattened, this matrix turns the representation into a single vector.
Step up the ladder of abstraction: visualize these vectorized representations themselves with t-SNE (meta-SNE)
Regarding neural networks: meta-SNE enables us to compare not only their outputs, but also how they operate internally.
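A hedged sketch of this meta-SNE pipeline (illustrative, not the original implementation): each representation is vectorized through its normalized matrix of pairwise distances, and t-SNE is then run on these vectors; the names `representations`, `meta_sne`, and the perplexity value are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.manifold import TSNE

def meta_sne(representations, perplexity=5):
    """representations: list of (n_samples x n_features) activation matrices,
    one per network/layer, all computed on the same n_samples inputs.
    perplexity must be smaller than the number of representations."""
    vectors = []
    for R in representations:
        d = pdist(R)                           # pairwise distances between the represented points
        vectors.append(d / np.linalg.norm(d))  # normalize so overall scale doesn't dominate
    # t-SNE on the vectorized representations: layers/networks that represent
    # the data similarly end up close together in the 2-D map.
    return TSNE(n_components=2, perplexity=perplexity).fit_transform(np.array(vectors))
```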