- Usually considered an unsupervised learning method
- Used for learning the low-dimensional structures in the data (e.g., topic vectors instead of bag-of-words vectors, etc.)
- Fewer dimensions $\Rightarrow$ less chance of overfitting $\Rightarrow$ better generalization.
- Projection matrix $U = [u_1, u_2, \cdots, u_K]$ of size $D\times K$ defines $K$ linear projection directions.
- $U$ is used to project $x^{(i)}\in \mathbb{R}^D$ to $z^{(i)}\in\mathbb{R}^K$.
- Usage: dimensionality reduction, lossy data compression, feature extraction, and data visualization.
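As a minimal sketch of this projection (assuming NumPy; the matrix `U` here is just a random orthonormal example, not learned yet), $z^{(i)} = U^T x^{(i)}$ maps a $D$-dimensional vector to $K$ dimensions:

```python
import numpy as np

D, K = 5, 2                                    # original and reduced dimensionality (example values)
rng = np.random.default_rng(0)

U = np.linalg.qr(rng.normal(size=(D, K)))[0]   # D x K matrix with orthonormal columns
x = rng.normal(size=D)                         # one data point x in R^D

z = U.T @ x                                    # projected representation z in R^K
print(z.shape)                                 # (2,)
```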
Definition (two commonly used definitions):
- Learning projection directions that capture maximum variance in data
- Learning projection directions that result in smallest reconstruction error
- Projection of $x^{(i)}$ along a one-dimensional subspace defined by $u_1\in\mathbb{R}^D$, where $\Vert u_1\Vert=1$, is $u_1^Tx^{(i)}$.
- Mean of projections is $u_1^T\mu$, where $\mu=\frac1N\sum_{i=1}^Nx^{(i)}$ is the mean of all data.
- Variance of projections is $u_1^TSu_1$, where $S=\frac1N\sum_{i=1}^N(x^{(i)}-\mu)(x^{(i)}-\mu)^T$ is the $D \times D$ data covariance matrix.
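A quick numerical check of these identities (a sketch assuming NumPy and synthetic data): the empirical mean and variance of the scalar projections $u_1^Tx^{(i)}$ match $u_1^T\mu$ and $u_1^TSu_1$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 1000, 3
X = rng.normal(size=(D, N)) * np.array([[3.0], [1.0], [0.5]])   # D x N synthetic data matrix

mu = X.mean(axis=1)                               # mean of all data, shape (D,)
S = (X - mu[:, None]) @ (X - mu[:, None]).T / N   # D x D covariance matrix

u1 = rng.normal(size=D)
u1 /= np.linalg.norm(u1)                          # unit-norm projection direction

proj = u1 @ X                                     # scalar projections u1^T x^(i), shape (N,)
print(np.allclose(proj.mean(), u1 @ mu))          # mean of projections equals u1^T mu
print(np.allclose(proj.var(), u1 @ S @ u1))       # variance of projections equals u1^T S u1
```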
- We want $u_1$ s.t. the variance of the projected data is maximized: $\max_{u_1} u_1^TSu_1$ subject to $u_1^Tu_1=1$.
- The method of Lagrange multipliers: maximize $u_1^TSu_1 + \lambda_1(1 - u_1^Tu_1)$, where $\lambda_1$ is a Lagrange multiplier.
- Taking the derivative w.r.t. $u_1$ and setting it to zero gives $Su_1=\lambda_1u_1$.
- Thus, $u_1$ is an eigenvector of $S$.
- The variance of the projection is $u_1^TSu_1=\lambda_1$.
- Variance is maximized when $u_1$ is the eigenvector with the largest eigenvalue (the so-called first Principal Component, PC).
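To illustrate this result (a sketch assuming NumPy and synthetic data), the eigenvector of $S$ with the largest eigenvalue attains projected variance $\lambda_1$, and no other unit direction does better:

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 1000, 4
X = rng.normal(size=(D, N)) * np.array([[4.0], [2.0], [1.0], [0.5]])
Xc = X - X.mean(axis=1, keepdims=True)            # centered data
S = Xc @ Xc.T / N                                 # covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)              # eigh returns eigenvalues in ascending order
u1, lam1 = eigvecs[:, -1], eigvals[-1]            # top eigenvector (first PC) and its eigenvalue

print(np.allclose(u1 @ S @ u1, lam1))             # projected variance equals lambda_1
for _ in range(5):                                # compare against random unit directions
    v = rng.normal(size=D)
    v /= np.linalg.norm(v)
    assert v @ S @ v <= lam1 + 1e-12              # no direction beats the first PC
```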
- Center the data (subtract $\mu$ from each data point).
- Compute the covariance matrix $S=\frac1NXX^T$, where $X$ is the $D\times N$ matrix of centered data points.
- Perform eigendecomposition of $S$ and take the first $K$ leading eigenvectors $\{u_i\}_{i=1,\cdots,K}$.
- The projection is therefore given by $Z=U^TX$.
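Putting the steps together, a minimal PCA sketch in NumPy (the function name `pca` and the $D \times N$ data layout are assumptions chosen to match the formulas above, not a fixed API):

```python
import numpy as np

def pca(X, K):
    """Project a D x N data matrix X onto its first K principal components.

    Returns the K x N matrix Z = U^T X_centered and the D x K matrix U.
    """
    mu = X.mean(axis=1, keepdims=True)        # data mean
    Xc = X - mu                               # center the data
    S = Xc @ Xc.T / X.shape[1]                # covariance matrix S = (1/N) X X^T
    eigvals, eigvecs = np.linalg.eigh(S)      # eigendecomposition (ascending eigenvalues)
    U = eigvecs[:, ::-1][:, :K]               # K leading eigenvectors as columns
    Z = U.T @ Xc                              # projection Z = U^T X
    return Z, U

# Example: reduce 5-dimensional data to 2 dimensions
rng = np.random.default_rng(3)
X = rng.normal(size=(5, 200))
Z, U = pca(X, K=2)
print(Z.shape, U.shape)                       # (2, 200) (5, 2)
```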