aka PCA
visual intro – Intro vid by Andrew Ng
Given {x(1),...,x(n)} where x(i)∈Rn
Reduce it to k-dimensional data set (k<n, often k≪n), so that the dimensions we retain are able to recover the data as well as possible.
Examples, oxford notes
Algorithm
Summary of algorithm
Pre-processing of the data
vid
- Zero-out mean
- Set μ=m1∑i=1mx(i)
- Replace x(i) with x(i)−μ
- Normalize to unit variance
- Set σj2=m1∑i=1m(x(i))2
- Replace xj(i) with σjxj(i)
Finding principal components
Example and intuition: we want to find the direction so that when we project the data to the line pointing in that direction, the variance of the data is as high as possible.
Note: If ∣∣u∣∣=1, vector x(i) projected onto u has length (x(i))Tu. This also minimizes the variance perpendicular to that line.
Choose u s.t.:
u:∣∣u∣∣=1maxm1i=1∑m((x(i))Tu)2
=u:∣∣u∣∣=1maxm1i=1∑m(uTx(i))((x(i))Tu)
=u:∣∣u∣∣=1maxuT[m1i=1∑mx(i)(x(i))T]u
This implies that u is the principal eigenvector of the covariance matrix:
Σ=m1i=1∑mx(i)(x(i))T
See here for nice derivation.
More generally for k-dimensional subspace on which to project the data, you choose the k Eigenvectors with the largest Eigenvalues.
Can then also transform to the new subspace by projecting into the new basis to get a lower-dimensional representation of the data.
Another view of PCA, there are several more views of PCA.
Implementation of PCA
Problem with covariant matrix
Using the Design matrix, X, we can rewrite the covariance matrix as Σ=XTX
We can use Singular value decomposition, X=UDVT and then, the top k columns of V are the top k eigenvectors of XTX=Σ. If the number of samples is much smaller than their dimensionality, then X is a fat matrix (m×d) with much fewer entries than Σ (d×d), and is thus more efficient to store on memory, and to compute with.
Applications of PCA
Video
LSI is essentially application of PCA to text data
Independent component analysis
PPCA (probabilistic PCA)
It is proposed that PCA of autocorrelation matrices of place cell activations produce Grid cells in the Spatial representation in the brain