Principal component analysis

cosmos 25th August 2017 at 12:02pm
Dimensionality reduction

aka PCA

Visual intro. Intro vid by Andrew Ng

Given $\{x^{(1)}, \ldots, x^{(m)}\}$ where $x^{(i)} \in \mathbb{R}^n$

Reduce it to a $k$-dimensional data set ($k < n$, often $k \ll n$), so that the dimensions we retain are able to recover the data as well as possible.

Examples, oxford notes

Algorithm

Summary of algorithm

Pre-processing of the data

vid

  1. Zero-out the mean
    1. Set $\mu = \frac{1}{m}\sum_{i=1}^m x^{(i)}$
    2. Replace $x^{(i)}$ with $x^{(i)} - \mu$
  2. Normalize to unit variance
    1. Set $\sigma_j^2 = \frac{1}{m}\sum_{i=1}^m \left(x_j^{(i)}\right)^2$
    2. Replace $x^{(i)}_j$ with $\frac{x^{(i)}_j}{\sigma_j}$
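
A minimal NumPy sketch of these two steps (the `(m, n)` array `X` holding the samples as rows, and the function name `preprocess`, are assumptions for illustration, not notation from the source):

```python
import numpy as np

def preprocess(X):
    """Zero-out the mean and normalize each coordinate to unit variance.

    X is assumed to be an (m, n) array whose rows are the samples x^(i).
    """
    mu = X.mean(axis=0)                      # mu = (1/m) * sum_i x^(i)
    X = X - mu                               # replace x^(i) with x^(i) - mu
    sigma = np.sqrt((X ** 2).mean(axis=0))   # sigma_j^2 = (1/m) * sum_i (x_j^(i))^2
    sigma[sigma == 0] = 1.0                  # avoid division by zero for constant coordinates
    return X / sigma, mu, sigma
```

Something like `Xn, mu, sigma = preprocess(X)` would then feed into the steps below.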

Finding principal components

Example and intuition: we want to find the direction such that, when we project the data onto the line pointing in that direction, the variance of the projected data is as high as possible. Note: if $\|u\|=1$, the vector $x^{(i)}$ projected onto $u$ has length $(x^{(i)})^T u$. Since the total variance is fixed, this also minimizes the variance perpendicular to that line.

Choose $u$ s.t.:

$$\max\limits_{u:\|u\|=1} \frac{1}{m}\sum\limits_{i=1}^m \left((x^{(i)})^T u\right)^2$$

$$= \max\limits_{u:\|u\|=1} \frac{1}{m}\sum\limits_{i=1}^m (u^T x^{(i)})((x^{(i)})^T u)$$

$$= \max\limits_{u:\|u\|=1} u^T \left[\frac{1}{m}\sum\limits_{i=1}^m x^{(i)}(x^{(i)})^T \right] u$$

This implies that $u$ is the principal eigenvector (the eigenvector with the largest eigenvalue) of the covariance matrix:

$$\mathbf{\Sigma} = \frac{1}{m}\sum\limits_{i=1}^m x^{(i)}(x^{(i)})^T$$

See here for a nice derivation.

More generally, for a $k$-dimensional subspace on which to project the data, you choose the $k$ eigenvectors with the largest eigenvalues.

You can then transform the data into the new subspace by projecting it onto this new basis, to get a lower-dimensional representation of the data.
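
As a rough NumPy sketch of this step (the function name `pca_eig`, the array shapes, and the use of `numpy.linalg.eigh` are illustrative assumptions, not part of the source):

```python
import numpy as np

def pca_eig(X, k):
    """PCA via eigendecomposition of the covariance matrix.

    X: (m, n) array of pre-processed (zero-mean) samples as rows.
    Returns the (n, k) matrix of top-k eigenvectors and the (m, k) projected data.
    """
    m = X.shape[0]
    Sigma = (X.T @ X) / m                     # Sigma = (1/m) * sum_i x^(i) (x^(i))^T
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # symmetric matrix; eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :k]             # k eigenvectors with the largest eigenvalues
    Z = X @ top                               # lower-dimensional representation of the data
    return top, Z
```

Using `eigh` rather than `eig` exploits the symmetry of $\Sigma$; it returns eigenvalues in ascending order, hence the column reversal before taking the top $k$.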

Another view of PCA; there are several more views of PCA.

Implementation of PCA

Problem with the covariance matrix

Using the Design matrix, $X$ (whose rows are the $(x^{(i)})^T$), we can rewrite the covariance matrix as $\Sigma = \frac{1}{m} X^T X$ (the $\frac{1}{m}$ factor does not affect the eigenvectors).

We can use Singular value decomposition, $X = U D V^T$, and then the top $k$ columns of $V$ are the top $k$ eigenvectors of $X^T X$, and hence of $\Sigma$. If the number of samples is much smaller than their dimensionality ($m \ll d$), then $X$ is a fat matrix ($m \times d$) with far fewer entries than $\Sigma$ ($d \times d$), and is thus more efficient to store in memory and to compute with.
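
A corresponding hedged sketch that uses the SVD of the design matrix instead of forming the $d \times d$ covariance explicitly (names and shapes are again assumptions for illustration):

```python
import numpy as np

def pca_svd(X, k):
    """PCA via SVD of the design matrix, without forming the d x d covariance.

    X: (m, d) pre-processed design matrix; m may be much smaller than d.
    Returns the (d, k) matrix of top-k principal directions and the (m, k) projection.
    """
    U, D, Vt = np.linalg.svd(X, full_matrices=False)  # X = U D V^T, singular values descending
    top = Vt[:k].T                                    # top-k right singular vectors = top-k eigenvectors of X^T X
    Z = X @ top                                       # lower-dimensional representation of the data
    return top, Z
```

For $m \ll d$ this never materializes the $d \times d$ matrix $X^T X$, which is the point of the remark above.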

Applications of PCA

Video

Latent semantic indexing

LSI is essentially an application of PCA to text data (e.g. to a term-document matrix).
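
A toy, purely illustrative sketch of that idea (the documents, the bag-of-words counts, and the choice of 2 latent dimensions are all made-up assumptions, not a description of any particular LSI system):

```python
import numpy as np

# Toy term-document matrix: rows are documents, columns are vocabulary terms.
docs = ["pca reduces dimensions", "pca finds principal components", "grid cells in the brain"]
vocab = sorted(set(w for d in docs for w in d.split()))
X = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)

# Center the counts, then take the top-2 right singular vectors as latent directions.
X = X - X.mean(axis=0)
U, D, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt[:2].T   # each document as a 2-dimensional latent vector
print(Z)
```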


Independent component analysis

PPCA (probabilistic PCA)

It has been proposed that PCA of autocorrelation matrices of place cell activations produces Grid cells in the Spatial representation in the brain.