Intro – Motivation.
High-dimensional data
Useful for high-dimensional data, where the dimension $d$ is similar to, or much larger than, the number of data samples $n$. In this regime the maximum likelihood estimate of the parameters of a fitted Gaussian has problems: with $n < d$, the MLE covariance matrix $\Sigma = \frac{1}{n}\sum_{i=1}^{n}(x^{(i)} - \mu)(x^{(i)} - \mu)^\top$ is singular, so it can't be inverted and the fitted Gaussian density is undefined. Similar problems would occur for a Gaussian mixture model (of which a single Gaussian is a particular example).
To solve this, we could constrain the covariance matrix of the Gaussian to be diagonal, $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_d^2)$. We could also constrain it to be proportional to the identity matrix, $\Sigma = \sigma^2 I$.
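A minimal numpy sketch of the problem and of the two constrained estimates (dimensions and data made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 100                          # fewer samples than dimensions
X = rng.normal(size=(n, d))

mu = X.mean(axis=0)
Sigma = (X - mu).T @ (X - mu) / n       # MLE covariance (d x d)

# rank(Sigma) <= n - 1 < d, so Sigma is singular: it can't be inverted
# and the fitted Gaussian density is undefined.
print(np.linalg.matrix_rank(Sigma))     # 29

# Constrained estimates that are invertible for any n:
Sigma_diag = np.diag(X.var(axis=0))            # diagonal covariance
Sigma_iso = X.var(axis=0).mean() * np.eye(d)   # proportional to identity
```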
The factor analysis model is another way to do this that doesn't throw away the correlations between coordinates.
Assume a latent variable $z \in \mathbb{R}^k$, $z \sim \mathcal{N}(0, I)$, $k < d$.
Then the data has conditional distribution $x \mid z \sim \mathcal{N}(\mu + \Lambda z, \Psi)$, with parameters $\mu \in \mathbb{R}^d$, $\Lambda \in \mathbb{R}^{d \times k}$, $\Psi \in \mathbb{R}^{d \times d}$. Equivalently, $x = \mu + \Lambda z + \varepsilon$, where $\varepsilon \sim \mathcal{N}(0, \Psi)$. We also assume that $\Psi$ is diagonal.
Basically, we model the data as lying near the subspace $\mu + \mathrm{span}(\text{columns of } \Lambda)$, which is possibly much lower-dimensional than the space of $x$, and having some noise around this subspace.
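A quick sketch of sampling from this generative model (the ground-truth parameters here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 50, 3, 100_000                # observed dim, latent dim, samples

# Hypothetical ground-truth parameters, purely for illustration.
mu = rng.normal(size=d)
Lam = rng.normal(size=(d, k))           # factor loadings Lambda (d x k)
Psi = rng.uniform(0.1, 0.5, size=d)     # diagonal of the noise covariance

z = rng.normal(size=(n, k))                    # z ~ N(0, I)
eps = rng.normal(size=(n, d)) * np.sqrt(Psi)   # eps ~ N(0, Psi), Psi diagonal
X = mu + z @ Lam.T + eps                       # x = mu + Lambda z + eps

# Sanity check: the empirical covariance approaches Lambda Lambda^T + Psi.
gap = np.abs(np.cov(X, rowvar=False) - (Lam @ Lam.T + np.diag(Psi))).max()
print(gap)                              # shrinks as n grows
```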
Some notation and some probability results for Gaussians (recap)
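The standard results being used: if a vector is jointly Gaussian with partition

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \sim \mathcal{N}\!\left(\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\right),$$

then the marginal is $x_1 \sim \mathcal{N}(\mu_1, \Sigma_{11})$, and the conditional is also Gaussian:

$$x_1 \mid x_2 \sim \mathcal{N}\!\big(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\big).$$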
Distribution of the random vector $(z, x)$. Result:

$$\begin{bmatrix} z \\ x \end{bmatrix} \sim \mathcal{N}\!\left(\begin{bmatrix} 0 \\ \mu \end{bmatrix}, \begin{bmatrix} I & \Lambda^\top \\ \Lambda & \Lambda\Lambda^\top + \Psi \end{bmatrix}\right).$$
This implies that if we marginalize out $z$, we find $x \sim \mathcal{N}(\mu, \Lambda\Lambda^\top + \Psi)$.
We could fit the parameters $(\mu, \Lambda, \Psi)$ by maximizing this (marginal) likelihood, but it turns out that the resulting optimization problem can't be solved in closed form, and it's quite hard. Therefore, we actually use the EM algorithm. Note that $z$ is now continuous, so sums over its values become integrals.
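For reference, using the marginal above, the log-likelihood that has no closed-form maximizer in $\Lambda$ and $\Psi$ is

$$\ell(\mu, \Lambda, \Psi) = \sum_{i=1}^{n} \log \frac{1}{(2\pi)^{d/2}\,|\Lambda\Lambda^\top + \Psi|^{1/2}} \exp\!\left(-\frac{1}{2}(x^{(i)} - \mu)^\top(\Lambda\Lambda^\top + \Psi)^{-1}(x^{(i)} - \mu)\right).$$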
E-step. Video: using the properties of Gaussians he mentioned above, compute the posterior $Q_i(z^{(i)}) = p(z^{(i)} \mid x^{(i)}; \mu, \Lambda, \Psi)$, which is itself Gaussian.
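Concretely, applying the conditional-Gaussian formula from the recap to the joint distribution of $(z, x)$ above gives

$$z^{(i)} \mid x^{(i)} \sim \mathcal{N}\!\big(\mu_{z^{(i)}\mid x^{(i)}},\; \Sigma_{z^{(i)}\mid x^{(i)}}\big),$$

with

$$\mu_{z^{(i)}\mid x^{(i)}} = \Lambda^\top(\Lambda\Lambda^\top + \Psi)^{-1}(x^{(i)} - \mu), \qquad \Sigma_{z^{(i)}\mid x^{(i)}} = I - \Lambda^\top(\Lambda\Lambda^\top + \Psi)^{-1}\Lambda.$$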
M-step. Video (using a special trick for Gaussian integrals): maximize $\sum_i \mathbb{E}_{z^{(i)} \sim Q_i}\big[\log \frac{p(x^{(i)}, z^{(i)};\, \mu, \Lambda, \Psi)}{Q_i(z^{(i)})}\big]$ over the parameters. Go to the notes for the full derivation, really.
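A minimal numpy sketch of the whole EM loop, under the assumption that the standard closed-form updates for factor analysis are used; the function name and the initialization are my own choices, not from the lecture:

```python
import numpy as np

def factor_analysis_em(X, k, n_iters=100, seed=0):
    """Fit x = mu + Lambda z + eps by EM (a sketch, not a reference impl).

    X is an (n, d) data matrix, k is the latent dimension.
    Returns (mu, Lam, Psi) with Psi stored as a length-d vector (diagonal).
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)

    mu = X.mean(axis=0)                  # MLE of mu, fixed throughout
    Xc = X - mu                          # centered data
    Lam = rng.normal(size=(d, k))        # random init of factor loadings
    Psi = np.var(Xc, axis=0) + 1e-6      # diagonal noise variances

    for _ in range(n_iters):
        # --- E-step: posterior of z given each x ---
        # G = (I + Lam^T Psi^{-1} Lam)^{-1} equals the posterior covariance
        # Sigma_{z|x} (Woodbury identity), and avoids inverting the d x d
        # matrix Lam Lam^T + Psi directly.
        LtPinv = Lam.T / Psi             # (k, d) = Lam^T Psi^{-1}
        G = np.linalg.inv(np.eye(k) + LtPinv @ Lam)   # (k, k)
        Ez = Xc @ LtPinv.T @ G           # (n, k) posterior means mu_{z|x}
        # sum_i E[z z^T] = n * Sigma_{z|x} + sum_i Ez_i Ez_i^T
        Ezz_sum = n * G + Ez.T @ Ez      # (k, k)

        # --- M-step: closed-form updates for Lambda and Psi ---
        Lam = (Xc.T @ Ez) @ np.linalg.inv(Ezz_sum)
        # Psi keeps only the diagonal of the residual covariance.
        Psi = np.mean(Xc**2, axis=0) - np.mean(Xc * (Ez @ Lam.T), axis=0)
        Psi = np.maximum(Psi, 1e-6)      # guard against numerical issues

    return mu, Lam, Psi
```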