Factor analysis model

cosmos 4th November 2016 at 2:43pm
Unsupervised learning

Intro / Motivation.

High-dimensional data

Useful for high-dimensional data, where the dimension $n$ is comparable to or much larger than the number of data samples $m$, i.e. $n \gg m$. In this regime the maximum likelihood estimate of the parameters of a fitted Gaussian has problems (the sample covariance matrix is singular, so it can't be inverted in the likelihood), and similar problems occur for the Gaussian mixture model (Particular example of this).
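To see the problem concretely, here is a quick numpy sketch (my own illustration, not from the source): with $m$ samples in $n$ dimensions and $n > m$, the maximum likelihood covariance estimate is singular, so the fitted Gaussian has no density.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 10                      # dimension n much larger than sample count m
X = rng.standard_normal((m, n))    # m samples in R^n

Xc = X - X.mean(axis=0)            # centre the data
S = Xc.T @ Xc / m                  # maximum likelihood covariance estimate

# rank(S) <= m - 1 < n, so S is singular: N(mu, S) has no density,
# and the S^{-1} term in the log-likelihood is undefined.
print(np.linalg.matrix_rank(S))
```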

To solve this, we could constrain the covariance matrix of the Gaussian to be diagonal. We could also constrain it to be proportional to the identity matrix.
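Both constrained MLEs are trivial to compute (a sketch, with made-up data), at the cost of modelling each coordinate as independent:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 10
X = rng.standard_normal((m, n))

# Diagonal constraint: the MLE is just the per-coordinate variance.
psi_diag = X.var(axis=0)            # diagonal entries of the covariance matrix

# Spherical constraint (proportional to the identity): one shared variance,
# the MLE of sigma^2 for N(mu, sigma^2 I) is the mean per-coordinate variance.
sigma2 = X.var(axis=0).mean()
```

Both estimates are invertible as long as no coordinate is constant, but they discard all correlations between coordinates.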

The factor analysis model is another way to do this that doesn't throw away the correlations between coordinates.

Model

Description of model.

Assume a latent variable $z \sim \mathcal{N}(0, \mathbf{I})$, $z \in \mathbb{R}^d$, with $d < n$.

Then the data has conditional distribution $x \mid z \sim \mathcal{N}(\mu + \mathbf{\Lambda} z, \mathbf{\Psi})$. Equivalently, $x = \mu + \mathbf{\Lambda} z + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \mathbf{\Psi})$. We also assume that $\mathbf{\Psi}$ is diagonal.

Basically, we model the data as lying in some subspace, of possibly lower dimension than that of $x$, with some Gaussian noise around this subspace.
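The generative story can be written directly as a sampler (a sketch with made-up parameters; here $n = 3$, $d = 1$, so the data concentrates around a line in $\mathbb{R}^3$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 3, 1, 1000                     # observed dim, latent dim, samples
mu = np.array([1.0, -2.0, 0.5])
Lam = np.array([[2.0], [1.0], [-1.0]])   # columns of Lambda span the subspace
psi = np.array([0.1, 0.1, 0.1])          # diagonal of Psi (noise variances)

z = rng.standard_normal((m, d))                   # z ~ N(0, I)
eps = rng.standard_normal((m, n)) * np.sqrt(psi)  # eps ~ N(0, Psi), Psi diagonal
X = mu + z @ Lam.T + eps                          # x = mu + Lambda z + eps
```

Most of the variance of `X` is along the direction of `Lam` (roughly $\|\Lambda\|^2$ plus noise), while directions orthogonal to it only see the noise variance.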

Another example

Some notation and some probability results for Gaussians (recap)

Distribution of the random variable $(z, x)$. Result:

$$\begin{bmatrix} \vec{z} \\ \vec{x} \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} \vec{0} \\ \vec{\mu} \end{bmatrix}, \begin{bmatrix} \mathbf{I} & \mathbf{\Lambda}^T \\ \mathbf{\Lambda} & \mathbf{\Lambda}\mathbf{\Lambda}^T + \mathbf{\Psi} \end{bmatrix}\right)$$

This implies that if we marginalize out $\vec{z}$, we find

$$\vec{x} \sim \mathcal{N}(\vec{\mu}, \mathbf{\Lambda}\mathbf{\Lambda}^T + \mathbf{\Psi})$$
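We can sanity-check this marginal by Monte Carlo (a sketch with arbitrary parameters): sample from the generative model and compare the empirical covariance to $\mathbf{\Lambda}\mathbf{\Lambda}^T + \mathbf{\Psi}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 3, 2, 200_000
mu = np.array([0.5, -1.0, 2.0])
Lam = rng.standard_normal((n, d))
psi = np.array([0.3, 0.2, 0.5])          # diagonal of Psi

z = rng.standard_normal((m, d))
eps = rng.standard_normal((m, n)) * np.sqrt(psi)
X = mu + z @ Lam.T + eps

emp_cov = np.cov(X, rowvar=False)        # empirical covariance of the samples
model_cov = Lam @ Lam.T + np.diag(psi)   # the claimed marginal covariance
```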

Learning the parameters

We could use MLE on the marginal likelihood, but it turns out that the resulting optimization problem can't be solved in closed form, and it's quite hard. Therefore, we actually use the EM algorithm. Note that $z^{(i)}$ is now continuous, so sums over its values become integrals.

E-step. Video, using the properties of Gaussians mentioned above (the conditional $z \mid x$ is Gaussian).

M-step. Video (using a special trick for the Gaussian integral). Go here really.
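The two steps can be sketched in numpy (my own implementation of the standard factor analysis EM updates, not taken from the source; the E-step uses the Gaussian conditioning formulas above, the M-step the usual closed-form updates):

```python
import numpy as np

def factor_analysis_em(X, d, n_iter=50, seed=0):
    """EM for x = mu + Lambda z + eps, z ~ N(0, I_d), eps ~ N(0, diag(Psi))."""
    m, n = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu                              # centred data
    S = Xc.T @ Xc / m                        # sample covariance
    rng = np.random.default_rng(seed)
    Lam = 0.1 * rng.standard_normal((n, d))  # small random initialisation
    Psi = np.diag(S).copy()                  # start noise at marginal variances
    lls = []
    for _ in range(n_iter):
        # E-step: p(z | x) is Gaussian with shared covariance Sigma_z
        # and per-sample mean G (x - mu), by Gaussian conditioning.
        C = Lam @ Lam.T + np.diag(Psi)       # model covariance of x
        G = Lam.T @ np.linalg.inv(C)         # d x n
        Ez = Xc @ G.T                        # m x d, rows are E[z | x^(i)]
        Sigma_z = np.eye(d) - G @ Lam        # posterior covariance of z
        Ezz = m * Sigma_z + Ez.T @ Ez        # sum_i E[z z^T | x^(i)]
        # Track the marginal log-likelihood (EM never decreases it).
        sign, logdet = np.linalg.slogdet(C)
        lls.append(-0.5 * (m * n * np.log(2 * np.pi) + m * logdet
                           + m * np.trace(np.linalg.solve(C, S))))
        # M-step: closed-form updates for Lambda and the diagonal Psi.
        Lam = (Xc.T @ Ez) @ np.linalg.inv(Ezz)
        Psi = np.diag(S - Lam @ (Ez.T @ Xc) / m)
    return mu, Lam, Psi, lls
```

The returned `lls` lets you check EM's monotone-improvement guarantee on any dataset.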

Result for $\mathbf{\Lambda}$