Generalized linear model

Supervised learning

See part III here; also video and video2.

See slides here

A generalization of Linear regression, where $p(y|x;\theta) = f(y;\theta^T x)$. This means that, for the case of regression, the model output has the form $h_\theta(x) = E[y|x;\theta] = g(\theta^T x)$, i.e. the model is a function of the linear model; the errors also aren't necessarily Gaussian. Here $g$ is called the canonical response function. Its inverse is called the canonical link function.
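As a minimal sketch (not from the note; the function names and numbers are made up), the only thing that changes between GLMs at prediction time is the canonical response function $g$:

```python
import numpy as np

def predict(theta, x, g):
    """h_theta(x) = E[y|x; theta] = g(theta^T x)."""
    return g(x @ theta)

# Canonical response functions for three common choices of distribution:
identity = lambda eta: eta                          # Gaussian  -> linear regression
sigmoid  = lambda eta: 1.0 / (1.0 + np.exp(-eta))   # Bernoulli -> logistic regression
exp_g    = lambda eta: np.exp(eta)                  # Poisson   -> Poisson regression

theta = np.array([0.5, -1.0])
x = np.array([2.0, 1.0])
print(predict(theta, x, sigmoid))  # probability that y = 1 under the Bernoulli model
```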

In a GLM, the distribution of $y$ given $x$ is taken from the Exponential family distributions.
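For reference (the standard form from Ng's notes, not written out above): an exponential family distribution is one that can be written as $p(y;\eta) = b(y)\exp\left(\eta^T T(y) - a(\eta)\right)$, where $\eta$ is the natural parameter, $T(y)$ the sufficient statistic (often just $T(y) = y$), $a(\eta)$ the log-partition function, and $b(y)$ the base measure. These are the $a$ and $b$ referred to in the assumptions below.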

https://www.wikiwand.com/en/Generalized_linear_model

Assumptions:

  • $y|x;\theta \sim \text{ExpFamily}(\eta)$, for some chosen functions $a$ and $b$
  • Given $x$, the goal is to output $E[T(y)|x]$, i.e. we want to learn $f(x) = E[T(y)|x]$. Therefore we're really learning a function (and thus this is an example of Regression). However, $y$ itself can be categorical (this function then representing its probability), so GLMs can be applied to Classification.
  • (assumption/design choice) $\eta = \theta^T x$. Then this determines the canonical response function $g(\eta) = a'(\eta) = E[y|\eta]$. This choice simplifies the MLE calculations, because $\eta$ appears linearly with $y$ (see the sketch after this list).
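A quick sketch of why (assuming the exponential-family form above with $T(y) = y$): for a single example, $\log p(y|x;\theta) = \log b(y) + y\,\theta^T x - a(\theta^T x)$, so $y$ only enters multiplied by the linear term $\theta^T x$.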

Note that $\eta$ can be a vector (as in the multinomial example below), in which case $\theta$ is a matrix.

Example:

Min 43 of lect4 (Ng)

Classification with Bernoulli distribution (parametrized by $\eta$) (Logistic regression)
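Sketch of the Bernoulli case (standard derivation, with $\phi = p(y=1)$): $p(y;\phi) = \phi^y(1-\phi)^{1-y} = \exp\left(y\log\frac{\phi}{1-\phi} + \log(1-\phi)\right)$, so the natural parameter is $\eta = \log\frac{\phi}{1-\phi}$, and inverting gives the canonical response function $g(\eta) = \frac{1}{1+e^{-\eta}}$ (the sigmoid); hence $h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}$, i.e. Logistic regression.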

Regression with Gaussian distribution (the mean being $\eta$) (Linear regression).
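Similarly for a Gaussian with fixed unit variance: $p(y;\mu) = \frac{1}{\sqrt{2\pi}}e^{-y^2/2}\exp\left(\mu y - \mu^2/2\right)$, so $\eta = \mu$ and $g(\eta) = \eta$ (identity), giving $h_\theta(x) = \theta^T x$: ordinary Linear regression.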

Classification over $k$ categories with multinomial distribution (min 52, lec4 (Ng))
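Sketch of the multinomial case (as in the lecture, not worked out in this note): with $\phi_i = p(y = i)$, one can take $\eta_i = \log\frac{\phi_i}{\phi_k}$ for $i = 1,\dots,k-1$, so $\eta$ is a vector; inverting gives the softmax response function $\phi_i = \frac{e^{\eta_i}}{1 + \sum_{j=1}^{k-1} e^{\eta_j}}$, and with $\eta = \theta^T x$ (where $\theta$ is a matrix) this is Softmax regression.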

Maximum likelihood estimate

Can learn the parameters using the Maximum likelihood estimator.
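Sketch of why the MLE is convenient here (assuming $T(y) = y$ and the log-likelihood written above): $\nabla_\theta \log p(y|x;\theta) = (y - a'(\theta^T x))\,x = (y - h_\theta(x))\,x$, so the stochastic gradient ascent update $\theta \leftarrow \theta + \alpha\,(y - h_\theta(x))\,x$ has the same form for every GLM (the LMS rule for the Gaussian, the logistic-regression update for the Bernoulli).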

The MLE doesn't depend on the dispersion parameter, but the uncertainty does.
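A minimal sketch of fitting a GLM with that generic update rule (toy logistic-regression example; the data, step size and number of passes are made up for illustration):

```python
import numpy as np

# Generic GLM stochastic gradient ascent: theta <- theta + alpha*(y - g(theta^T x))*x,
# instantiated here for the Bernoulli case (g = sigmoid).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_theta = np.array([1.5, -2.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_theta))))  # simulated labels

theta = np.zeros(2)
alpha = 0.05
for _ in range(200):                                   # passes over the data
    for x_i, y_i in zip(X, y):
        h = 1.0 / (1.0 + np.exp(-(x_i @ theta)))       # canonical response (sigmoid)
        theta += alpha * (y_i - h) * x_i               # same form for every GLM

print(theta)  # roughly recovers true_theta, up to sampling noise
```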

Quasi-likelihood models

The variance is multiplied by a dispersion parameter, which means it is no longer a full probabilistic model, but it can fit the larger variances observed in data (overdispersion).
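Concretely (standard quasi-likelihood setup, stated here as an assumption rather than from the note): the mean model $E[y|x] = g(\theta^T x)$ is kept, but the variance is taken to be $\mathrm{Var}(y|x) = \phi\,V(\mu)$ with a dispersion parameter $\phi$ estimated from the residuals; e.g. quasi-Poisson uses $\mathrm{Var}(y|x) = \phi\,\mu$ instead of the Poisson's $\mu$, which accommodates overdispersed counts.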