See part III here: video, video2
See slides here
A generalization of linear regression, where $y \mid x; \theta \sim \text{ExponentialFamily}(\eta)$ with natural parameter $\eta = \theta^T x$. This means that, for the case of regression, the model output has the form $h_\theta(x) = \mathbb{E}[y \mid x; \theta] = g(\eta) = g(\theta^T x)$, i.e. the model is a function of the linear model; the errors also aren't necessarily Gaussian. Here $g$ is called the canonical response function. Its inverse $g^{-1}$ is called the canonical link function.
In GLMs, a common choice for the output's probability distribution is the exponential family of distributions.
https://www.wikiwand.com/en/Generalized_linear_model
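For reference, the exponential-family form used in Ng's notes (natural parameter $\eta$, sufficient statistic $T(y)$, log-partition function $a(\eta)$):

$$p(y; \eta) = b(y)\,\exp\big(\eta^T T(y) - a(\eta)\big)$$

In most cases $T(y) = y$; different choices of $b$, $T$, and $a$ recover the Bernoulli, Gaussian, multinomial, Poisson, etc.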
Assumptions (Ng's three GLM assumptions):
1. $y \mid x; \theta \sim \text{ExponentialFamily}(\eta)$
2. Given $x$, the goal is to predict the expected value of $T(y)$, i.e. $h_\theta(x) = \mathbb{E}[T(y) \mid x]$ (usually $T(y) = y$)
3. $\eta = \theta^T x$, i.e. the natural parameter is linear in the inputs
Note that $\eta$ can be a vector (as in the multinomial example below), in which case $\theta$ is a matrix.
Examples:
Min 43 of lect4 (Ng)
Classification with the Bernoulli distribution (parametrized by $\phi$), giving logistic regression (derivation sketched below, after this list)
Regression with the Gaussian distribution (the mean being $\mu = \theta^T x$), giving linear regression.
Classification over $k$ categories with the multinomial distribution, giving softmax regression (min 52, lec4 (Ng))
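Worked sketch for the Bernoulli case (the standard derivation, showing where the sigmoid comes from):

$$p(y; \phi) = \phi^y (1-\phi)^{1-y} = \exp\Big( y \log\tfrac{\phi}{1-\phi} + \log(1-\phi) \Big)$$

so $\eta = \log\tfrac{\phi}{1-\phi}$, $T(y) = y$, $a(\eta) = -\log(1-\phi) = \log(1+e^{\eta})$, $b(y) = 1$. Inverting gives the canonical response function $\phi = g(\eta) = 1/(1+e^{-\eta})$, hence $h_\theta(x) = 1/(1+e^{-\theta^T x})$, which is exactly logistic regression.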
Can learn the parameter $\theta$ using the maximum likelihood estimator (see the sketch at the end of this note).
The MLE of $\theta$ doesn't depend on the dispersion parameter, but the uncertainty (e.g. standard errors) does.
Multiplying the variance by a dispersion parameter means it's no longer a full probabilistic model, but it can fit the larger-than-nominal variances observed in the data (over-dispersion). Hmmm..
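A minimal sketch of MLE fitting for the Bernoulli GLM (logistic regression) by stochastic gradient ascent; the data and function names below are made up for illustration. For any GLM with the canonical link the per-example update takes the same form, $\theta \leftarrow \theta + \alpha\,(y - h_\theta(x))\,x$.

```python
import numpy as np

def sigmoid(z):
    # Canonical response function g(eta) for the Bernoulli GLM
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_glm(X, y, lr=0.1, epochs=100):
    """MLE by stochastic gradient ascent on the log-likelihood."""
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            h = sigmoid(x_i @ theta)        # model prediction h_theta(x)
            theta += lr * (y_i - h) * x_i   # per-example log-likelihood gradient
    return theta

# Toy usage on synthetic (hypothetical) data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=200) > 0).astype(float)
theta_hat = fit_logistic_glm(X, y)
```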