See part III here: video, video2
See slides here
A generalization of linear regression, where $y \mid x; \theta \sim \text{ExponentialFamily}(\eta)$ with natural parameter $\eta = \theta^T x$. This means that, for the case of regression, the model output has the form $h_\theta(x) = \mathbb{E}[y \mid x; \theta] = g(\eta) = g(\theta^T x)$, i.e. the model is a function of the linear model; the errors also aren't necessarily Gaussian. Here $g$ is called the canonical response function. Its inverse $g^{-1}$ is called the canonical link function.
In GLMs, a common choice for the output's probability distribution is the exponential family of distributions.
https://www.wikiwand.com/en/Generalized_linear_model
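For reference, the exponential-family form used in Ng's notes (natural parameter $\eta$, sufficient statistic $T(y)$, log-partition function $a(\eta)$):

$$p(y; \eta) = b(y)\,\exp\big(\eta^T T(y) - a(\eta)\big)$$

In most cases $T(y) = y$; different choices of $b$, $T$, and $a$ recover the Bernoulli, Gaussian, multinomial, Poisson, etc.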
Assumptions (Ng's three GLM assumptions):
1. $y \mid x; \theta \sim \text{ExponentialFamily}(\eta)$
2. Given $x$, the goal is to predict the expected value of $T(y)$, i.e. $h_\theta(x) = \mathbb{E}[T(y) \mid x]$ (usually $T(y) = y$)
3. $\eta = \theta^T x$, i.e. the natural parameter is linear in the inputs
Note that $\eta$ can be a vector (as in the multinomial example below), in which case $\theta$ is a matrix.
Examples:
Min 43 of lect4 (Ng)
Classification with the Bernoulli distribution (parametrized by $\phi$), giving logistic regression (derivation sketched below, after this list)
Regression with the Gaussian distribution (the mean being $\mu = \theta^T x$), giving linear regression.
Classification over $k$ categories with the multinomial distribution, giving softmax regression (min 52, lec4 (Ng))
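Worked sketch for the Bernoulli case (the standard derivation, showing where the sigmoid comes from):

$$p(y; \phi) = \phi^y (1-\phi)^{1-y} = \exp\Big( y \log\tfrac{\phi}{1-\phi} + \log(1-\phi) \Big)$$

so $\eta = \log\tfrac{\phi}{1-\phi}$, $T(y) = y$, $a(\eta) = -\log(1-\phi) = \log(1+e^{\eta})$, $b(y) = 1$. Inverting gives the canonical response function $\phi = g(\eta) = 1/(1+e^{-\eta})$, hence $h_\theta(x) = 1/(1+e^{-\theta^T x})$, which is exactly logistic regression.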
Can learn the parameter $\theta$ using the maximum likelihood estimator (see the sketch at the end of this note).
The MLE of $\theta$ doesn't depend on the dispersion parameter, but the uncertainty (e.g. standard errors) does.
Multiplying the variance by a dispersion parameter means it's no longer a full probabilistic model, but it can fit the larger-than-nominal variances observed in the data (over-dispersion). Hmmm..
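A minimal sketch of MLE fitting for the Bernoulli GLM (logistic regression) by stochastic gradient ascent; the data and function names below are made up for illustration. For any GLM with the canonical link the per-example update takes the same form, $\theta \leftarrow \theta + \alpha\,(y - h_\theta(x))\,x$.

```python
import numpy as np

def sigmoid(z):
    # Canonical response function g(eta) for the Bernoulli GLM
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_glm(X, y, lr=0.1, epochs=100):
    """MLE by stochastic gradient ascent on the log-likelihood."""
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            h = sigmoid(x_i @ theta)        # model prediction h_theta(x)
            theta += lr * (y_i - h) * x_i   # per-example log-likelihood gradient
    return theta

# Toy usage on synthetic (hypothetical) data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=200) > 0).astype(float)
theta_hat = fit_logistic_glm(X, y)
```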