Generalized Linear Models (Intro)
If you see an error in the article, please comment or drop me an email.
Generalized linear models include linear models as a special case, but they go beyond them to address many of the limitations of linear models.
Limitations of Linear Models

Additive response models don’t make much sense if the response is discrete (for instance, binary data).

Additive error models often don’t make sense if the outcome has to be positive.

Transformations are often hard to interpret, and they change the scale on which effects are measured.

Particularly interpretable transformations (the natural logarithm, for example) aren’t applicable to negative or zero values.
Three most famous cases of Generalized linear models

Linear models

Binomial and binary regression

Poisson regression
Three components of Generalized linear models

Distribution: an exponential family model for the response. This is the random component.

Linear predictor: this is the systematic component. Think of regression variables for instance.

Link function: connects the mean of the response to the linear predictor.
Three components as applied to a linear model example

Distribution: assume a Gaussian distribution \(Y_i \sim N(\mu_i, \sigma^2)\)

Linear predictor: the sum of the covariates \(X\) times the coefficients \(\beta\): \(\eta_i = \sum_{k=1}^p X_{ik} \beta_k\)
Link function: the identity, \(g(\mu) = \mu\), so that \(\mu_i = \eta_i\) and the mean is exactly the linear predictor
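As a quick numeric sketch of this special case, the following (with simulated, purely illustrative data and coefficients) shows that a Gaussian GLM with the identity link is just ordinary least squares:

```python
import numpy as np

# Gaussian response with identity link: mu_i = eta_i = sum_k X_ik beta_k,
# so the GLM fit coincides with ordinary least squares.
# Data and true coefficients below are made up for illustration.
rng = np.random.default_rng(0)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + covariates
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)           # additive Gaussian error

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)            # least-squares solution
print(beta_hat)  # close to beta_true
```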
Three components as applied to logistic regression

Distribution: assume a Bernoulli distribution, since we are dealing with binary data (a 0/1 coin flip) with probability of a head \(\mu_i\): \(E[Y_i] = \mu_i\)

Linear predictor: the sum of the covariates \(X\) times the coefficients \(\beta\): \(\eta_i = \sum_{k=1}^p X_{ik} \beta_k\)
Link function: the log of the odds, \(g(\mu) = \eta = \log\left( \frac{\mu}{1 - \mu}\right)\). The logarithm transforms the mean of the distribution, not the \(Y_i\) themselves. We transform the probability of getting a head so that it relates to our covariates and coefficients, and you can always go back to the original \(\mu\).
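A minimal numpy sketch of how these three components come together in a fit: logistic regression estimated by iteratively reweighted least squares (IRLS), with the logit link above. The simulated data and true coefficients are illustrative assumptions, not from the article:

```python
import numpy as np

# Logistic regression via IRLS, using the logit link g(mu) = log(mu / (1 - mu)).
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
beta_true = np.array([-0.5, 1.5])
prob = 1 / (1 + np.exp(-(X @ beta_true)))  # inverse logit: eta -> probability
y = rng.binomial(1, prob)                  # Bernoulli (0/1) responses

beta = np.zeros(2)
for _ in range(25):                        # Newton/IRLS iterations
    eta = X @ beta
    mu = 1 / (1 + np.exp(-eta))
    w = mu * (1 - mu)                      # Bernoulli variance supplies the weights
    z = eta + (y - mu) / w                 # working response on the link scale
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
print(beta)  # close to beta_true at this sample size
```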
Three components as applied to Poisson regression

Distribution: assume a Poisson distribution, which is useful for unbounded count data, e.g. the number of people stopping at a shop. Remember that binomial counts are bounded, unlike Poisson counts.

Linear predictor: once again, the sum of the covariates \(X\) times the coefficients: \(\eta_i = \sum_{k=1}^p X_{ik} \beta_k\)
Link function: for Poisson, the common link function is the log link. \(g(\mu) = \eta = \log(\mu)\)
Variances in generalized linear models

For the linear model, the variance is constant: \(Var(Y_i) = \sigma^2\)

In the Bernoulli case, the variance depends on the mean: \(Var(Y_i) = \mu_i (1 - \mu_i)\)

In the Poisson case, the variance equals the mean, and thus differs by \(i\): \(Var(Y_i) = \mu_i\).
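A quick numeric check of these mean-variance relations on simulated data (the parameter values are arbitrary):

```python
import numpy as np

# Empirical variances should match mu(1 - mu) for Bernoulli draws
# and the mean itself for Poisson draws.
rng = np.random.default_rng(3)
n = 200_000

mu = 0.3
bern = rng.binomial(1, mu, size=n)
print(bern.var())   # close to 0.3 * 0.7 = 0.21

lam = 4.0
pois = rng.poisson(lam, size=n)
print(pois.var())   # close to lam = 4.0
```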
The modelling assumptions for generalized linear models often restrict the relationship between the mean and the variance. If the data do not adhere to this GLM structure, you can use a more flexible variance model: the quasi-likelihood normal equations.
\[
0=\sum_{i=1}^n \frac{(Y_i - \mu_i)}{\phi \mu_i (1 - \mu_i ) } W_i ~~~\mbox{and}~~~
0=\sum_{i=1}^n \frac{(Y_i - \mu_i)}{\phi \mu_i} W_i
\]
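One common way to use the extra flexibility is to estimate the dispersion \(\phi\) as the Pearson statistic over the residual degrees of freedom. A minimal sketch, with overdispersed negative binomial counts as an illustrative stand-in for real data:

```python
import numpy as np

# Overdispersed counts: mean 5 but variance 10, so a plain Poisson
# variance model (variance = mean) understates the noise.
rng = np.random.default_rng(4)
n = 5000
mu = 5.0
y = rng.negative_binomial(5, 5 / (5 + mu), size=n)  # mean 5, variance 10

# Intercept-only model: the fitted mean is just the sample mean.
mu_hat = y.mean()
pearson = np.sum((y - mu_hat) ** 2 / mu_hat)        # Poisson variance model mu_hat
phi = pearson / (n - 1)                             # phi > 1 signals overdispersion
print(phi)
```

Here \(\phi\) comes out near 2, reflecting that the variance is about twice the mean for this simulated data.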
Interpretation
The correct interpretation of GLMs is not far from the one in linear models:
change in the link function of the expected response per unit change in X holding other regressors constant.
The results are based on asymptotics; therefore, one generally needs larger sample sizes when working with GLMs.
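A bit of illustrative arithmetic for that interpretation: with a log or logit link, a unit change in a regressor acts multiplicatively on the mean or the odds after exponentiating the coefficient. The coefficient and baseline probability below are hypothetical:

```python
import numpy as np

beta = 0.25                  # hypothetical fitted coefficient

# Log link (Poisson): the mean is multiplied by exp(beta) per unit of x
print(np.exp(beta))          # ~1.284, roughly a 28% increase in the mean

# Logit link (logistic): the odds are multiplied by exp(beta) per unit of x
p0 = 0.4                     # hypothetical baseline probability
new_odds = (p0 / (1 - p0)) * np.exp(beta)
p1 = new_odds / (1 + new_odds)
print(p1)                    # the implied new probability
```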