
Generalized Linear Models (Intro)

If you see an error in the article, please comment or drop me an email.

Generalized linear models include linear models as a special case, but extend them to address many of their limitations.

Limitations of Linear Models

  • Additive response models don’t make much sense if the response is discrete (for instance binary data)

  • Additive error models often don’t make sense if the outcome has to be positive.

  • Transformations are often hard to interpret and affect scale.

  • Particularly interpretable transformations (natural logarithms for example) aren’t applicable for negative or zero values.

Three most famous cases of Generalized linear models

  • Linear models

  • Binomial and binary regression

  • Poisson regression

Three components of Generalized linear models

  • Distribution: an exponential family model for the response. This is the random component.

  • Linear predictor: this is the systematic component. Think of regression variables for instance.

  • Link function: connects the mean of the response to the linear predictor

Three components as applied to a linear model example

  • Distribution: assume a Gaussian distribution \(Y_i \sim N(\mu_i, \sigma^2)\)

  • Linear predictor: the sum of the covariates X times the coefficients beta: \(\eta_i = \sum_{k=1}^p X_{ik} \beta_k\)

  • Link function: the mean is exactly the sum of covariates \(g(\mu) = \mu\) so that \(\mu_i = \eta_i\)
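As a minimal sketch (hypothetical simulated data, using NumPy's least-squares solver rather than a dedicated GLM routine), the identity link means the fitted mean is just the linear predictor:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)      # Gaussian errors

# Ordinary least squares coincides with the Gaussian GLM under the identity link
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# With g(mu) = mu, the fitted mean IS the linear predictor eta
eta = X @ beta_hat
mu = eta  # identity link: no transformation needed
```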

Three components as applied to logistic regression

  • Distribution: assume a Bernoulli distribution, as we are dealing with binary data (a 0/1 coin flip) with a probability of a head of \(\mu_i\): \(E[Y_i] = \mu_i\)

  • Linear predictor: sum of covariates X times beta: \(\eta_i = \sum_{k=1}^p X_{ik} \beta_k\)

  • Link function: the log of the odds – \(g(\mu) = \eta = \log\left( \frac{\mu}{1 - \mu}\right)\). The logarithm transforms the mean of the distribution, not the \(Y_i\) themselves. We transform the probability of getting a head so that it relates to our covariates and coefficients. You can always go back to the original \(\mu\).
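A small sketch of the logit link and its inverse (hypothetical probability value), showing that you can always map back to the original \(\mu\):

```python
import numpy as np

def logit(mu):
    """Link function: log-odds of the mean probability mu."""
    return np.log(mu / (1 - mu))

def inv_logit(eta):
    """Inverse link: map a linear predictor back to a probability."""
    return 1 / (1 + np.exp(-eta))

mu = 0.75              # probability of a head (hypothetical value)
eta = logit(mu)        # log-odds: unbounded, so it can match any linear predictor
back = inv_logit(eta)  # recovers the original mu
```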

Three components as applied to Poisson regression

  • Distribution: assume a Poisson distribution, which is useful for unbounded count data. Remember that binomial counts are bounded, unlike Poisson counts. E.g. the number of people stopping at a shop.

  • Linear predictor: once again, the sum of the covariates X times the coefficients \(\eta_i = \sum_{k=1}^p X_{ik} \beta_k\)

  • Link function: for Poisson, the common link function is the log link. \(g(\mu) = \eta = \log(\mu)\)
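A short sketch (hypothetical coefficient values) of what the log link implies: the mean is the exponential of the linear predictor, so a one-unit change in a covariate multiplies the expected count by \(e^{\beta}\):

```python
import numpy as np

# Hypothetical fitted coefficients for a Poisson model with a log link
beta0, beta1 = 1.0, 0.3   # intercept and one covariate's coefficient

def mean_count(x):
    eta = beta0 + beta1 * x  # linear predictor
    return np.exp(eta)       # inverse of the log link: mu = exp(eta)

# A one-unit increase in x multiplies the expected count by exp(beta1)
ratio = mean_count(2.0) / mean_count(1.0)
```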

Variances in generalized linear models

  • For the linear model, the variance is constant: \(Var(Y_i) = \sigma^2\)

  • In the Bernoulli case, the variance depends on the mean: \(Var(Y_i) = \mu_i (1 - \mu_i)\)

  • In the Poisson case, the variance equals the mean \(\mu_i\). Thus, the variance differs across observations: \(Var(Y_i) = \mu_i\).
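These mean–variance relationships can be checked by simulation (hypothetical parameter values, using NumPy's random generators):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

mu = 0.3
bern = rng.binomial(1, mu, size=n)   # Bernoulli draws with mean mu
# Bernoulli: Var(Y) should be close to mu * (1 - mu) = 0.21
print(bern.var(), mu * (1 - mu))

lam = 4.0
pois = rng.poisson(lam, size=n)      # Poisson draws with mean lam
# Poisson: Var(Y) should be close to the mean lam
print(pois.var(), lam)
```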

The modelling assumptions for generalized linear models often put restrictions on the relationship between the mean and the variance. If the data do not adhere to the GLM structure, you can use a more flexible variance model: the quasi-likelihood normal equations (shown here for the binomial and Poisson cases, with an extra dispersion parameter \(\phi\)).

\[
0=\sum_{i=1}^n \frac{(Y_i - \mu_i)}{\phi \mu_i (1 - \mu_i)} W_i ~~~\mbox{and}~~~ 0=\sum_{i=1}^n \frac{(Y_i - \mu_i)}{\phi \mu_i} W_i
\]
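The second (Poisson-type) estimating equation can be sketched for a toy intercept-only model, where \(W_i = 1\) and \(\phi\) cancels; solving it by bisection recovers the sample mean (simulated data, a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.poisson(lam=3.0, size=1000)  # simulated counts

# Poisson-type estimating equation for an intercept-only model
# (W_i = 1 and the dispersion phi cancels): 0 = sum((y_i - mu) / mu)
def estimating_equation(mu):
    return np.sum((y - mu) / mu)

# The equation is decreasing in mu, so solve it by bisection on a bracket
lo, hi = 0.1, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if estimating_equation(mid) > 0:
        lo = mid
    else:
        hi = mid
mu_hat = 0.5 * (lo + hi)

# For this toy model, the solution coincides with the sample mean
print(mu_hat, y.mean())
```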


The interpretation of GLM coefficients is not far from that of linear models:

change in the link function of the expected response per unit change in X, holding the other regressors constant.

The results are based on asymptotics. Therefore, one needs larger sample sizes when dealing with GLMs.