Introduction to Statistical Modeling with SAS/STAT Software

Generalized Linear Models

A class of models that has gained increasing importance in the past several decades is the class of generalized linear models. The theory of generalized linear models originated with Nelder and Wedderburn (1972) and Wedderburn (1974), and was subsequently popularized by the monograph of McCullagh and Nelder (1989). This class of models extends the theory and methods of linear models to data with nonnormal responses. Before this theory was developed, modeling of nonnormal data typically relied on transformations of the data, chosen to improve symmetry, homogeneity of variance, or normality. Such transformations have to be performed with care because they also have implications for the error structure of the model. Also, back-transforming estimates or predicted values can introduce bias.

Generalized linear models also apply a transformation, known as the link function, but it is applied to a deterministic component, the mean of the data. Furthermore, generalized linear models take the distribution of the data into account, rather than assuming that a transformation of the data leads to normally distributed data to which standard linear modeling techniques can be applied.
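The difference between transforming the data and transforming the mean can be illustrated with a small numerical sketch. The response values below are hypothetical and chosen only for illustration, and the log function stands in for a generic transformation or link. Averaging on the log scale and back-transforming does not recover the arithmetic mean, whereas applying the log to the mean itself and inverting it does.

```python
import math

# Hypothetical positive responses, chosen only for illustration.
y = [1.0, 2.0, 4.0, 8.0, 16.0]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Transform-the-data approach: average the log-transformed responses,
# then back-transform. This yields the geometric mean of y.
back_transformed = math.exp(mean([math.log(v) for v in y]))

# Link-on-the-mean approach: the log is applied to the mean itself,
# so inverting it recovers the arithmetic mean.
inverse_linked = math.exp(math.log(mean(y)))

print("back-transformed mean of log(y):", back_transformed)
print("inverse-linked log of mean(y):  ", inverse_linked)
```

For these values the back-transformed estimate is smaller than the arithmetic mean, which is the kind of bias that back-transformation can introduce.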

To put this generalization in place requires a slightly more sophisticated model setup than that required for linear models for normal data:

  • The systematic component is a linear predictor similar to that in linear models, $\eta = \mathbf{x}'\boldsymbol{\beta}$. The linear predictor is a linear function in the parameters. In contrast to the linear model, $\eta$ does not represent the mean function of the data.

  • The link function $g(\cdot)$ relates the linear predictor to the mean, $g(\mu) = \eta$. The link function is a monotonic, invertible function. The mean can thus be expressed as the inversely linked linear predictor, $\mu = g^{-1}(\eta)$. For example, a common link function for binary and binomial data is the logit link, $g(t) = \log\{t/(1-t)\}$. The mean function of a generalized linear model with logit link and a single regressor can thus be written as

    \[
    \begin{aligned}
    \log\left\{\frac{\mu}{1-\mu}\right\} &= \beta_0 + \beta_1 x \\
    \mu &= \frac{1}{1 + \exp\{-\beta_0 - \beta_1 x\}}
    \end{aligned}
    \]

    This is known as a logistic regression model.

  • The random component of a generalized linear model is the distribution of the data, assumed to be a member of the exponential family of distributions. Discrete members of this family include the Bernoulli (binary), binomial, Poisson, geometric, and negative binomial (for a given value of the scale parameter) distributions. Continuous members include the normal (Gaussian), beta, gamma, inverse Gaussian, and exponential distributions.
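As a minimal sketch of how the systematic component and the link function fit together, the following Python code implements the logit link, its inverse, and the mean function of the single-regressor logistic regression model described in the list above. The function names and the coefficient values are hypothetical, introduced only for illustration. The code is not hardened against overflow for extreme values of the linear predictor.

```python
import math


def logit(t):
    """Logit link g(t) = log(t / (1 - t)), defined for 0 < t < 1."""
    return math.log(t / (1.0 - t))


def inverse_logit(eta):
    """Inverse link g^{-1}(eta) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def logistic_mean(x, beta0, beta1):
    """Mean of a logistic regression model with one regressor.

    The linear predictor eta = beta0 + beta1 * x is mapped to the
    mean through the inverse of the logit link.
    """
    eta = beta0 + beta1 * x
    return inverse_logit(eta)


# Hypothetical coefficients, chosen so that eta = 0 at x = 2.
beta0, beta1 = -1.0, 0.5
mu = logistic_mean(2.0, beta0, beta1)

print("mean at x = 2:", mu)
print("link applied to the mean:", logit(mu))
```

Because the inverse logit maps any real linear predictor into the interval (0, 1), the mean stays a valid probability whatever the values of the regressor and coefficients. The linear predictor itself is unconstrained.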

The standard linear model with normally distributed error is a special case of a generalized linear model; the link function is the identity function and the distribution is normal.
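This special case can be written out explicitly. In the sketch below, $\sigma^2$ denotes the error variance and $\epsilon$ the error term. Both symbols are assumptions introduced here for illustration and are not part of the notation used above.

    \[
    \begin{aligned}
    g(\mu) &= \mu = \eta = \mathbf{x}'\boldsymbol{\beta} \\
    Y \sim N(\mu, \sigma^2) \quad &\Longleftrightarrow \quad Y = \mathbf{x}'\boldsymbol{\beta} + \epsilon, \quad \epsilon \sim N(0, \sigma^2)
    \end{aligned}
    \]

With the identity link, the linear predictor coincides with the mean, which is why this distinction does not arise in the standard linear model.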

Last updated: December 09, 2022