Introduction to Regression Procedures

Generalized Linear Regression

As outlined in the section Generalized Linear Models in Chapter 3, Introduction to Statistical Modeling with SAS/STAT Software, the class of generalized linear models generalizes the linear regression models in two ways:

  • by allowing the data to come from a distribution that is a member of the exponential family of distributions

  • by introducing a link function, $g(\cdot)$, that provides a mapping between the linear predictor $\eta = \mathbf{x}'\boldsymbol{\beta}$ and the mean of the data, $g(\mathrm{E}[Y]) = \eta$. The link function is monotonic, so that $\mathrm{E}[Y] = g^{-1}(\eta)$; $g^{-1}(\cdot)$ is called the inverse link function.
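As a minimal sketch of the two points above, the following Python fragment pairs a few common link functions with their inverses and checks that applying the link and then the inverse link returns the original mean. The names `LINKS` and `round_trip` and the selection of links are illustrative assumptions; they are not part of SAS/STAT software.

```python
import math

# Hypothetical table of (link, inverse link) pairs; the names are illustrative only.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}

def round_trip(name, mu):
    """Map a mean to the linear-predictor scale and back again."""
    link, inverse_link = LINKS[name]
    eta = link(mu)            # g(E[Y]) = eta
    return inverse_link(eta)  # g^{-1}(eta) = E[Y]

for name in LINKS:
    print(name, round_trip(name, 0.25))
```

Because each link is monotonic, the round trip recovers the mean up to floating-point error.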

One of the most commonly used generalized linear regression models is the logistic model for binary or binomial data. Suppose that $Y$ denotes a binary outcome variable that takes on the values 1 and 0 with the probabilities $\pi$ and $1 - \pi$, respectively. The probability $\pi$ is also referred to as the "success probability," supposing that the coding $Y = 1$ corresponds to a success in a Bernoulli experiment. The success probability is also the mean of $Y$, and one of the aims of logistic regression analysis is to study how regressor variables affect the outcome probabilities or functions thereof, such as odds ratios.

The logistic regression model for $\pi$ is defined by the linear predictor $\eta = \mathbf{x}'\boldsymbol{\beta}$ and the logit link function:

$$\mathrm{logit}(\Pr(Y = 1)) = \log\left(\frac{\pi}{1 - \pi}\right) = \mathbf{x}'\boldsymbol{\beta}$$

The inversely linked linear predictor function in this model is

$$\Pr(Y = 1) = \frac{1}{1 + \exp(-\eta)}$$
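The following Python sketch evaluates the success probability $\pi$ from a linear predictor and illustrates why odds ratios are a natural summary in this model: raising one regressor by one unit multiplies the odds of success by the exponential of its coefficient. The coefficient values and function names are hypothetical, chosen only for illustration.

```python
import math

def linear_predictor(x, beta):
    """eta = x' beta for a regressor vector x and coefficient vector beta."""
    return sum(xi * bi for xi, bi in zip(x, beta))

def success_probability(x, beta):
    """Inverse logit of eta, written to avoid overflow for large |eta|."""
    eta = linear_predictor(x, beta)
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def odds(p):
    """Odds of success, pi / (1 - pi)."""
    return p / (1.0 - p)

# Hypothetical coefficients: an intercept and one slope.
beta = [-1.0, 0.5]
p0 = success_probability([1.0, 2.0], beta)  # regressor value 2
p1 = success_probability([1.0, 3.0], beta)  # regressor value 3
odds_ratio = odds(p1) / odds(p0)

print(p0, p1, odds_ratio, math.exp(beta[1]))
```

The last two printed values agree up to rounding, since the log odds are linear in the regressors.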

The dichotomous logistic regression model can be extended to multinomial (polychotomous) data. Two classes of models for multinomial data can be fit by using procedures in SAS/STAT software: models for ordinal data that rely on cumulative link functions, and models for nominal (unordered) outcomes that rely on generalized logits. The next sections briefly discuss SAS/STAT procedures for logistic regression. For more information about the comparison of the procedures mentioned there with respect to analysis of categorical responses, see Chapter 9, Introduction to Categorical Data Analysis Procedures.
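To make the two multinomial model classes concrete, here is a minimal Python sketch under one common parameterization, which is an assumption and may differ from the ordering or reference category a given procedure uses. The cumulative logit model is taken as $\Pr(Y \le j) = g^{-1}(\alpha_j + \eta)$ with increasing cutpoints $\alpha_j$, and the generalized logit model as $\log(\Pr(Y = j)/\Pr(Y = J)) = \eta_j$ with the last category $J$ as the reference. All function names are hypothetical.

```python
import math

def inverse_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def cumulative_logit_probs(cutpoints, eta):
    """Category probabilities for ordinal data.

    Assumes Pr(Y <= j) = inverse_logit(alpha_j + eta) with increasing cutpoints.
    """
    cumulative = [inverse_logit(a + eta) for a in cutpoints] + [1.0]
    probs = [cumulative[0]]
    # Each remaining category probability is a difference of adjacent cumulative ones.
    for lower, upper in zip(cumulative, cumulative[1:]):
        probs.append(upper - lower)
    return probs

def generalized_logit_probs(etas):
    """Category probabilities for nominal data.

    Assumes log(Pr(Y = j) / Pr(Y = J)) = eta_j, with the last category as reference.
    """
    weights = [math.exp(e) for e in etas] + [1.0]  # reference category has eta = 0
    total = sum(weights)
    return [w / total for w in weights]

ordinal = cumulative_logit_probs([-1.0, 1.0], 0.0)
nominal = generalized_logit_probs([0.0, 0.0])
print(ordinal, sum(ordinal))
print(nominal, sum(nominal))
```

In both cases the category probabilities sum to 1; the ordinal model shares one linear predictor across categories, whereas the nominal model uses a separate linear predictor for each non-reference category.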

The SAS/STAT procedures CATMOD, GENMOD, GLIMMIX, LOGISTIC, and PROBIT can fit generalized linear models for binary, binomial, and multinomial outcomes.

Last updated: December 09, 2022