The LOGISTIC Procedure

Overdispersion

This section uses the definitions that are provided in the section Goodness-of-Fit Tests.

For a correctly specified model, the Pearson chi-square statistic and the deviance, divided by their degrees of freedom, should be approximately equal to one. When their values are much larger than one, the assumption of binomial variability might not be valid and the data are said to exhibit overdispersion. Underdispersion, which results in the ratios being less than one, occurs less often in practice.

When fitting a model, several problems can cause the goodness-of-fit statistics to exceed their degrees of freedom: outliers in the data, an incorrect link function, important terms omitted from the model, and predictors in need of transformation. These problems should be ruled out before you use the following methods to correct for overdispersion.

Rescaling the Covariance Matrix

One way of correcting overdispersion is to multiply the covariance matrix by a dispersion parameter. This method assumes that the sample sizes in each subpopulation are approximately equal. You can supply the value of the dispersion parameter directly, or you can estimate the dispersion parameter based on either the Pearson chi-square statistic or the deviance for the fitted model.

The dispersion parameter is estimated by

$$\hat{\sigma}^2 = \begin{cases} \chi_P^2 / (mk - p) & \text{SCALE=PEARSON} \\ \chi_D^2 / (mk - p) & \text{SCALE=DEVIANCE} \\ (\mathit{constant})^2 & \text{SCALE=}\mathit{constant} \end{cases}$$
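As a minimal numeric sketch of the SCALE=PEARSON case (all values below are hypothetical, not from a fitted model), the dispersion estimate and the resulting standard-error adjustment can be computed as follows:

```python
import math

# Hypothetical goodness-of-fit results for a fitted model:
chi2_pearson = 43.2   # Pearson chi-square statistic
df = 36               # degrees of freedom, mk - p

# SCALE=PEARSON: the dispersion estimate is the statistic over its df
sigma2_hat = chi2_pearson / df          # ~1.2, mild overdispersion

# The covariance matrix is multiplied by sigma2_hat, so every standard
# error is multiplied by its square root; the parameter estimates
# themselves are unchanged.
se_unadjusted = 0.50                    # hypothetical standard error
se_adjusted = se_unadjusted * math.sqrt(sigma2_hat)
```

A ratio near one would leave the standard errors essentially unchanged; SCALE=DEVIANCE works the same way with $\chi_D^2$ in place of $\chi_P^2$.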

In order for the Pearson statistic and the deviance to be distributed as chi-square, there must be sufficient replication within the subpopulations. When this is not true, the data are sparse, and the p-values for these statistics are not valid and should be ignored. Similarly, these statistics, divided by their degrees of freedom, cannot serve as indicators of overdispersion. A large difference between the Pearson statistic and the deviance provides some evidence that the data are too sparse to use either statistic.

Note that the parameter estimates are not changed by this method. However, their standard errors are adjusted for overdispersion, affecting their significance tests.

Williams’ Method

Suppose that the data consist of $n$ binomial observations. For the $j$th observation, let $r_j / n_j$ be the observed proportion and let $\mathbf{x}_j$ be the associated vector of explanatory variables. Suppose that the response probability for the $j$th observation is a random variable $P_j$ with mean and variance

$$E(P_j) = \pi_j \quad \text{and} \quad V(P_j) = \phi \, \pi_j (1 - \pi_j)$$

where $\pi_j$ is the probability of the event and $\phi$ is a nonnegative but otherwise unknown scale parameter. Then the mean and variance of $r_j$ are

$$E(r_j) = n_j \pi_j \quad \text{and} \quad V(r_j) = n_j \pi_j (1 - \pi_j) \left[ 1 + (n_j - 1)\phi \right]$$
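The bracketed term acts as a variance inflation factor: it equals one when $\phi = 0$ and grows with both $n_j$ and $\phi$. A quick numeric check with hypothetical values $n_j = 10$, $\pi_j = 0.3$, $\phi = 0.05$:

```python
# Hypothetical values for one observation (illustration only)
n_j, pi_j, phi = 10, 0.3, 0.05

binomial_var = n_j * pi_j * (1 - pi_j)       # variance when phi = 0
inflation = 1 + (n_j - 1) * phi              # bracketed inflation factor
overdispersed_var = binomial_var * inflation

# With phi = 0 the ordinary binomial variance is recovered
assert binomial_var * (1 + (n_j - 1) * 0) == binomial_var
```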

Williams (1982) estimates the unknown parameter $\phi$ by equating the value of Pearson’s chi-square statistic for the full model to its approximate expected value. Suppose $w_j^*$ is the weight associated with the $j$th observation. The Pearson chi-square statistic is given by

$$\chi^2 = \sum_{j=1}^{n} \frac{w_j^* \, (r_j - n_j \hat{\pi}_j)^2}{n_j \hat{\pi}_j (1 - \hat{\pi}_j)}$$
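With all $w_j^* = 1$, this is the ordinary Pearson statistic for grouped binomial data. A small sketch with hypothetical counts and fitted probabilities:

```python
# Hypothetical grouped data: r_j events out of n_j trials, with
# fitted probabilities pi_hat_j and weights w_j* (all illustrative).
r = [3, 7, 2, 9]
n = [10, 12, 8, 15]
pi_hat = [0.25, 0.55, 0.30, 0.62]
w = [1.0, 1.0, 1.0, 1.0]           # w_j* = 1 before any reweighting

chi2 = sum(
    w_j * (r_j - n_j * p_j) ** 2 / (n_j * p_j * (1 - p_j))
    for w_j, r_j, n_j, p_j in zip(w, r, n, pi_hat)
)
```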

Let $g'(\cdot)$ be the first derivative of the link function $g(\cdot)$. The approximate expected value of $\chi^2$ is

$$E_{\chi^2} = \sum_{j=1}^{n} w_j^* (1 - w_j^* v_j d_j) \left[ 1 + \phi (n_j - 1) \right]$$

where $v_j = n_j / \left( \pi_j (1 - \pi_j) [g'(\pi_j)]^2 \right)$ and $d_j$ is the variance of the linear predictor $\hat{\alpha}_j + \mathbf{x}_j' \hat{\boldsymbol{\beta}}$. The scale parameter $\phi$ is estimated by the following iterative procedure.

At the start, let $w_j^* = 1$ and let $\pi_j$ be approximated by $r_j / n_j$, $j = 1, 2, \ldots, n$. If you apply these weights and approximated probabilities to $\chi^2$ and $E_{\chi^2}$ and then equate them, an initial estimate of $\phi$ is

$$\hat{\phi}_0 = \frac{\chi^2 - (n - p)}{\sum_j (n_j - 1)(1 - v_j d_j)}$$

where $p$ is the total number of parameters. The initial estimates of the weights become $\hat{w}_{j0}^* = [1 + (n_j - 1)\hat{\phi}_0]^{-1}$. After a weighted fit of the model, the $\hat{\alpha}_j$ and $\hat{\boldsymbol{\beta}}$ are recalculated, and so is $\chi^2$. Then a revised estimate of $\phi$ is given by

$$\hat{\phi}_1 = \frac{\chi^2 - \sum_j w_j^* (1 - w_j^* v_j d_j)}{\sum_j w_j^* (n_j - 1)(1 - w_j^* v_j d_j)}$$

The iterative procedure is repeated until $\chi^2$ is very close to its degrees of freedom.
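The steps above can be sketched numerically. In practice the chi-square statistic and the $v_j d_j$ products are recomputed from a weighted model fit at every step; the values below are hypothetical stand-ins that only illustrate the update formulas:

```python
# Hypothetical full-model quantities (illustration only)
n_obs, p = 40, 3                  # binomial observations and parameters
n_j = [12] * n_obs                # trials per observation
vd = [0.08] * n_obs               # v_j * d_j from the full fit
chi2 = 52.0                       # Pearson chi-square, full model

# Initial estimate (w_j* = 1, pi_j approximated by r_j / n_j)
phi0 = (chi2 - (n_obs - p)) / sum((nj - 1) * (1 - v)
                                  for nj, v in zip(n_j, vd))

# Initial weights for the next, weighted fit
w = [1.0 / (1.0 + (nj - 1) * phi0) for nj in n_j]

# After refitting with these weights, chi2 (and vd) are recomputed;
# a hypothetical post-refit chi-square stands in here.
chi2_refit = 44.0
num = chi2_refit - sum(wj * (1 - wj * v) for wj, v in zip(w, vd))
den = sum(wj * (nj - 1) * (1 - wj * v) for wj, nj, v in zip(w, n_j, vd))
phi1 = num / den        # revised estimate; repeat until chi2 is near its df
```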

Once $\phi$ has been estimated by $\hat{\phi}$ under the full model, weights of $\left( 1 + (n_j - 1) \hat{\phi} \right)^{-1}$ can be used to fit models that have fewer terms than the full model. See Example 79.10 for an illustration.

Note: If the WEIGHT statement is specified with the NORMALIZE option, then the initial $w_j^*$ values are set to the normalized weights, and the weights resulting from Williams’ method will not add up to the actual sample size. However, the estimated covariance matrix of the parameter estimates remains invariant to the scale of the WEIGHT variable.

Last updated: December 09, 2022