The LOGISTIC Procedure

Goodness-of-Fit Tests

The following subsection describes the standard Pearson and deviance goodness-of-fit tests. To be valid, these two tests require sufficient replication within subpopulations. When there are continuous predictors in the model, the data are often too sparse to use these statistics. The remaining subsections describe tests that are all designed to be valid in such situations.

Let $N$ be the number of observations in your data, and let $m$ denote the number of subpopulation profiles. You can use the AGGREGATE (or AGGREGATE=) option in the MODEL statement to define the subpopulation profiles. If you omit this option, each observation is regarded as coming from a separate subpopulation, and $m = N$.

Let $k+1$ be the number of response levels and $p$ be the number of parameters that are estimated. For the $j$th profile (or observation) and the $i$th response level, let $w_{ij}$ be the total weight (the sum of the products of the frequencies and the weights of the observations within that profile), and let $w_j = \sum_{i=1}^{k+1} w_{ij}$. Let $r_{ij}$ be the number of responses with level $i$, and let $n_j = \sum_i r_{ij}$ be the number of trials. Let $\hat{\pi}_{ij}$ denote the fitted (predicted) probability of response level $i$, $1 \le i \le k+1$.

Pearson and Deviance Goodness-of-Fit Tests

The Pearson chi-square statistic $\chi_P^2$ and the deviance $\chi_D^2$ are given by

$$\chi_P^2 = \sum_{j=1}^{m} \sum_{i=1}^{k+1} \frac{(r_{ij} - n_j \hat{\pi}_{ij})^2}{n_j \hat{\pi}_{ij}}$$

$$\chi_D^2 = 2 \sum_{j=1}^{m} \sum_{i=1}^{k+1} r_{ij} \log\left(\frac{r_{ij}}{n_j \hat{\pi}_{ij}}\right)$$

Each of these chi-square statistics has $mk - p$ degrees of freedom.

A large difference between the Pearson statistic and the deviance provides some evidence that the data are too sparse to use either statistic.

Without the AGGREGATE (or AGGREGATE=) option, the Pearson chi-square statistic and the deviance are calculated only for events/trials syntax.
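As a concrete illustration of the two formulas, the following minimal Python sketch (illustrative only, not PROC LOGISTIC code) computes both statistics for a binary response; the profile counts and fitted probabilities are hypothetical values.

```python
import math

def pearson_deviance(r, n, pi_hat):
    """Pearson chi-square and deviance for m binary-response profiles.

    r[j]      -- observed events in profile j
    n[j]      -- trials in profile j
    pi_hat[j] -- fitted event probability for profile j
    The inner sum over the k+1 = 2 response levels (event, nonevent)
    is written out explicitly.
    """
    chi2_p = chi2_d = 0.0
    for rj, nj, pj in zip(r, n, pi_hat):
        for obs, exp in ((rj, nj * pj), (nj - rj, nj * (1 - pj))):
            chi2_p += (obs - exp) ** 2 / exp
            if obs > 0:  # an empty cell contributes nothing to the deviance
                chi2_d += 2 * obs * math.log(obs / exp)
    return chi2_p, chi2_d

# Three hypothetical subpopulation profiles
chi2_p, chi2_d = pearson_deviance(r=[4, 7, 2], n=[10, 12, 8],
                                  pi_hat=[0.35, 0.55, 0.30])
```

Both values would then be referred to a chi-square distribution with $mk - p$ degrees of freedom; the closeness of the two statistics in this example is the behavior expected when the data are not too sparse.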

The Hosmer-Lemeshow Goodness-of-Fit Test

Sufficient replication within subpopulations is required to make the Pearson and deviance goodness-of-fit tests valid. When there are one or more continuous predictors in the model, the data are often too sparse to use these statistics. Hosmer and Lemeshow (2000) proposed a statistic that they show, through simulation, is distributed as chi-square when there is no replication in any of the subpopulations. Fagerland, Hosmer, and Bofin (2008) and Fagerland and Hosmer (2013, 2016) extend this test to polytomous response models.

The observations are sorted in increasing order of a scored value. For binary response variables, the scored value of an observation is its estimated event probability. The event is the response level specified in the response variable option EVENT=, the response level that is not specified in the REF= option, or, if neither of these options was specified, the response level identified in the "Response Profiles" table as "Ordered Value 1." For nominal response variables (LINK=GLOGIT), the scored value of an observation is 1 minus the estimated probability of the reference level (specified using the REF= option). For ordinal response variables, the scored value of an observation is $\sum_{i=1}^{K} i\,\hat{\pi}_i$, where $K$ is the number of response levels and $\hat{\pi}_i$ is the predicted probability of the $i$th ordered response.

The observations (and frequencies) are then combined into $G$ groups. By default $G = 10$, but you can specify $G \ge 5$ with the NGROUPS= suboption of the LACKFIT option in the MODEL statement. For single-trial syntax, observations with identical scored values are combined and are placed in the same group. Let $F$ be the total frequency. The target frequency for each group is $T = \lfloor F/G + 0.5 \rfloor$, which is the integer part of $F/G + 0.5$. Load the first group ($g_j$, $j = 1$) with the observation that has the smallest scored value and with frequency $f_1$, and let the next-smallest observation have a frequency of $f$. PROC LOGISTIC performs the following steps for each observation to create the groups:

  1. If $j = G$, then add this observation to group $g_j$.

  2. Otherwise, if $f_j < T$ and $f_j + \lfloor f/2 \rfloor \le T$, then add this observation to group $g_j$.

  3. Otherwise, start loading the next group ($g_{j+1}$) with $f_{j+1} = f$, and set $j = j + 1$.

If the final group $g_j$ has frequency $f_j < \frac{F}{2G}$, then add these observations to the preceding group. The total number of groups actually created, $g$, can be less than $G$. There must be at least three groups in order for the Hosmer-Lemeshow statistic to be computed.
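The loading rules above can be sketched as the following Python function (an illustrative reimplementation, not the procedure's own code); it assumes single-trial syntax in which observations with identical scored values have already been combined and sorted.

```python
def hl_groups(scores_freqs, G=10):
    """Combine presorted (scored value, frequency) pairs into at most G groups.

    Group indices are 0-based here, so "rule 1: j = G" becomes j == G - 1.
    """
    F = sum(f for _, f in scores_freqs)          # total frequency
    T = int(F / G + 0.5)                         # target frequency per group
    groups, freqs = [[]], [0]
    for score, f in scores_freqs:
        j = len(groups) - 1
        # Start the next group (rule 3) unless rule 1 or rule 2 keeps the
        # observation in the current group.
        if groups[j] and j != G - 1 and not (freqs[j] < T and freqs[j] + f // 2 <= T):
            groups.append([])
            freqs.append(0)
            j += 1
        groups[j].append((score, f))
        freqs[j] += f
    if len(groups) > 1 and freqs[-1] < F / (2 * G):
        groups[-2].extend(groups.pop())          # merge an undersized final group
        freqs[-2] += freqs.pop()
    return groups

# 12 hypothetical observations of frequency 1, split into G = 4 groups
groups = hl_groups([(i / 12, 1) for i in range(12)], G=4)
```

With equal frequencies and a target of $T = \lfloor 12/4 + 0.5 \rfloor = 3$, the sketch produces four groups of three observations each.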

For binary response variables, the Hosmer-Lemeshow goodness-of-fit statistic is obtained by calculating the Pearson chi-square statistic from the $2 \times g$ table of observed and expected frequencies, where $g$ is the number of groups. The statistic is written

$$\chi_{HL}^2 = \sum_{j=1}^{g} \frac{(O_j - F_j \bar{\pi}_j)^2}{F_j \bar{\pi}_j (1 - \bar{\pi}_j)}$$

where $F_j$ is the total frequency of subjects in the $j$th group, $O_j$ is the total frequency of event outcomes in the $j$th group, and $\bar{\pi}_j$ is the average estimated probability of an event outcome for the $j$th group. (Note that the predicted probabilities are computed as shown in the section Linear Predictor, Predicted Probability, and Confidence Limits and are not the cross-validated estimates discussed in the section Classification Table.) The Hosmer-Lemeshow statistic is then compared to a chi-square distribution with $g - r$ degrees of freedom, where the value of $r$ can be specified in the DFREDUCE= suboption of the LACKFIT option in the MODEL statement. The default is $r = 2$.
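Given per-group summaries of this form, the statistic itself reduces to a few lines of Python (an illustrative sketch with hypothetical group summaries, not PROC LOGISTIC code):

```python
def hosmer_lemeshow(groups):
    """Binary Hosmer-Lemeshow statistic.

    groups -- (O_j, F_j, pibar_j) triples: observed events, total frequency,
              and average estimated event probability for group j.
    """
    chi2 = 0.0
    for O, F, pibar in groups:
        expected = F * pibar               # expected events in group j
        chi2 += (O - expected) ** 2 / (expected * (1 - pibar))
    return chi2

# Three hypothetical groups of 10 subjects each
chi2_hl = hosmer_lemeshow([(3, 10, 0.25), (6, 10, 0.55), (8, 10, 0.80)])
```

The result would be compared to a chi-square distribution with $g - r$ degrees of freedom (here $3 - 2 = 1$ under the default $r = 2$).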

For polytomous response variables, the Pearson chi-square statistic is computed from a $2K \times g$ table of observed and expected frequencies,

$$\chi_{HL}^2 = \sum_{j=1}^{g} \sum_{k=1}^{K} \frac{(O_{jk} - E_{jk})^2}{E_{jk}}$$

where $O_{jk}$ is the sum of the observed frequencies and $E_{jk}$ is the sum of the model-predicted probabilities of the observations in group $j$ with response $k$. The Hosmer-Lemeshow statistic is then compared to a chi-square distribution. The number of degrees of freedom for this test of cumulative and adjacent-category logit models with the equal-slopes assumption is given by Fagerland and Hosmer (2013) and Fagerland and Hosmer (2016) as $(g-r)(K-1) + (K-2)$; PROC LOGISTIC uses this number for all models that make the equal-slopes assumption. The number of degrees of freedom for this test of the generalized logit model is given by Fagerland, Hosmer, and Bofin (2008) as $(g-r)(K-1)$, where $K$ is the number of response levels; PROC LOGISTIC uses this number for all models that do not make the equal-slopes assumption. The degrees of freedom can also be specified using the DF= suboption of the LACKFIT option in the MODEL statement.

Large values of $\chi_{HL}^2$ (and small p-values) indicate a lack of fit of the model.

Goodness-of-Fit Tests with Sparse Data

The tests in this section are valid even when the data are sparse and there is very little or no replication in the data. These tests are currently available only for binary logistic regression models, and they are reported in the "Goodness-of-Fit Tests" table when you specify the GOF option in the MODEL statement. Let $\hat{\pi}_j$ denote the predicted event probability, and let $\hat{\mathbf{V}}$ be the covariance matrix for the fitted model.

Information Matrix Test

The general misspecification test of White (1982) is applied by Orme (1988) to binary response data. The design vector $\mathbf{x}_j$ for observation $j$ is expanded by the upper triangular elements of $\mathbf{x}_j \mathbf{x}_j'$; that is, by $\mathrm{vech}(\mathbf{x}_j \mathbf{x}_j')$. The $\mathbf{x}_j$ are scaled by $\sqrt{\hat{\pi}_j (1 - \hat{\pi}_j)}$, the $\mathrm{vech}(\mathbf{x}_j \mathbf{x}_j')$ values are scaled by $\sqrt{\hat{\pi}_j (1 - \hat{\pi}_j)}\,(1 - 2\hat{\pi}_j)$, and new response values are created by using the binary form of the residual $e_j = y_j - \hat{\pi}_j$:

$$Y^* = \frac{e_j}{\sqrt{\hat{\pi}_j (1 - \hat{\pi}_j)}}$$

The model sum of squares from a linear regression of $Y^*$ against the expanded set of covariates has a chi-square distribution with degrees of freedom equal to the number of scaled $\mathrm{vech}(\mathbf{x}_j \mathbf{x}_j')$ covariates that are nondegenerate. This test is labeled "Information Matrix" in the "Goodness-of-Fit Tests" table.
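The expansion and scaling described above can be sketched in Python as follows (an illustrative construction with a hypothetical design vector, not the procedure's code); the remaining step, regressing the pseudo-response on the expanded covariates and taking the model sum of squares, is omitted here.

```python
import math

def information_matrix_design(X, y, pi_hat):
    """Build the scaled covariates and pseudo-response for the test.

    X[j] is the design vector for observation j (intercept included).
    Returns (Z, ystar): each Z[j] holds the scaled x_j followed by the
    scaled upper-triangular (vech) terms of x_j x_j'.
    """
    Z, ystar = [], []
    for xj, yj, pj in zip(X, y, pi_hat):
        s = math.sqrt(pj * (1 - pj))
        # vech: upper-triangular elements of the outer product x_j x_j'
        vech = [xj[a] * xj[b] for a in range(len(xj)) for b in range(a, len(xj))]
        Z.append([s * v for v in xj] + [s * (1 - 2 * pj) * v for v in vech])
        ystar.append((yj - pj) / s)
    return Z, ystar

# One hypothetical observation with design vector (1, 2.0)
Z, ystar = information_matrix_design([[1.0, 2.0]], [1], [0.5])
```

Note that at $\hat{\pi}_j = 0.5$ the factor $(1 - 2\hat{\pi}_j)$ vanishes, so the vech terms for that observation are zero.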

Kuss (2002) modifies this test by expanding the design matrix using only the diagonal values of $\mathbf{x}_j \mathbf{x}_j'$. This test is labeled "Information Matrix Diagonal" in the "Goodness-of-Fit Tests" table.

Osius-Rojek Test

Osius and Rojek (1992) use fixed-cells asymptotics to derive the mean and variance of the Pearson chi-square statistic. The mean is $m$, the number of subpopulation profiles, and the variance is as follows:

$$\sigma^2 = 2m + \sum_{j=1}^{m} \frac{1}{w_j n_j} \left( \frac{1}{\hat{\pi}_j (1 - \hat{\pi}_j)} - 6 \right) - \mathbf{c}' \hat{\mathbf{V}} \mathbf{c} \qquad \text{where} \quad \mathbf{c} = \sum_{j=1}^{m} (1 - 2\hat{\pi}_j) \mathbf{x}_j$$

Standardize the Pearson statistic, $\frac{\chi_P^2 - m}{\sigma}$, and then square it to obtain a $\chi_1^2$ test.
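A Python sketch of this standardization (illustrative only; the Pearson statistic, the vector $\mathbf{c}$, and the covariance matrix $\hat{\mathbf{V}}$ are assumed to be precomputed, and the example inputs are hypothetical):

```python
import math

def osius_rojek_z(chi2_p, m, w, n, pi_hat, c, V):
    """Standardized Osius-Rojek statistic; its square is a chi-square(1) test."""
    sigma2 = 2 * m
    for wj, nj, pj in zip(w, n, pi_hat):
        sigma2 += (1 / (pj * (1 - pj)) - 6) / (wj * nj)
    # subtract the quadratic form c' V c
    p = len(c)
    sigma2 -= sum(c[a] * V[a][b] * c[b] for a in range(p) for b in range(p))
    return (chi2_p - m) / math.sqrt(sigma2)

# Hypothetical inputs: two profiles, Pearson statistic equal to its mean m = 2
z = osius_rojek_z(2.0, 2, [1.0, 1.0], [5, 5], [0.5, 0.5], [0.0], [[1.0]])
```

A Pearson statistic equal to its mean gives a standardized value of zero, i.e., no evidence of lack of fit.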

Unweighted Residual Sum-of-Squares Test

Copas (1989) bases a test on the asymptotic normal distribution of the numerator of the Pearson chi-square statistic

$$S = \sum_{j=1}^{N} \sum_{i=1}^{2} w_{ij} (r_{ij} - n_j \hat{\pi}_{ij})^2$$

which has mean $E(S) = \mathrm{trace}(\hat{\mathbf{V}})$. Hosmer et al. (1997) simplify the distribution of $S - \mathrm{trace}(\hat{\mathbf{V}})$ for binary response models, so that its variance, $\sigma^2$, is the residual sum of squares from a regression of a vector with entries $(1 - 2\hat{\pi}_j)$ on $\mathbf{X}$ with weights $\hat{\mathbf{V}}$. Standardize the statistic, $\frac{S - \mathrm{trace}(\hat{\mathbf{V}})}{\sigma}$, and then square it to obtain a $\chi_1^2$ test.

Spiegelhalter Test

Spiegelhalter (1986) derives a test based on the Brier score, written in binary form as

$$B = \frac{1}{W} \sum_{j=1}^{N} w_j (y_j - \hat{\pi}_j)^2$$

where $W = \sum_{j=1}^{N} w_j$. $B$ is asymptotically normal with

$$E(B) = \frac{1}{W} \sum_{j=1}^{N} w_j \hat{\pi}_j (1 - \hat{\pi}_j)$$

and

$$\mathrm{Var}(B) = \frac{1}{W^2} \sum_{j=1}^{N} w_j (1 - 2\hat{\pi}_j)^2 \hat{\pi}_j (1 - \hat{\pi}_j)$$

Standardize the Brier score, $\frac{B - E(B)}{\sqrt{\mathrm{Var}(B)}}$, and then square it to obtain a $\chi_1^2$ test.
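This test is simple enough to sketch end to end in Python (illustrative only, not PROC LOGISTIC code; the responses and predicted probabilities are hypothetical):

```python
import math

def spiegelhalter_z(y, pi_hat, w=None):
    """Standardized Spiegelhalter statistic; its square is a chi-square(1) test."""
    if w is None:
        w = [1.0] * len(y)          # unweighted case
    W = sum(w)
    B = sum(wj * (yj - pj) ** 2 for wj, yj, pj in zip(w, y, pi_hat)) / W
    EB = sum(wj * pj * (1 - pj) for wj, pj in zip(w, pi_hat)) / W
    VB = sum(wj * (1 - 2 * pj) ** 2 * pj * (1 - pj)
             for wj, pj in zip(w, pi_hat)) / W ** 2
    return (B - EB) / math.sqrt(VB)

# Four hypothetical observations
z = spiegelhalter_z([1, 0, 1, 0], [0.8, 0.2, 0.6, 0.4])
```

Here $B$ is smaller than $E(B)$, so the standardized value is negative; its square would be referred to a chi-square distribution with 1 degree of freedom.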

Stukel Test

Stukel (1988) adds two covariates to the model and tests that their coefficients are zero. For a binary response, where $\eta_j = \mathbf{x}_j' \boldsymbol{\beta}$ and $I(\cdot)$ is the indicator function, add the following covariates:

$$z_a = \eta_j^2\, I(\eta_j \ge 0) \qquad z_b = \eta_j^2\, I(\eta_j < 0)$$

Then use a score test (see the section Score Statistics and Tests) to test whether this larger model is significantly different from the fitted model.
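Constructing Stukel's two covariates is straightforward; the sketch below (illustrative, with hypothetical linear predictors) builds them, while the score test comparing the enlarged model with the fitted model is carried out by PROC LOGISTIC itself.

```python
def stukel_covariates(eta):
    """Stukel's added covariates from the linear predictors eta_j."""
    z_a = [e ** 2 if e >= 0 else 0.0 for e in eta]   # eta_j^2 * I(eta_j >= 0)
    z_b = [e ** 2 if e < 0 else 0.0 for e in eta]    # eta_j^2 * I(eta_j < 0)
    return z_a, z_b

# Hypothetical linear predictors for three observations
z_a, z_b = stukel_covariates([-1.5, 0.0, 2.0])
```

Each observation contributes to exactly one of the two covariates, depending on the sign of its linear predictor.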

Last updated: December 09, 2022