The following subsection describes the standard Pearson and deviance goodness-of-fit tests. To be valid, these two tests require sufficient replication within subpopulations. When there are continuous predictors in the model, the data are often too sparse to use these statistics. The remaining subsections describe tests that are all designed to be valid in such situations.
Let N be the number of observations in your data, and let m denote the number of subpopulation profiles. You can use the AGGREGATE (or AGGREGATE=) option in the MODEL statement to define the subpopulation profiles. If you omit this option, each observation is regarded as coming from a separate subpopulation, and m=N.
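For example, statements similar to the following request the Pearson and deviance goodness-of-fit tests for a hypothetical data set MyData that has a binary response y (coded 0 and 1) and predictors Treatment and Dose; the SCALE=NONE option displays the goodness-of-fit analysis without adjusting for overdispersion:

   proc logistic data=MyData;
      class Treatment;
      model y(event='1') = Treatment Dose / aggregate=(Treatment Dose) scale=none;
   run;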
Let $K$ be the number of response levels and $p$ be the number of parameters that are estimated. For the $j$th profile (or observation) and the $i$th response level, let $w_{ji}$ be the total weight (sum of the product of the frequencies and the weights of the observations within that profile), let $w_{j} = \sum_{i=1}^{K} w_{ji}$, and let $\hat{\pi}_{ji}$ be the fitted probability. For events/trials syntax, let $r_{ji}$ be the number of responses with level $i$, and let $n_j = \sum_{i=1}^{K} r_{ji}$ be the number of trials. In either case, $\hat{\pi}_{ji}$ denotes the predicted probability of response level $i$, $i = 1, \ldots, K$.
The Pearson chi-square statistic $\chi^2_P$ and the deviance $\chi^2_D$ are given by

$$\chi^2_P = \sum_{j=1}^{m} \sum_{i=1}^{K} \frac{\left(w_{ji} - w_{j}\hat{\pi}_{ji}\right)^2}{w_{j}\hat{\pi}_{ji}}$$

$$\chi^2_D = 2 \sum_{j=1}^{m} \sum_{i=1}^{K} w_{ji} \log\!\left(\frac{w_{ji}}{w_{j}\hat{\pi}_{ji}}\right)$$
Each of these chi-square statistics has $m(K-1) - p$ degrees of freedom.
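For example, the following DATA step illustrates the degrees-of-freedom calculation and the corresponding p-value for hypothetical values of $m$, $K$, $p$, and the Pearson statistic:

   data _null_;
      m = 30;  K = 2;  p = 4;          /* profiles, response levels, parameters */
      df    = m*(K - 1) - p;
      chisq = 31.7;                    /* hypothetical Pearson chi-square value */
      pval  = 1 - probchi(chisq, df);
      put df= chisq= pval=;
   run;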
A large difference between the Pearson statistic and the deviance provides some evidence that the data are too sparse to use either statistic.
Without the AGGREGATE (or AGGREGATE=) option, the Pearson chi-square statistic and the deviance are calculated only for events/trials syntax.
Sufficient replication within subpopulations is required to make the Pearson and deviance goodness-of-fit tests valid. When there are one or more continuous predictors in the model, the data are often too sparse to use these statistics. Hosmer and Lemeshow (2000) proposed a statistic that they show, through simulation, is distributed as chi-square when there is no replication in any of the subpopulations. Fagerland, Hosmer, and Bofin (2008) and Fagerland and Hosmer (2013, 2016) extend this test to polytomous response models.
The observations are sorted in increasing order of a scored value. For binary response variables, the scored value of an observation is its estimated event probability. The event is the response level specified in the response variable option EVENT=, the response level that is not specified in the REF= option, or, if neither of these options was specified, the response level identified in the "Response Profile" table as "Ordered Value 1." For nominal response variables (LINK=GLOGIT), the scored value of an observation is 1 minus the estimated probability of the reference level (specified using the REF= option). For ordinal response variables, the scored value of an observation is $\sum_{i=1}^{K} i\,\hat{\pi}_i$, where $K$ is the number of response levels and $\hat{\pi}_i$ is the predicted probability of the $i$th ordered response.
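For example, statements similar to the following (with a hypothetical data set MyData and a 0/1 binary response y) request the Hosmer-Lemeshow test and save the scored values on which the grouping is based:

   proc logistic data=MyData;
      model y(event='1') = x1 x2 / lackfit;
      output out=Scored predicted=phat;   /* phat holds the estimated event probabilities */
   run;

   proc sort data=Scored;
      by phat;                            /* observations are grouped in increasing order of phat */
   run;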
The observations (and frequencies) are then combined into $G$ groups. By default $G = 10$, but you can specify $G$ with the NGROUPS= suboption of the LACKFIT option in the MODEL statement. For single-trial syntax, observations with identical scored values are combined and placed in the same group. Let $F$ be the total frequency. The target frequency for each group is $F_t = [\,F/G\,]$, which is the integer part of $F/G$. Load the first group ($j = 1$) with the observation that has the smallest scored value and with frequency $F_1$, and let the next-smallest observation have a frequency of $f$. PROC LOGISTIC performs the following steps for each observation to create the groups:
If the final group has a frequency $F_g$ that is smaller than the target frequency, then add these observations to the preceding group. The total number of groups actually created, $g$, can be less than $G$. There must be at least three groups in order for the Hosmer-Lemeshow statistic to be computed.
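For example, you can request fewer groups as follows (the data set and variable names are hypothetical); at least three groups must remain after any merging for the statistic to be computed:

   proc logistic data=MyData;
      model y(event='1') = x1 x2 / lackfit(ngroups=5);
   run;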
For binary response variables, the Hosmer-Lemeshow goodness-of-fit statistic is obtained by calculating the Pearson chi-square statistic from the $2 \times g$ table of observed and expected frequencies, where $g$ is the number of groups. The statistic is written

$$\chi^2_{\mathit{HL}} = \sum_{j=1}^{g} \frac{\left(O_j - N_j\bar{\pi}_j\right)^2}{N_j\bar{\pi}_j\left(1 - \bar{\pi}_j\right)}$$

where $N_j$ is the total frequency of subjects in the $j$th group, $O_j$ is the total frequency of event outcomes in the $j$th group, and $\bar{\pi}_j$ is the average estimated probability of an event outcome for the $j$th group. (Note that the predicted probabilities are computed as shown in the section Linear Predictor, Predicted Probability, and Confidence Limits and are not the cross-validated estimates discussed in the section Classification Table.) The Hosmer-Lemeshow statistic is then compared to a chi-square distribution with $g - r$ degrees of freedom, where the value of $r$ can be specified in the DFREDUCE= suboption of the LACKFIT option in the MODEL statement. The default is $r = 2$.
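The following DATA step is a numerical illustration of this statistic for $g = 4$ hypothetical groups (the group frequencies, event counts, and average predicted probabilities are made up):

   data _null_;
      array Ntot{4} _temporary_ (25 25 25 25);           /* N_j: group frequencies          */
      array Obs{4}  _temporary_ ( 3  7 12 18);           /* O_j: observed event frequencies */
      array Pbar{4} _temporary_ (0.14 0.31 0.52 0.71);   /* average predicted probabilities */
      chisq = 0;
      do j = 1 to 4;
         e     = Ntot{j}*Pbar{j};                        /* expected events in group j      */
         chisq = chisq + (Obs{j} - e)**2 / (e*(1 - Pbar{j}));
      end;
      df = 4 - 2;                                        /* g - r with the default r = 2    */
      p  = 1 - probchi(chisq, df);
      put chisq= df= p=;
   run;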
For polytomous response variables, the Pearson chi-square statistic is computed from a $g \times K$ table of observed and expected frequencies,

$$\chi^2_{\mathit{HL}} = \sum_{j=1}^{g} \sum_{k=1}^{K} \frac{\left(O_{jk} - E_{jk}\right)^2}{E_{jk}}$$

where $O_{jk}$ is the sum of the observed frequencies of the observations in group $j$ that have response $k$, and $E_{jk}$ is the sum of the model-predicted probabilities of response $k$ for the observations in group $j$. The Hosmer-Lemeshow statistic is then compared to a chi-square distribution. The number of degrees of freedom for this test of cumulative and adjacent-category logit models with the equal-slopes assumption is given by Fagerland and Hosmer (2013) and Fagerland and Hosmer (2016) as $(g-r)(K-1)+(K-2)$; PROC LOGISTIC uses this number for all models that make the equal-slopes assumption. The number of degrees of freedom for this test of the generalized logit model is given by Fagerland, Hosmer, and Bofin (2008) as $(g-r)(K-1)$, where $K$ is the number of response levels; PROC LOGISTIC uses this number for all models that do not make the equal-slopes assumption. The degrees of freedom can also be specified in the DF= suboption of the LACKFIT option in the MODEL statement.
Large values of $\chi^2_{\mathit{HL}}$ (and small p-values) indicate a lack of fit of the model.
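For example, the following DATA step evaluates the two degrees-of-freedom formulas for hypothetical values $g = 10$, $K = 3$, and the default $r = 2$:

   data _null_;
      g = 10;  K = 3;  r = 2;
      df_equal_slopes = (g - r)*(K - 1) + (K - 2);   /* cumulative and adjacent-category models */
      df_glogit       = (g - r)*(K - 1);             /* generalized logit models                */
      put df_equal_slopes= df_glogit=;
   run;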
The tests in this section are valid even when the data are sparse and there is very little or no replication. These tests are currently available only for binary logistic regression models, and they are reported in the "Goodness-of-Fit Tests" table when you specify the GOF option in the MODEL statement. Let $\hat{\pi}_j$ denote the predicted event probability for observation $j$, $j = 1, \ldots, N$, and let $\widehat{\mathbf{V}}$ be the covariance matrix for the fitted model.
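For example, statements similar to the following request these tests for a hypothetical binary-response data set:

   proc logistic data=MyData;
      model y(event='1') = x1 x2 / gof;
   run;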
The general misspecification test of White (1982) is applied by Orme (1988) to binary response data. The design vector for observation $j$ is expanded by the upper triangular matrix of $\mathbf{x}_j\mathbf{x}_j^{\prime}$; that is, by the products $x_{ja}x_{jb}$, $a \le b$. The $x_{ja}x_{jb}$ are scaled by $(1 - 2\hat{\pi}_j)\sqrt{\hat{\pi}_j(1-\hat{\pi}_j)}$, the $x_{ja}$ values are scaled by $\sqrt{\hat{\pi}_j(1-\hat{\pi}_j)}$, and new response values are created by using the binary form of the residual $r_j$:

$$r_j = \frac{y_j - \hat{\pi}_j}{\sqrt{\hat{\pi}_j(1-\hat{\pi}_j)}} =
\begin{cases}
\sqrt{\dfrac{1-\hat{\pi}_j}{\hat{\pi}_j}} & \text{if } y_j = 1 \\[2ex]
-\sqrt{\dfrac{\hat{\pi}_j}{1-\hat{\pi}_j}} & \text{if } y_j = 0
\end{cases}$$

The model sum of squares from a linear regression of the $r_j$ against the expanded set of covariates has a chi-square distribution with degrees of freedom equal to the number of scaled $x_{ja}x_{jb}$ covariates that are nondegenerate. This test is labeled "Information Matrix" in the "Goodness-of-Fit Tests" table.
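The following statements sketch this construction for a hypothetical model with two covariates x1 and x2 and a 0/1 response y; the data set and variable names are illustrative, and the model sum of squares in the final regression is the test statistic:

   proc logistic data=MyData;
      model y(event='1') = x1 x2;
      output out=Fit predicted=pihat;
   run;

   data Aux;
      set Fit;
      w  = pihat*(1 - pihat);
      r  = (y - pihat)/sqrt(w);         /* binary form of the residual           */
      s0 = sqrt(w);                     /* scaled intercept column               */
      s1 = sqrt(w)*x1;
      s2 = sqrt(w)*x2;
      c  = (1 - 2*pihat)*sqrt(w);       /* scale factor for the expanded columns */
      z00 = c;         z01 = c*x1;      z02 = c*x2;
      z11 = c*x1*x1;   z12 = c*x1*x2;   z22 = c*x2*x2;
   run;

   proc reg data=Aux;
      model r = s0 s1 s2 z00 z01 z02 z11 z12 z22 / noint;   /* model SS = test statistic */
   run;
   quit;

Degenerate (collinear) expanded columns do not contribute degrees of freedom, so the reference chi-square distribution uses only the nondegenerate z columns.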
Kuss (2002) modifies this test by expanding the design matrix using only the diagonal values of $\mathbf{x}_j\mathbf{x}_j^{\prime}$ (the squared terms $x_{ja}^2$). This test is labeled "Information Matrix Diagonal" in the "Goodness-of-Fit Tests" table.
Osius and Rojek (1992) use fixed-cells asymptotics to derive the mean and variance of the Pearson chi-square statistic. The mean is $m$, the number of subpopulation profiles, and the variance is as follows:

$$\sigma^2 = 2\left(m - \sum_{j=1}^{m}\frac{1}{n_j}\right) + \mathrm{RSS}$$

where $n_j$ is the number of trials for the $j$th profile and RSS is the residual sum of squares from a weighted linear regression of $c_j = \dfrac{1 - 2\hat{\pi}_j}{n_j\hat{\pi}_j(1-\hat{\pi}_j)}$ on the design vectors $\mathbf{x}_j$ with weights $n_j\hat{\pi}_j(1-\hat{\pi}_j)$. Standardize the Pearson statistic, $z = \dfrac{\chi^2_P - m}{\sigma}$, then square it to obtain a $\chi^2_1$ test.
Copas (1989) bases a test on the asymptotic normal distribution of the numerator of the Pearson chi-square statistic,

$$S = \sum_{j=1}^{N}\left(y_j - \hat{\pi}_j\right)^2$$

which has mean $\sum_{j=1}^{N}\hat{\pi}_j\left(1-\hat{\pi}_j\right)$. Hosmer et al. (1997) simplify the distribution of $S$ for binary response models, so that its variance, $\sigma_S^2$, is the residual sum of squares of a regression of a vector with entries $1 - 2\hat{\pi}_j$ on the design vectors $\mathbf{x}_j$ with weights $\hat{\pi}_j(1-\hat{\pi}_j)$. Standardize the statistic, $z = \dfrac{S - \sum_{j=1}^{N}\hat{\pi}_j(1-\hat{\pi}_j)}{\sigma_S}$, then square it to obtain a $\chi^2_1$ test.
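The following statements sketch this computation for a hypothetical binary model; the variance is taken from the error sum of squares of the weighted regression described above, and the data set and variable names are illustrative:

   proc logistic data=MyData;
      model y(event='1') = x1 x2;
      output out=Fit predicted=pihat;
   run;

   data Aux;
      set Fit;
      w = pihat*(1 - pihat);
      d = 1 - 2*pihat;
   run;

   proc reg data=Aux;
      weight w;
      model d = x1 x2;
      ods output ANOVA=Anv;              /* the Error SS is the variance of S */
   run;
   quit;

   data _null_;
      set Anv;
      if Source = 'Error' then call symputx('varS', SS);
   run;

   proc sql;
      select sum((y - pihat)**2)                          as S,
             sum(pihat*(1 - pihat))                       as meanS,
             (calculated S - calculated meanS)**2 / &varS as ZSquare
      from Fit;
   quit;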
Spiegelhalter (1986) derives a test based on the Brier score, written in binary form as

$$B = \frac{1}{N}\sum_{j=1}^{N}\left(y_j - \hat{\pi}_j\right)^2$$

where $y_j = 1$ if the event is observed for observation $j$ and $y_j = 0$ otherwise. $B$ is asymptotically normal with

$$E(B) = \frac{1}{N}\sum_{j=1}^{N}\hat{\pi}_j\left(1-\hat{\pi}_j\right)$$

and

$$\mathrm{Var}(B) = \frac{1}{N^2}\sum_{j=1}^{N}\hat{\pi}_j\left(1-\hat{\pi}_j\right)\left(1-2\hat{\pi}_j\right)^2$$

Standardize the Brier score, $z = \dfrac{B - E(B)}{\sqrt{\mathrm{Var}(B)}}$, then square it to obtain a $\chi^2_1$ test.
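The following statements sketch this calculation from saved predicted probabilities; the data set and variable names are hypothetical, and y is assumed to be coded 1 for the event and 0 otherwise:

   proc logistic data=MyData;
      model y(event='1') = x1 x2;
      output out=Fit predicted=pihat;
   run;

   data _null_;
      set Fit end=last;
      n + 1;
      sumSq + (y - pihat)**2;
      sumEB + pihat*(1 - pihat);
      sumVB + pihat*(1 - pihat)*(1 - 2*pihat)**2;
      if last then do;
         B    = sumSq/n;                 /* Brier score                  */
         EB   = sumEB/n;                 /* its mean under the fit       */
         VB   = sumVB/(n*n);             /* its variance under the fit   */
         z2   = (B - EB)**2 / VB;
         pval = 1 - probchi(z2, 1);
         put z2= pval=;
      end;
   run;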
Stukel (1988) adds two covariates to the model and tests that they are insignificant. For a binary response, where $\hat{\eta}_j = \mathbf{x}_j^{\prime}\hat{\boldsymbol{\beta}}$ is the estimated linear predictor and $I(\cdot)$ is the indicator function, add the following covariates:

$$z_{1j} = 0.5\,\hat{\eta}_j^{\,2}\,I\!\left(\hat{\eta}_j \ge 0\right) \qquad z_{2j} = -0.5\,\hat{\eta}_j^{\,2}\,I\!\left(\hat{\eta}_j < 0\right)$$
Then use a score test (see the section Score Statistics and Tests) to test whether this larger model is significantly different from the fitted model.
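The following statements sketch the construction of the two covariates from the estimated linear predictor for a hypothetical binary model; a Wald test of the added coefficients is shown here in place of the score test described in the section Score Statistics and Tests:

   proc logistic data=MyData;
      model y(event='1') = x1 x2;
      output out=Fit xbeta=eta;          /* estimated linear predictor */
   run;

   data Stukel;
      set Fit;
      z1 =  0.5*eta*eta*(eta >= 0);      /* nonzero where the estimated event probability >= 0.5 */
      z2 = -0.5*eta*eta*(eta <  0);      /* nonzero where the estimated event probability <  0.5 */
   run;

   proc logistic data=Stukel;
      model y(event='1') = x1 x2 z1 z2;
      Stukel: test z1=0, z2=0;           /* joint test that the added covariates are unnecessary */
   run;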