The statistics that this section discusses are useful for comparing competing models that are not necessarily nested. They are different measures of how well your model fits your data.
Let $N$ be the number of observations in your data. For the $j$th observation, let $r_j$ be the number of events and $n_j$ be the number of trials when events/trials syntax is specified, or $n_j = 1$ for single-trial syntax. Let $w_j$ and $f_j$ be the weight and frequency values, respectively, and denote $F = \sum_{j=1}^{N} f_j$ and $n = \sum_{j=1}^{N} f_j n_j$. The total sample size is $n$.
Let $p$ denote the number of parameters in the model, including the intercept parameters. Let $s$ be the number of explanatory effects, that is, the number of slope parameters. Let $k$ be the total number of response functions; this is the same as the number of intercepts in the model unless you specify the NOINT option in the MODEL statement. For this section, assume that the NOINT option is not specified. For binary and cumulative response models, $p = k + s$. For the generalized logit model, $p = k(1 + s)$.
For the $j$th observation, let $\hat{\pi}_j$ be the estimated probability of the observed response. The criteria that the LOGISTIC procedure displays are calculated as follows:
–2 log likelihood:

$$-2\,\mathrm{Log\,L} = -2 \sum_{j} \frac{w_j f_j}{\sigma^2} \log(\hat{\pi}_j)$$

where $\sigma^2$ is the dispersion parameter, which equals 1 unless the SCALE= option is specified. For binary response models that use events/trials MODEL statement syntax, this is

$$-2\,\mathrm{Log\,L} = -2 \sum_{j} \frac{w_j f_j}{\sigma^2} \left[\, r_j \log(\hat{\pi}_j) + (n_j - r_j) \log(1 - \hat{\pi}_j) \,\right]$$

where $\hat{\pi}_j$ denotes the estimated event probability. This statistic is reported both with and without the constant term.
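For example, statements such as the following fit an events/trials model and estimate a dispersion parameter through the SCALE= option; the data set Assay and its variables are hypothetical placeholders:

   /* Hypothetical sketch: events/trials syntax with a
      Williams-method dispersion parameter */
   proc logistic data=Assay;
      model Cases/Total = Dose / scale=williams;
   run;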
Akaike’s information criterion:

$$\mathrm{AIC} = -2\,\mathrm{Log\,L} + 2p$$
Schwarz (Bayesian information) criterion:

$$\mathrm{SC} = -2\,\mathrm{Log\,L} + p \log(n)$$
Akaike’s corrected information criterion:

$$\mathrm{AICC} = -2\,\mathrm{Log\,L} + \frac{2pn}{n - p - 1}$$
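As a numeric illustration (all values hypothetical), suppose a fitted model has $-2\,\mathrm{Log\,L} = 100$, $p = 3$, and total sample size $n = 50$. Then

$$\mathrm{AIC} = 100 + 2(3) = 106, \qquad \mathrm{SC} = 100 + 3\log(50) \approx 111.7, \qquad \mathrm{AICC} = 100 + \frac{2(3)(50)}{50 - 3 - 1} \approx 106.5$$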
The AIC and SC statistics give two different ways of adjusting the –2 log-likelihood statistic for the number of terms in the model and the number of observations used. You can use these statistics to compare different models for the same data (for example, when you use the SELECTION=STEPWISE option in the MODEL statement). The models that you are comparing do not have to be nested; lower values of the statistics indicate a more desirable model.
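For example, a sketch such as the following fits a sequence of models by stepwise selection so that their fit criteria can be compared; the data set Study and its variables are hypothetical:

   /* Hypothetical sketch: compare the fit statistics of the
      models visited during stepwise selection */
   proc logistic data=Study;
      model Y(event='1') = X1 X2 X3 X4 / selection=stepwise;
   run;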
The AICC is a small-sample, bias-corrected version of Akaike’s information criterion, as promoted in Hurvich and Tsai (1989) and Burnham and Anderson (1998), for example. The AICC is displayed in the "Model Fit Statistics" table for the selected model when you specify the GOF option in the MODEL statement, and in the "Fit Statistics for SCORE Data" table when you specify the FITSTAT option in the SCORE statement.
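A minimal sketch that requests the AICC in both tables, assuming hypothetical training and validation data sets Train and Valid:

   /* GOF displays the AICC in the "Model Fit Statistics" table;
      FITSTAT displays it in the "Fit Statistics for SCORE Data"
      table. All data set and variable names are hypothetical. */
   proc logistic data=Train;
      model Y(event='1') = X1 X2 / gof;
      score data=Valid out=ScoredValid fitstat;
   run;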
The difference in the –2 log-likelihood statistics between the intercepts-only model and the specified model has a chi-square distribution with $p - k$ degrees of freedom under the null hypothesis that all the explanatory effects in the model are zero. The likelihood ratio test in the "Testing Global Null Hypothesis: BETA=0" table displays this difference and the associated p-value for this statistic. The score and Wald tests in that table test the same hypothesis and are asymptotically equivalent; for more information, see the sections Residual Chi-Square and Testing Linear Hypotheses about the Regression Coefficients.
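For example, if the intercept-only model has $-2\log L_0 = 140$ and a model with $p - k = 4$ slope parameters has $-2\log L = 100$ (hypothetical values), then the likelihood ratio statistic is $\chi^2 = 140 - 100 = 40$ with 4 degrees of freedom.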
Like the AIC and SC statistics that are described in the section Information Criteria, R-square statistics are most useful for comparing competing models that are not necessarily nested—larger values indicate better models. The statistics that are discussed in this section are based on the likelihood of the fitted model. Specifying the NORMALIZE option in the WEIGHT statement makes these coefficients invariant to the scale of the weights.
Let $L_0$ denote the likelihood of the intercept-only model, and let $L$ denote the likelihood of the specified model.
Maddala (1983) and Cox and Snell (1989, pp. 208–209) propose the following generalization of the coefficient of determination to a more general linear model:

$$R^2 = 1 - \left( \frac{L_0}{L} \right)^{2/n}$$

The quantity $R^2$ achieves a maximum of less than one for discrete models, where the maximum is given by

$$R^2_{\max} = 1 - (L_0)^{2/n}$$
Cragg and Uhler (1970), Maddala (1983), and Nagelkerke (1991) propose the following adjusted coefficient, which can achieve a maximum value of 1:

$$\tilde{R}^2 = \frac{R^2}{R^2_{\max}}$$
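As a numeric illustration (hypothetical values), if $-2\log L_0 = 140$, $-2\log L = 100$, and $n = 100$, then

$$R^2 = 1 - e^{-(140 - 100)/100} \approx 0.330, \qquad R^2_{\max} = 1 - e^{-140/100} \approx 0.753, \qquad \tilde{R}^2 \approx 0.330 / 0.753 \approx 0.438$$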
The RSQUARE option in the MODEL statement displays $R^2$, labeled as "R-Square," and $\tilde{R}^2$, labeled as "Max-rescaled R-Square," in the "RSquare" table. The GOF option in the MODEL statement displays these two statistics in the "Model Fit Statistics" table. The FITSTAT option in the SCORE statement displays them in the "Fit Statistics for SCORE Data" table.
You produce the remaining statistics in this section by specifying the GOF option when you have a binary logistic regression model. These statistics are displayed in the "Model Fit Statistics" table.
McFadden (1974) suggests a measure analogous to the R-square in the linear regression model, labeled as "McFadden’s R-Square," also called the likelihood ratio index or the deviance R-square:

$$R^2_{\mathrm{McF}} = 1 - \frac{\log L}{\log L_0}$$
Estrella (1998) devised an R-square measure, labeled "Estrella’s R-Square," which satisfies a requirement that its derivative equal 1 when its value is 0:

$$R^2_{E} = 1 - \left( \frac{\log L}{\log L_0} \right)^{-(2/n) \log L_0}$$
Estrella also adjusts $R^2_{E}$, in a measure labeled "Estrella’s Adjusted R-Square," by imposing a penalty for the number of parameters in the model in the same fashion as the AIC:

$$R^2_{E,\mathrm{adj}} = 1 - \left( \frac{\log L - p}{\log L_0} \right)^{-(2/n) \log L_0}$$
Aldrich and Nelson (1984) based another measure on the model chi-square statistic, $\chi^2 = 2(\log L - \log L_0)$, that compares the full model to the intercept-only model:

$$R^2_{AN} = \frac{\chi^2}{\chi^2 + n}$$
Veall and Zimmermann (1996) adjust $R^2_{AN}$ to obtain an upper limit of 1:

$$R^2_{VZ} = R^2_{AN} \, \frac{n - 2\log L_0}{-2\log L_0}$$
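Continuing the same hypothetical example ($\log L_0 = -70$, $\log L = -50$, $n = 100$, $\chi^2 = 40$):

$$R^2_{\mathrm{McF}} = 1 - \frac{-50}{-70} \approx 0.286, \qquad R^2_{E} = 1 - \left( \frac{-50}{-70} \right)^{1.4} \approx 0.376$$

$$R^2_{AN} = \frac{40}{40 + 100} \approx 0.286, \qquad R^2_{VZ} \approx 0.286 \times \frac{100 + 140}{140} \approx 0.490$$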
Discussions of these and other pseudo-$R^2$ measures can also be found in Allison (2014), Menard (2000), Smith and McKenna (2013), and Windmeijer (1995).
Measures that are discussed in this section use residuals and predicted probabilities to evaluate the strength of the fit.
For binary response models, let $\hat{\pi}_j$ denote the estimated event probability for observation $j$. For polytomous response models, let $\hat{\pi}_{ij}$ denote the predicted probability of response level $i$, $i = 1, \ldots, k + 1$. Let $e_j$ be the residual, which for binomial (events/trials) models is $e_j = r_j / n_j - \hat{\pi}_j$, for binary response models is $e_j = y_j - \hat{\pi}_j$, where $y_j = 1$ for an event and $y_j = 0$ for a nonevent, and for polytomous response models is $e_j = 1 - \hat{\pi}_{ij}$ for the observed response level $i$.
The average square error is computed as

$$\mathrm{ASE} = \frac{1}{n} \sum_{j=1}^{N} f_j n_j e_j^2$$

If you have specified a WEIGHT statement, then the weighted average square error is

$$\mathrm{ASE}_w = \frac{\sum_{j=1}^{N} w_j f_j n_j e_j^2}{\sum_{j=1}^{N} w_j f_j n_j}$$
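For example, for three unweighted single-trial observations with $f_j = n_j = 1$ and residuals $e_j = 0.2, -0.5, 0.1$ (hypothetical values),

$$\mathrm{ASE} = \frac{0.2^2 + (-0.5)^2 + 0.1^2}{3} = 0.10$$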
If you specify the FITSTAT option in the SCORE statement, then these statistics are displayed in the "Fit Statistics for SCORE Data" table, and they are labeled as "Brier Score" or "Brier Reliability," as discussed in the section Fit Statistics for Scored Data Sets. If you specify the GOF option and are fitting a binary logistic regression model, then these statistics are displayed in the "Model Fit Statistics" table.
The misclassification rate, or error rate, is the proportion of observations that are incorrectly classified. By default, observations for which $\hat{\pi}_j \ge 0.5$ are classified as events; otherwise they are classified as nonevents.
If you specify the FITSTAT option in the SCORE statement, then the error rate is displayed in the "Fit Statistics for SCORE Data" table. You can change the cutpoint from 0.5 by specifying the PEVENT= option in the MODEL statement; the first value that you specify in that option is used.
If you specify the GOF option in the MODEL statement and are fitting a binary logistic regression model, then the misclassification rate is displayed in the "Model Fit Statistics" table. You can change the cutpoint from 0.5 by specifying the GOF(CUTPOINT=) option.
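A minimal sketch that lowers the cutpoint to 0.3, using hypothetical data set and variable names:

   /* Hypothetical sketch: classify an observation as an event
      when its predicted probability is at least 0.3 */
   proc logistic data=Train;
      model Y(event='1') = X1 X2 / gof(cutpoint=0.3);
   run;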
Lave (1970) and Efron and Hinkley (1978) propose a statistic that is analogous to the general linear model $R^2$,

$$R^2_{\mathrm{res}} = 1 - \frac{\sum_j w_j f_j e_j^2}{\sum_j w_j f_j (y_j - \bar{y})^2}$$

where $\bar{y} = n_E / n$ and $n_E$ is the total frequency $\times$ weight of the events. If you specify the GOF option in the MODEL statement and are fitting a binary logistic regression model, then this statistic is displayed in the "Model Fit Statistics" table.
For a binary response model, write the mean of the model-predicted probabilities of the event (Y=1) observations as

$$\bar{\pi}_1 = \frac{1}{n_1} \sum_j f_j \, I(y_j = 1) \, \hat{\pi}_j$$

and of the nonevent (Y=2) observations as

$$\bar{\pi}_2 = \frac{1}{n_2} \sum_j f_j \, I(y_j = 2) \, \hat{\pi}_j$$

where $n_1$ is the total frequency of events, $n_2$ is the total frequency of nonevents, and $I(\cdot)$ is the indicator function. Tjur (2009) defines the statistic

$$D = \bar{\pi}_1 - \bar{\pi}_2$$

and relates it to other R-square measures. If you specify the GOF option in the MODEL statement and are fitting a binary logistic regression model, then this statistic is displayed in the "Model Fit Statistics" table and labeled "Tjur’s R-Square." Tjur calls it the coefficient of discrimination because it is a measure of the model’s ability to distinguish between the event and nonevent distributions; it is called the "Mean Difference" and the "Difference of Means" in other SAS procedures. This statistic is the same as the $d'$ statistic (with unit standard error) that is discussed in the signal detection literature (McNicol 2005).
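For example, if the events have mean predicted probability $\bar{\pi}_1 = 0.70$ and the nonevents have $\bar{\pi}_2 = 0.30$ (hypothetical values), then Tjur’s $D = 0.70 - 0.30 = 0.40$.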
The predicted mean score of an observation is the sum of the Ordered Values (shown in the "Response Profile" table) minus one, weighted by the corresponding predicted probabilities for that observation; that is, the predicted mean score is $S_j = \sum_{i=1}^{k+1} (i - 1) \hat{\pi}_{ij}$, where $k + 1$ is the number of response levels and $\hat{\pi}_{ij}$ is the predicted probability of the $i$th (ordered) response.
A pair of observations with different observed responses is said to be concordant if the observation with the lower ordered response value has a lower predicted mean score than the observation with the higher ordered response value. If the observation with the lower ordered response value has a higher predicted mean score than the observation with the higher ordered response value, then the pair is discordant. If the pair is neither concordant nor discordant, it is a tie. If you have more than two response levels, enumeration of the total numbers of concordant and discordant pairs is carried out by categorizing the predicted mean score into intervals of length $k/500$ and accumulating the corresponding frequencies of observations. You can change the length of these intervals by specifying the BINWIDTH= option in the MODEL statement.
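For example, the following sketch requests exact enumeration of the pairs by setting the bin width to 0; the data set Ratings and its variables are hypothetical:

   /* Hypothetical sketch: BINWIDTH=0 requests exact computation
      of the concordant and discordant pairs */
   proc logistic data=Ratings;
      model Severity = X1 X2 / binwidth=0;
   run;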
Let $N$ be the sum of observation frequencies in the data. Suppose there are a total of $t$ pairs with different responses: $n_c$ of them are concordant, $n_d$ of them are discordant, and $t - n_c - n_d$ of them are tied. PROC LOGISTIC computes the following four indices of rank correlation for assessing the predictive ability of a model:

$$c = \frac{n_c + 0.5\,(t - n_c - n_d)}{t}$$

$$\text{Somers' } D = \frac{n_c - n_d}{t}$$

$$\text{Goodman–Kruskal Gamma} = \frac{n_c - n_d}{n_c + n_d}$$

$$\text{Kendall's Tau-}a = \frac{n_c - n_d}{0.5\,N(N - 1)}$$
If there are no ties, then Somers’ D (Gini’s coefficient) $= 2c - 1$. Note that the concordance index, $c$, also gives an estimate of the area under the receiver operating characteristic (ROC) curve when the response is binary (Hanley and McNeil 1982). See the section ROC Computations for more information about this area.
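For example, suppose $N = 7$ observations (2 events and 5 nonevents) yield $t = 2 \times 5 = 10$ pairs with different responses, of which $n_c = 7$ are concordant, $n_d = 2$ are discordant, and 1 is tied (hypothetical counts). Then $c = (7 + 0.5 \times 1)/10 = 0.75$, Somers’ $D = (7 - 2)/10 = 0.50$, Gamma $= 5/9 \approx 0.56$, and Tau-$a = 5/(0.5 \times 7 \times 6) \approx 0.24$.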
For binary responses, the predicted mean score is equal to the predicted probability for Ordered Value 2.
These statistics are not available when the STRATA statement is specified.