The LOGISTIC Procedure

Classification Table

For binary response data, the response is either an event or a nonevent. In PROC LOGISTIC, the response with Ordered Value 1 is regarded as the event, and the response with Ordered Value 2 is the nonevent. PROC LOGISTIC models the probability of the event. From the fitted model, a predicted event probability can be computed for each observation. A method to compute a reduced-bias estimate of the predicted probability is given in the section Predicted Probability of an Event for Classification. If the (reduced-bias) predicted event probability equals or exceeds some cutpoint value $z \in [0, 1]$, the observation is predicted to be an event observation; otherwise, it is predicted to be a nonevent observation.

Suppose that $n_1$ of $n$ individuals experience an event, such as a disease, and the remaining $n_2 = n - n_1$ individuals do not experience that event (that is, they have a nonevent response). The $2 \times 2$ frequency (classification, confusion, decision, error) table in Table 12 is obtained by cross-classifying the observed and predicted responses, where $n_{ij}$ is the total number of observations that are observed to have $Y = i$ and are classified into $j$. In this table, let $Y = 1$ denote an observed event and $Y = 2$ denote a nonevent, and let the decision rule $D$ classify an observation as an event when $\hat{\pi} \ge z$; $D = 1$ indicates that the observation is classified as an event, and $D = 2$ indicates that the observation is classified as a nonevent.

Table 12: Classification Matrix

                      $D = 1$ ($\hat{\pi} \ge z$)   $D = 2$ ($\hat{\pi} < z$)   Total
$Y = 1$ (event)       $n_{11}$                      $n_{12}$                    $n_1$
$Y = 2$ (nonevent)    $n_{21}$                      $n_{22}$                    $n_2$


The CTABLE option produces this table, and the PPROB= option selects one or more cutpoints $z$. Each cutpoint generates a classification table. If the PEVENT= option is also specified, a classification table is produced for each combination of PEVENT= and PPROB= values.
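
For example, a call along the following lines requests classification tables for a range of cutpoints and a specified prior event probability. The data set clinic, the response disease, and the predictors x1 and x2 are hypothetical names used only for illustration:

   /* Hypothetical data set and variable names, for illustration only.     */
   /* CTABLE requests the classification tables, PPROB= supplies the       */
   /* cutpoints z, and PEVENT= supplies the prior probability Pr(Y=1).     */
   proc logistic data=clinic;
      model disease(event='1') = x1 x2
            / ctable pprob=(0.2 to 0.8 by 0.1) pevent=0.05;
   run;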

The cells of the classification matrix in Table 12 have the following interpretations:

$n_{11}$   the number of true positives, which is the number of event observations that are correctly classified as events
$n_{21}$   the number of false positives, which is the number of nonevent observations that are incorrectly classified as events
$n_{12}$   the number of false negatives, which is the number of event observations that are incorrectly classified as nonevents
$n_{22}$   the number of true negatives, which is the number of nonevent observations that are correctly classified as nonevents
$n_1$      the total number of actual events
$n_2$      the total number of actual nonevents

The statistics in Table 13 are computed from the classification table in Table 12.

Table 13: Statistics from the Classification Matrix with Cutpoint z

Statistic                     Equation                                       Value
Sensitivity                   $\Pr(D=1 \mid Y=1)$                            $n_{11}/n_1$
1–specificity                 $\Pr(D=1 \mid Y=2)$                            $n_{21}/n_2$
Correct classification rate   $\Pr(Y=1 \,\&\, D=1) + \Pr(Y=2 \,\&\, D=2)$    $(n_{11}+n_{22})/n$
Misclassification rate        $\Pr(Y=1 \,\&\, D=2) + \Pr(Y=2 \,\&\, D=1)$    $(n_{12}+n_{21})/n$
Positive predictive value     $\Pr(Y=1 \mid D=1)$                            $n_{11}/(n_{11}+n_{21})$
Negative predictive value     $\Pr(Y=2 \mid D=2)$                            $n_{22}/(n_{12}+n_{22})$


The accuracy of the classification is measured by its ability to predict events and nonevents correctly. Sensitivity (true positive fraction, TPF, or recall) is the proportion of event responses that are predicted to be events. Specificity (true negative fraction, 1–FPF) is the proportion of nonevent responses that are predicted to be nonevents.

You can also measure accuracy by how well the classification predicts the response. The positive predictive value (precision, PPV) is the proportion of observations classified as events that are correctly classified. The negative predictive value (NPV) is the proportion of observations classified as nonevents that are correctly classified. The correct classification rate (accuracy, PC) is the proportion of observations that are correctly classified, and the misclassification rate (error rate) is the proportion of observations that are incorrectly classified. Given prior probabilities (prevalence, $\Pr(Y=1)$) that are specified by the PEVENT= option, you can compute these conditional probabilities as posterior probabilities by using Bayes' theorem, as shown in the following section.

Note: The current literature defines the false positive fraction as FPF $= n_{21}/n_2$ and the false negative fraction as FNF $= n_{12}/n_1$; it also calls 1–PPV the false discovery rate and 1–NPV the false omission rate.
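
As a concrete illustration of the formulas in Table 13, the following DATA step computes these statistics from a set of made-up classification counts (the counts are not output from any particular model):

   /* Illustration only: the cell counts below are made up.                 */
   data class_stats;
      n11 = 42;   n12 = 8;      /* events:    true positives, false negatives */
      n21 = 15;   n22 = 135;    /* nonevents: false positives, true negatives */
      n1 = n11 + n12;   n2 = n21 + n22;   n = n1 + n2;
      sensitivity = n11 / n1;              /* Pr(D=1 | Y=1)                    */
      specificity = n22 / n2;              /* Pr(D=2 | Y=2); FPF = 1 - this    */
      correct     = (n11 + n22) / n;       /* correct classification rate      */
      misclass    = (n12 + n21) / n;       /* misclassification rate           */
      ppv         = n11 / (n11 + n21);     /* positive predictive value        */
      npv         = n22 / (n12 + n22);     /* negative predictive value        */
      put sensitivity= specificity= correct= misclass= ppv= npv=;
   run;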

Positive Predictive Values, Negative Predictive Values, and Correct Classification Rates Using Bayes’ Theorem

If the prevalence of the disease in the population, $\Pr(Y=1)$, is provided by the value of the PEVENT= option, then PROC LOGISTIC uses Bayes' theorem to modify the PPV, NPV, and PC as follows (Fleiss, Levin, and Paik 2003):

$$
\begin{aligned}
\mathrm{PPV} &= \Pr(Y=1 \mid D=1) = \frac{\Pr(D=1 \mid Y=1)\,\Pr(Y=1)}{\Pr(D=1 \mid Y=2) + \Pr(Y=1)\,[\Pr(D=1 \mid Y=1) - \Pr(D=1 \mid Y=2)]} \\[4pt]
\mathrm{NPV} &= \Pr(Y=2 \mid D=2) = \frac{\Pr(D=2 \mid Y=2)\,[1 - \Pr(Y=1)]}{\Pr(D=2 \mid Y=2) + \Pr(Y=1)\,[\Pr(D=2 \mid Y=1) - \Pr(D=2 \mid Y=2)]} \\[4pt]
\mathrm{PC}  &= \Pr(Y=1 \,\&\, D=1) + \Pr(Y=2 \,\&\, D=2) \\
             &= \Pr(D=1 \mid Y=1)\,\Pr(Y=1) + \Pr(D=2 \mid Y=2)\,[1 - \Pr(Y=1)]
\end{aligned}
$$

If you do not specify the PEVENT= option, then PROC LOGISTIC uses the sample proportion of diseased individuals; that is, $\Pr(Y=1) = n_1/n$. In such a case, the preceding values reduce to those in Table 13. Note that for a stratified sampling or case-control situation in which $n_1$ and $n_2$ are chosen a priori, $n_1/n$ is not a desirable estimate of $\Pr(Y=1)$, so you should specify the PEVENT= option.
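
The following DATA step sketches this computation with illustrative values for the sensitivity, specificity, and PEVENT= prevalence; none of the numbers come from an actual analysis:

   /* Illustration only: sensitivity, specificity, and prevalence are made up. */
   data bayes_rates;
      sens = 0.84;    /* Pr(D=1 | Y=1), sensitivity from the classification table */
      spec = 0.90;    /* Pr(D=2 | Y=2), specificity                               */
      prev = 0.05;    /* Pr(Y=1), the value supplied in the PEVENT= option        */
      ppv = (sens * prev) / ((1 - spec) + prev * (sens - (1 - spec)));
      npv = (spec * (1 - prev)) / (spec + prev * ((1 - sens) - spec));
      pc  = sens * prev + spec * (1 - prev);
      put ppv= npv= pc=;
   run;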

Predicted Probability of an Event for Classification

When you classify a set of binary data, if the same observations used to fit the model are also used to estimate the classification error, the resulting error-count estimate is biased. One way of reducing the bias is to remove the binary observation to be classified from the data, reestimate the parameters of the model, and then classify the observation based on the new parameter estimates. However, it would be costly to fit the model by leaving out each observation one at a time. The LOGISTIC procedure provides a less expensive one-step approximation to the preceding parameter estimates. Let $\hat{\boldsymbol\beta}$ be the MLE of the parameter vector $(\alpha, \beta_1, \ldots, \beta_s)'$ based on all observations. Let $\hat{\boldsymbol\beta}_{(j)}$ denote the MLE computed without the $j$th observation. The one-step estimate of $\hat{\boldsymbol\beta}_{(j)}$ is given by

$$
\hat{\boldsymbol\beta}_{(j)}^{1} = \hat{\boldsymbol\beta} - \frac{w_j (y_j - \hat{\pi}_j)}{1 - h_j}\, \hat{\mathbf{V}}(\hat{\boldsymbol\beta}) \begin{pmatrix} 1 \\ \mathbf{x}_j \end{pmatrix}
$$

where

$y_j$ is 1 for an observed event response and 0 otherwise
$w_j$ is the weight of the observation
$\hat{\pi}_j$ is the predicted event probability based on $\hat{\boldsymbol\beta}$
$h_j$ is the hat diagonal element with $n_j = 1$ and $r_j = y_j$
$\hat{\mathbf{V}}(\hat{\boldsymbol\beta})$ is the estimated covariance matrix of $\hat{\boldsymbol\beta}$
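
As a rough sketch of the one-step update for a single observation, the following PROC IML step uses made-up values for $\hat{\boldsymbol\beta}$, $\hat{\mathbf{V}}(\hat{\boldsymbol\beta})$, and the covariate. It also assumes the usual binary-response form of the hat diagonal, $h_j = w_j \hat{\pi}_j (1 - \hat{\pi}_j)\,(1, \mathbf{x}_j)\,\hat{\mathbf{V}}(\hat{\boldsymbol\beta})\,(1, \mathbf{x}_j)'$, rather than reproducing the exact quantity that PROC LOGISTIC computes:

   proc iml;
      /* Illustration only: all numeric values below are made up.           */
      beta_hat = {-2.0, 0.5};              /* MLE (alpha, beta_1)'           */
      V        = {0.25 0.00, 0.00 0.01};   /* estimated covariance matrix    */
      x_j = 3;   y_j = 1;   w_j = 1;       /* covariate, response, weight    */
      xz   = 1 // x_j;                     /* the vector (1, x_j)'           */
      eta  = t(beta_hat) * xz;             /* linear predictor               */
      pi_j = 1 / (1 + exp(-eta));          /* predicted event probability    */
      h_j  = w_j * pi_j * (1 - pi_j) * t(xz) * V * xz;  /* assumed hat diag. */
      beta_1step = beta_hat - (w_j * (y_j - pi_j) / (1 - h_j)) * (V * xz);
      print beta_1step;
   quit;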
