For binary response data, the response is either an event or a nonevent. In PROC LOGISTIC, the response with Ordered Value 1 is regarded as the event, and the response with Ordered Value 2 is the nonevent. PROC LOGISTIC models the probability of the event. From the fitted model, a predicted event probability can be computed for each observation. A method to compute a reduced-bias estimate of the predicted probability is given in the section "Predicted Probability of an Event for Classification." If the (reduced-bias) predicted event probability equals or exceeds some cutpoint value $z$, the observation is predicted to be an event observation; otherwise, it is predicted to be a nonevent observation.
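In symbols, with $\hat{\pi}$ denoting the (reduced-bias) predicted event probability and $z$ the cutpoint, the classification rule is

$$\text{predicted response} = \begin{cases} \text{event} & \text{if } \hat{\pi} \ge z \\ \text{nonevent} & \text{if } \hat{\pi} < z \end{cases}$$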
Suppose that $n_1$ of $n$ individuals experience an event, such as a disease, and the remaining $n_2 = n - n_1$ individuals do not experience that event (that is, they have a nonevent response). The $2 \times 2$ frequency (classification, confusion, decision, error) table in Table 12 is obtained by cross-classifying the observed and predicted responses, where $n_{ij}$ is the total number of observations that are observed to have $Y = i$ and are classified into $j$. In this table, let $Y = 1$ denote an observed event and $Y = 2$ denote an observed nonevent, and let the decision rule $D$ classify an observation as an event when $\hat{\pi} \ge z$; $D = 1$ indicates that the observation is classified as an event, and $D = 2$ indicates that the observation is classified as a nonevent.

Table 12: 2x2 Frequency (Classification) Table with Cutpoint $z$

                        D = 1 (Event)      D = 2 (Nonevent)   Total
   Y = 1 (Event)        $n_{11}$           $n_{12}$           $n_1$
   Y = 2 (Nonevent)     $n_{21}$           $n_{22}$           $n_2$
   Total                $n_{11}+n_{21}$    $n_{12}+n_{22}$    $n$
The CTABLE option produces this table, and the PPROB= option selects one or more cutpoints $z$. Each cutpoint generates a classification table. If the PEVENT= option is also specified, a classification table is produced for each combination of PEVENT= and PPROB= values.
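For example, the following statements request classification tables for a range of cutpoints and two prior probabilities. This is a sketch that assumes the sashelp.heart sample data set and its Status, AgeAtStart, and Cholesterol variables; substitute your own data set, response, and covariates:

   proc logistic data=sashelp.heart;
      model Status(event='Dead') = AgeAtStart Cholesterol
            / ctable pprob=(0.2 to 0.8 by 0.1) pevent=(0.05 0.10);
   run;

Each of the seven PPROB= cutpoints is paired with each of the two PEVENT= values, so fourteen classification tables are produced.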
The cells of the classification matrix in Table 12 have the following interpretations:

- $n_{11}$ is the number of events that are correctly classified as events (true positives)
- $n_{12}$ is the number of events that are incorrectly classified as nonevents (false negatives)
- $n_{21}$ is the number of nonevents that are incorrectly classified as events (false positives)
- $n_{22}$ is the number of nonevents that are correctly classified as nonevents (true negatives)
The statistics in Table 13 are computed from the classification table in Table 12.

Table 13: Statistics from the Classification Matrix with Cutpoint $z$

   Statistic                           Conditional Probability   Estimator
   Sensitivity (TPF)                   $\Pr(D=1 \mid Y=1)$       $n_{11}/n_1$
   Specificity (1 - FPF)               $\Pr(D=2 \mid Y=2)$       $n_{22}/n_2$
   Positive predictive value (PPV)     $\Pr(Y=1 \mid D=1)$       $n_{11}/(n_{11}+n_{21})$
   Negative predictive value (NPV)     $\Pr(Y=2 \mid D=2)$       $n_{22}/(n_{12}+n_{22})$
   Correct classification rate (PC)    $\Pr(D = Y)$              $(n_{11}+n_{22})/n$
   Misclassification rate              $\Pr(D \ne Y)$            $(n_{12}+n_{21})/n$
The accuracy of the classification is measured by its ability to predict events and nonevents correctly. Sensitivity (true positive fraction, TPF, or recall) is the proportion of event responses that are predicted to be events. Specificity (true negative fraction, 1–FPF) is the proportion of nonevent responses that are predicted to be nonevents.
You can also measure accuracy by how well the classification predicts the response. The positive predictive value (precision, PPV) is the proportion of observations classified as events that are correctly classified. The negative predictive value (NPV) is the proportion of observations classified as nonevents that are correctly classified. The correct classification rate (accuracy, PC) is the proportion of observations that are correctly classified, and the misclassification rate (error rate) is the proportion of observations that are incorrectly classified. Given prior probabilities (prevalence, $\rho$) that are specified by the PEVENT= option, you can compute these conditional probabilities as posterior probabilities by using Bayes’ theorem, as shown in the following section.
Note: Current literature defines the false positive fraction as $\mathrm{FPF} = \Pr(D=1 \mid Y=2) = 1 - \text{specificity}$ and the false negative fraction as $\mathrm{FNF} = \Pr(D=2 \mid Y=1) = 1 - \text{sensitivity}$; $1 - \mathrm{PPV}$ is called the false discovery rate, and $1 - \mathrm{NPV}$ is called the false omission rate.
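As a concrete illustration (not PROC LOGISTIC output), the following DATA step computes the Table 13 statistics from the four cell counts; the counts here are made up:

   /* Compute the Table 13 statistics from hypothetical cell counts */
   data ctable_stats;
      n11 = 40; n12 = 10;          /* events:    correctly / incorrectly classified */
      n21 = 15; n22 = 85;          /* nonevents: incorrectly / correctly classified */
      n1 = n11 + n12;              /* number of observed events    */
      n2 = n21 + n22;              /* number of observed nonevents */
      n  = n1 + n2;
      sensitivity = n11 / n1;              /* Pr(D=1 | Y=1) */
      specificity = n22 / n2;              /* Pr(D=2 | Y=2) */
      ppv   = n11 / (n11 + n21);           /* Pr(Y=1 | D=1) */
      npv   = n22 / (n12 + n22);           /* Pr(Y=2 | D=2) */
      pc    = (n11 + n22) / n;             /* correct classification rate */
      error = (n12 + n21) / n;             /* misclassification rate      */
   run;

   proc print data=ctable_stats noobs;
   run;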
If the prevalence of the disease in the population is provided by the value of the PEVENT= option, then PROC LOGISTIC uses Bayes’ theorem to modify the PPV, NPV, and PC as follows (Fleiss, Levin, and Paik 2003):

$$\mathrm{PPV} = \frac{\rho \cdot \text{sensitivity}}{\rho \cdot \text{sensitivity} + (1 - \rho)(1 - \text{specificity})}$$

$$\mathrm{NPV} = \frac{(1 - \rho) \cdot \text{specificity}}{\rho \,(1 - \text{sensitivity}) + (1 - \rho) \cdot \text{specificity}}$$

$$\mathrm{PC} = \rho \cdot \text{sensitivity} + (1 - \rho) \cdot \text{specificity}$$

where $\rho$ is the prevalence specified by the PEVENT= option.
If you do not specify the PEVENT= option, then PROC LOGISTIC uses the sample proportion of diseased individuals; that is, $\rho = n_1/n$. In such a case, the preceding values reduce to those in Table 13. Note that for a stratified sampling or case-control situation in which $n_1$ and $n_2$ are chosen a priori, $n_1/n$ is not a desirable estimate of $\rho$, so you should specify the PEVENT= option.
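The following DATA step sketches the Bayes adjustment for a single prevalence value; the sensitivity, specificity, and prevalence values are hypothetical:

   /* Bayes-adjusted PPV, NPV, and PC for a specified prevalence rho
      (the PEVENT= value); the inputs below are made-up numbers */
   data bayes_adjust;
      rho  = 0.05;      /* population prevalence, as given by PEVENT= */
      sens = 0.80;      /* sample-based sensitivity  */
      spec = 0.85;      /* sample-based specificity  */
      ppv = (rho*sens) / (rho*sens + (1-rho)*(1-spec));
      npv = ((1-rho)*spec) / (rho*(1-sens) + (1-rho)*spec);
      pc  = rho*sens + (1-rho)*spec;
   run;

With a low prevalence such as 0.05, the adjusted PPV is much smaller than the sample-based PPV, which is why specifying PEVENT= matters for case-control data.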
When you classify a set of binary data, if the same observations used to fit the model are also used to estimate the classification error, the resulting error-count estimate is biased. One way of reducing the bias is to remove the binary observation to be classified from the data, reestimate the parameters of the model, and then classify the observation on the basis of the new parameter estimates. However, it would be costly to fit the model by leaving out each observation one at a time. The LOGISTIC procedure provides a less expensive one-step approximation to the preceding parameter estimates. Let $\hat{\boldsymbol{\beta}}$ be the MLE of the parameter vector $\boldsymbol{\beta}$ based on all observations, and let $\hat{\boldsymbol{\beta}}_{(j)}$ denote the MLE computed without the $j$th observation. The one-step estimate of $\hat{\boldsymbol{\beta}}_{(j)}$ is given by

$$\hat{\boldsymbol{\beta}}_{(j)}^{1} = \hat{\boldsymbol{\beta}} - \frac{w_j \,(y_j - \hat{\pi}_j)}{1 - h_j}\, \widehat{\mathbf{I}}^{-1}(\hat{\boldsymbol{\beta}})\, \mathbf{x}_j$$

where

- $y_j$ is 1 for an observed event response and 0 otherwise
- $w_j$ is the weight of the observation
- $\hat{\pi}_j$ is the predicted event probability of the $j$th observation based on $\hat{\boldsymbol{\beta}}$
- $\mathbf{x}_j$ is the covariate vector (including the intercept) of the $j$th observation
- $h_j = w_j\,\hat{\pi}_j(1-\hat{\pi}_j)\,\mathbf{x}_j'\,\widehat{\mathbf{I}}^{-1}(\hat{\boldsymbol{\beta}})\,\mathbf{x}_j$ is the hat diagonal element of the $j$th observation, and $\widehat{\mathbf{I}}(\hat{\boldsymbol{\beta}})$ is the information matrix evaluated at $\hat{\boldsymbol{\beta}}$
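The following PROC IML program is a minimal sketch of the one-step computation under the preceding formula. The design matrix, response, and full-data MLE are made-up values; in practice, PROC LOGISTIC performs this computation internally when you specify the CTABLE option:

   proc iml;
      X = {1 2, 1 4, 1 6, 1 8, 1 10};   /* hypothetical design matrix (intercept + one covariate) */
      y = {0, 0, 1, 1, 1};              /* observed event indicators */
      w = j(nrow(X), 1, 1);             /* unit weights */
      b = {-4.0, 0.8};                  /* assumed full-data MLE (for illustration) */

      pi = 1 / (1 + exp(-(X*b)));       /* predicted event probabilities */
      V  = w # pi # (1 - pi);           /* variance weights w*pi*(1-pi) */
      Info    = X` * (V # X);           /* information matrix I(b) = X'VX */
      InfoInv = inv(Info);

      do j = 1 to nrow(X);
         xj = X[j, ]`;                            /* covariate vector of observation j */
         hj = V[j] * xj` * InfoInv * xj;          /* hat diagonal element */
         b1 = b - (w[j]*(y[j] - pi[j]) / (1 - hj)) * InfoInv * xj;  /* one-step estimate */
         print j b1;
      end;
   quit;

Each b1 approximates the MLE that would be obtained by deleting observation j and refitting, at the cost of a single matrix-vector product per observation rather than a full refit.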