The LOGISTIC Procedure

Precision-Recall Curves

Precision-recall (PR) curves are often used to evaluate the predictive performance of models on skewed data, for example when nonevents far outnumber events, a setting in which ROC curves might be misleading (Saito and Rehmsmeier 2015).

PR curves are produced by mapping from the ROC curve, following Davis and Goadrich (2006). To perform this mapping, you use the sensitivity (TPF), the specificity (1 – FPF), and the prevalence of events (P) in order to compute the precision (PPV), given by

\[
\mathrm{PPV} = \frac{P \cdot \mathrm{TPF}}{P \cdot \mathrm{TPF} + (1 - P) \cdot \mathrm{FPF}}
\]
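A minimal Python sketch of this mapping, written directly from the definitions of sensitivity and specificity by using Bayes' theorem. The function name `ppv_from_roc` and its arguments are illustrative assumptions, not part of PROC LOGISTIC.

```python
def ppv_from_roc(tpf, fpf, prevalence):
    """Map one ROC point (FPF, TPF) to precision (PPV) for a given prevalence."""
    true_pos = prevalence * tpf            # P(event and classified as event)
    false_pos = (1.0 - prevalence) * fpf   # P(nonevent and classified as event)
    classified_pos = true_pos + false_pos
    if classified_pos == 0.0:
        # Precision is undefined when nothing is classified as an event.
        return None
    return true_pos / classified_pos


# Illustrative values only: 5% prevalence, 80% sensitivity, 90% specificity.
print(ppv_from_roc(tpf=0.8, fpf=0.1, prevalence=0.05))
```

With rare events, even a small FPF contributes many false positives, which is why the precision in this example is far below the sensitivity.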

See the section Receiver Operating Characteristic Curves for definitions. Because the PPV computation is nonlinear, you cannot linearly interpolate between values on the PR curve; instead, you linearly interpolate between points on the ROC curve and map the new points to the PR curve. The STEPLEN= value specifies the length of a step along the ROC curve, which determines the extra points at which the PR curve is computed. That is, if two neighboring points on the ROC curve are farther apart than the STEPLEN= value, then intermediate points at multiples of that value from the first point are also mapped. You can specify a value between 0 and 1; the default is 0.01.
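The interpolation step can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure's implementation: the distance between neighboring ROC points is taken to be Euclidean, the PPV is obtained from Bayes' theorem, and the names `ppv`, `densify_roc`, and `roc_to_pr` are hypothetical.

```python
import math


def ppv(tpf, fpf, prevalence):
    """Precision at one ROC point; None when nothing is classified as an event."""
    denom = prevalence * tpf + (1.0 - prevalence) * fpf
    return prevalence * tpf / denom if denom > 0.0 else None


def densify_roc(roc, steplen=0.01):
    """Add points every `steplen` along each ROC segment longer than `steplen`.

    `roc` is a list of (fpf, tpf) pairs ordered along the curve.
    """
    if not roc:
        return []
    dense = []
    for (f0, t0), (f1, t1) in zip(roc, roc[1:]):
        dense.append((f0, t0))
        dist = math.hypot(f1 - f0, t1 - t0)  # assumption: Euclidean length
        k = 1
        while k * steplen < dist - 1e-12:    # stop before the next observed point
            w = k * steplen / dist
            dense.append((f0 + w * (f1 - f0), t0 + w * (t1 - t0)))
            k += 1
    dense.append(roc[-1])
    return dense


def roc_to_pr(roc, prevalence, steplen=0.01):
    """Return (recall, precision) pairs computed at the densified ROC points."""
    return [(t, ppv(t, f, prevalence)) for f, t in densify_roc(roc, steplen)]


# Why interpolation happens on the ROC curve: compare the ROC midpoint mapped
# to precision with the average of the two endpoint precisions.
a, b, prev = (0.1, 0.4), (0.3, 0.8), 0.1
mid_roc = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
mapped = ppv(mid_roc[1], mid_roc[0], prev)
naive = (ppv(a[1], a[0], prev) + ppv(b[1], b[0], prev)) / 2
print(mapped, naive)
```

In this made-up example the mapped midpoint has a precision of about 0.25, whereas averaging the endpoint precisions gives about 0.27, so a straight line drawn in PR space would overstate the precision.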

The PR curve often needs to be extrapolated back to TPF = 0; this is performed by using either the first nontrivial PPV computed from the observed ROC points or the prevalence, whichever is larger.
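A short Python sketch of this extrapolation rule. It assumes that "nontrivial" means the first point that has positive recall and a defined precision; that reading, and the helper name `extrapolate_to_zero_recall`, are assumptions made for illustration.

```python
def extrapolate_to_zero_recall(pr_points, prevalence):
    """Prepend a point at recall 0 to a PR curve.

    `pr_points` is a list of (recall, precision) pairs sorted by increasing
    recall; precision can be None where it is undefined.
    """
    # Assumption: keep only points with positive recall and defined precision.
    defined = [(r, p) for r, p in pr_points if p is not None and r > 0.0]
    if not defined:
        return []
    # Start the curve at the larger of the first PPV and the prevalence.
    start = max(defined[0][1], prevalence)
    return [(0.0, start)] + defined


# Illustrative values only.
print(extrapolate_to_zero_recall([(0.0, None), (0.2, 0.5), (0.6, 0.3)], 0.1))
```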

The trapezoidal area under the PR curve is also computed and displayed. This area is often used to compare models, even though it is not statistically relevant in the same way that the AUC for the ROC curve is, as discussed in the section ROC Computations.
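The trapezoidal rule itself is simple to sketch in Python. The function name `pr_trapezoid_area` is hypothetical, and the input is assumed to be a PR curve that is already sorted by increasing recall and already includes any interpolated and extrapolated points.

```python
def pr_trapezoid_area(pr_points):
    """Trapezoidal area under (recall, precision) pairs sorted by recall."""
    area = 0.0
    for (r0, p0), (r1, p1) in zip(pr_points, pr_points[1:]):
        # Width along the recall axis times the mean precision of the segment.
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


# Illustrative curve: precision 1 up to recall 0.5, then falling to 0.5.
print(pr_trapezoid_area([(0.0, 1.0), (0.5, 1.0), (1.0, 0.5)]))
```

Because the trapezoids are straight lines in PR space, the result depends on how densely the curve was evaluated, which is one reason the extra interpolated points matter.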

Last updated: December 09, 2022