Consider the hypothetical example in Fleiss (1981, pp. 6–7), in which a test is applied to a sample of 1,000 people known to have a disease and to another sample of 1,000 people known not to have the same disease. In the diseased sample, 950 test positive; in the nondiseased sample, only 10 test positive. If the true disease rate in the population is 1 in 100, specifying PEVENT=0.01 results in the correct positive and negative predictive values for the stratified sampling scheme. Omitting the PEVENT= option is equivalent to using the overall sample disease rate (1000/2000 = 0.5) as the value of the PEVENT= option, which would ignore the stratified sampling.
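The role of PEVENT= can be made concrete with Bayes' theorem. The following is a sketch that assumes the standard form, with $\pi$ denoting the prior event probability supplied through PEVENT= and with sensitivity (Se) and specificity (Sp) computed from the sample counts above. Because Se and Sp are estimated within each stratum, they do not depend on the sampling scheme. The predictive values, however, depend on $\pi$:

```latex
\begin{aligned}
\mathrm{Se} &= \frac{950}{1000} = 0.95, \qquad \mathrm{Sp} = \frac{990}{1000} = 0.99, \\[4pt]
\mathrm{PPV}(\pi) &= \frac{\mathrm{Se}\,\pi}{\mathrm{Se}\,\pi + (1-\mathrm{Sp})(1-\pi)}, \qquad
\mathrm{NPV}(\pi) = \frac{\mathrm{Sp}\,(1-\pi)}{\mathrm{Sp}\,(1-\pi) + (1-\mathrm{Se})\,\pi}, \\[4pt]
\mathrm{PPV}(0.01) &= \frac{0.0095}{0.0095 + 0.0099} \approx 0.490, \qquad
\mathrm{NPV}(0.01) = \frac{0.9801}{0.9801 + 0.0005} \approx 0.9995.
\end{aligned}
```

At $\pi = 0.01$ the 1% false-positive rate among the 99% of people who are nondiseased contributes about as many positive tests as the 95% true-positive rate among the 1% who are diseased. This is why the positive predictive value falls to roughly one half.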
The statements to create the data set and perform the analysis are as follows:
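/* One observation per cell of the 2x2 table; Count holds the cell frequency */
data Screen;
   do Disease='Present','Absent';
      do Test=1,0;             /* 1 = positive test, 0 = negative test */
         input Count @@;
         output;
      end;
   end;
   datalines;
950 50
10 990
;

proc logistic data=Screen;
   freq Count;                 /* weight each cell by its count */
   model Disease(event='Present')=Test
         / pevent=.5 .01 ctable pprob=.5;
run;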
The response variable option EVENT= specifies that Disease='Present' is the event. The CTABLE option requests a classification table, and PPROB=0.5 sets the cutoff probability to 0.5. Two prior probabilities, 0.5 and 0.01, are listed in the PEVENT= option: 0.5 corresponds to the overall sample disease rate, and 0.01 corresponds to a true disease rate of 1 in 100.
The classification table is shown in Output 79.5.1.
Output 79.5.1: Positive and Negative Predictive Values
Classification Table

| Prob Event | Prob Level | Correct: Event | Correct: Non-Event | Incorrect: Event | Incorrect: Non-Event | Correct (%) | Sensitivity (%) | Specificity (%) | Pos Pred (%) | Neg Pred (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.500 | 0.500 | 950 | 990 | 10 | 50 | 97.0 | 95.0 | 99.0 | 99.0 | 95.2 |
| 0.010 | 0.500 | 950 | 990 | 10 | 50 | 99.0 | 95.0 | 99.0 | 49.0 | 99.9 |
In the classification table, the "Prob Event" column gives the prior event probability (the setting of the PEVENT= option) that is used to compute the predictive values. The "Prob Level" column gives the cutoff probability (the setting of the PPROB= option) for predicting whether an observation is an event. The "Correct" columns list the numbers of subjects who are correctly predicted as events and as nonevents, respectively. The "Incorrect" columns list the number of nonevents incorrectly predicted as events and the number of events incorrectly predicted as nonevents, respectively. For PEVENT=0.5, the positive predictive value is 99.0% and the negative predictive value is 95.2%. These results ignore the stratified sampling and incorrectly treat the overall sample proportion of disease (0.5) as an estimate of the true disease rate. For a true disease rate of 0.01, the positive and negative predictive values are 49.0% and 99.9%, respectively, as shown in the second row of the classification table.
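As a cross-check on these percentages, a short DATA step can recompute them directly from the 2x2 counts. This is a sketch that assumes the predictive values follow Bayes' theorem, with the PEVENT= value as the prior. The data set PredValues and its variable names are hypothetical. The step works from the raw counts and does not reproduce any adjustment that PROC LOGISTIC itself may apply when it builds the classification table.

```sas
/* Hypothetical check: recompute the predictive values for each prior */
/* (PEVENT=) value from the counts in the Screen data set.             */
data PredValues;
   TruePos  = 950;   /* diseased, Test=1    */
   FalseNeg = 50;    /* diseased, Test=0    */
   FalsePos = 10;    /* nondiseased, Test=1 */
   TrueNeg  = 990;   /* nondiseased, Test=0 */
   Sensitivity = TruePos / (TruePos + FalseNeg);   /* 0.95 */
   Specificity = TrueNeg / (TrueNeg + FalsePos);   /* 0.99 */
   do Prior = 0.5, 0.01;
      /* Bayes' theorem: weight each stratum by the prior disease rate */
      PosPred = Sensitivity*Prior
                / (Sensitivity*Prior + (1 - Specificity)*(1 - Prior));
      NegPred = Specificity*(1 - Prior)
                / (Specificity*(1 - Prior) + (1 - Sensitivity)*Prior);
      /* Overall percentage correct, also weighted by the prior */
      PctCorrect = Sensitivity*Prior + Specificity*(1 - Prior);
      output;
   end;
   keep Prior Sensitivity Specificity PosPred NegPred PctCorrect;
run;

proc print data=PredValues noobs;
   format Sensitivity Specificity PosPred NegPred PctCorrect percent8.1;
run;
```

Under these assumptions, the two printed rows agree with the "Percentages" columns of Output 79.5.1. Only the prior changes between the rows, which is why sensitivity and specificity stay fixed while the predictive values and the percentage correct shift.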