In the context of logistic regression with binary outcomes, receiver operating characteristic (ROC) curves and AUC (area under the ROC curve) statistics are commonly used to assess the ability of the model to discriminate between the two outcomes. To adapt the concept of ROC curves to the survival setting, various definitions and estimators of time-dependent ROC curves and AUC functions have been proposed. See Blanche, Latouche, and Viallon (2013) for a comprehensive survey of different methods. Time-dependent ROC curves and AUC functions characterize how well the fitted model can distinguish subjects who experience an event from subjects who are event-free.
Whereas C-statistics provide overall measures of predictive accuracy, time-dependent ROC curves and AUC functions summarize the predictive accuracy at specific times. In practice, it is common to use several time points within the support of the observed event times.
Let $T$ denote the event-time variable, and let $Y$ denote the continuous variable to be assessed. At time $t$, a binary outcome can be defined as follows:

$$D(t) = \begin{cases} 1 & \text{if } T \le t \\ 0 & \text{if } T > t \end{cases}$$
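Suppose $c$ denotes a specific value within the support of $Y$. The sensitivity (SE) and specificity (SP) can be defined as

$$\text{SE}(c,t) = \Pr\{Y > c \mid D(t) = 1\}, \qquad \text{SP}(c,t) = \Pr\{Y \le c \mid D(t) = 0\}$$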
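The ROC curve at time $t$ is defined to be

$$\text{ROC}(t) = \left\{ \bigl(1 - \text{SP}(c,t),\ \text{SE}(c,t)\bigr) : c \in (-\infty, \infty) \right\}$$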
This definition is often referred to as the "cumulative/dynamic" ROC curve in the literature. "Cumulative" means that all events that occur at or before time $t$ are considered "cases," and "dynamic" means that the "controls" are the subjects who are still event-free at time $t$. Other types of time-dependent ROC curves are available in the literature, for example, in Heagerty and Zheng (2005).
The AUC statistic at time $t$ is the area under the ROC curve at time $t$:

$$\text{AUC}(t) = \int_{-\infty}^{\infty} \text{SE}(c,t)\, d\,\text{SP}(c,t)$$
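To make the definitions concrete, here is a minimal Python sketch for the idealized case in which every event time is observed (no censoring), so that $D(t)$ can be read off directly. The helper names `empirical_sens_spec` and `empirical_roc_auc` are hypothetical and are not part of any SAS interface. The curve is traced over the observed thresholds and integrated with the trapezoidal rule.

```python
def empirical_sens_spec(times, scores, c, t):
    """SE(c, t) and SP(c, t) when every event time is fully observed.

    Assumes at least one case (T <= t) and one control (T > t).
    """
    cases = [y for T, y in zip(times, scores) if T <= t]      # D(t) = 1
    controls = [y for T, y in zip(times, scores) if T > t]    # D(t) = 0
    se = sum(y > c for y in cases) / len(cases)
    sp = sum(y <= c for y in controls) / len(controls)
    return se, sp


def empirical_roc_auc(times, scores, t):
    """Trace ROC(t) over the observed thresholds and integrate it."""
    # c = -inf gives the corner (1, 1); c = max(scores) gives (0, 0).
    thresholds = [float("-inf")] + sorted(set(scores))
    points = sorted(
        (1.0 - sp, se)
        for se, sp in (empirical_sens_spec(times, scores, c, t) for c in thresholds)
    )
    # Trapezoidal rule along the horizontal axis 1 - SP(c, t).
    auc = sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    return points, auc


times = [1, 2, 3, 4, 5]
scores = [0.9, 0.2, 0.5, 0.1, 0.6]
print(empirical_sens_spec(times, scores, c=0.3, t=2.5))
print(empirical_roc_auc(times, scores, t=2.5)[1])
```

For this toy data, two of the five subjects are cases at $t = 2.5$. The sketch gives SE(0.3, 2.5) = 0.5, SP(0.3, 2.5) = 1/3, and AUC(2.5) = 2/3, up to floating-point rounding.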
Let $\beta$ denote the vector of regression parameters. For the $i$th individual ($1 \le i \le n$), let $T_i$, $\Delta_i$, and $Z_i$ be the observed time, event indicator (1 for death and 0 for censored), and covariate vector, respectively. Let $\hat\beta$ denote the maximum partial likelihood estimates of $\beta$. The estimated linear predictor for the $i$th individual is $Y_i = \hat\beta' Z_i$. PROC PHREG supports the approaches that are described in the following sections for estimating time-dependent ROC curves.
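Let $\hat G(\cdot)$ be the Kaplan-Meier estimate of the censoring distribution (assuming no covariates). Assuming that the censoring distribution is independent of the failure time distribution, the sensitivity and specificity under a specific threshold value $c$ can be consistently estimated by

$$\widehat{\text{SE}}(c,t) = \frac{\sum_{i=1}^n \Delta_i\, I(Y_i > c,\ T_i \le t) / \hat G(T_i)}{\sum_{i=1}^n \Delta_i\, I(T_i \le t) / \hat G(T_i)}$$

$$\widehat{\text{SP}}(c,t) = \frac{\sum_{i=1}^n I(Y_i \le c,\ T_i > t)}{\sum_{i=1}^n I(T_i > t)}$$

$\text{ROC}(t)$ can be estimated by substituting in these estimated sensitivities and specificities. The estimated $\text{AUC}(t)$ is calculated by using the trapezoidal rule to integrate the estimated $\text{ROC}(t)$ curve.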
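The IPCW estimator can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not PROC PHREG's implementation. The helper names are hypothetical. $\hat G$ is evaluated at $T_i$ as written above, and ties between event and censoring times are not given any special treatment. Observed cases are weighted by $1/\hat G(T_i)$, censored subjects with $T_i \le t$ drop out of both groups, and the resulting curve is integrated with the trapezoidal rule.

```python
def km_censoring(times, events):
    """Kaplan-Meier estimate G_hat(s) of P(C > s): censorings are the 'events'."""
    steps, g = [], 1.0
    for u in sorted({T for T, d in zip(times, events) if d == 0}):
        at_risk = sum(T >= u for T in times)
        n_cens = sum(T == u and d == 0 for T, d in zip(times, events))
        g *= 1.0 - n_cens / at_risk
        steps.append((u, g))

    def G(s):
        value = 1.0
        for u, g_u in steps:          # right-continuous step function
            if u > s:
                break
            value = g_u
        return value

    return G


def ipcw_sens_spec(times, events, scores, c, t, G):
    """IPCW sensitivity and (unweighted) specificity at threshold c, time t."""
    num = den = 0.0
    for T, d, y in zip(times, events, scores):
        if d == 1 and T <= t:             # observed case, weight 1 / G_hat(T_i)
            w = 1.0 / G(T)
            den += w
            num += w * (y > c)
    controls = [y for T, y in zip(times, scores) if T > t]
    return num / den, sum(y <= c for y in controls) / len(controls)


def ipcw_auc(times, events, scores, t):
    """Trapezoidal area under the estimated IPCW ROC(t) curve."""
    G = km_censoring(times, events)
    thresholds = [float("-inf")] + sorted(set(scores))
    points = sorted(
        (1.0 - sp, se)
        for se, sp in (ipcw_sens_spec(times, events, scores, c, t, G)
                       for c in thresholds)
    )
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


print(ipcw_auc([1, 2, 3, 4, 5, 6], [1, 0, 1, 1, 0, 1],
               [0.3, 0.4, 0.9, 0.2, 0.5, 0.1], t=3.5))
```

In this example the case observed at time 3 receives weight $1/\hat G(3) = 1.25$ because one subject was censored at time 2. The area works out to 23/27.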
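Uno et al. (2007) propose estimating the standard error of the $\widehat{\text{AUC}}(t)$ estimator by using the perturbation-resampling method. Let $\{V_i : i = 1, \ldots, n\}$ be a set of independent samples from an exponential distribution with mean 1 and variance 1. The perturbed versions of $\widehat{\text{SE}}(c,t)$ and $\widehat{\text{SP}}(c,t)$ are

$$\text{SE}^*(c,t) = \frac{\sum_{i=1}^n V_i \Delta_i\, I(Y_i^* > c,\ T_i \le t) / \hat G^*(T_i)}{\sum_{i=1}^n V_i \Delta_i\, I(T_i \le t) / \hat G^*(T_i)}$$

$$\text{SP}^*(c,t) = \frac{\sum_{i=1}^n V_i\, I(Y_i^* \le c,\ T_i > t)}{\sum_{i=1}^n V_i\, I(T_i > t)}$$

where $Y_i^* = \beta^{*\prime} Z_i$ and $\hat G^*(\cdot)$ represent the perturbed versions of $Y_i$ and $\hat G(\cdot)$.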
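$\hat G^*(t)$ is calculated as

$$\hat G^*(t) = \hat G(t) \left[ 1 - \sum_{i=1}^n (V_i - 1) \int_0^t \frac{d\hat M_i^G(s)}{\sum_{j=1}^n I(T_j \ge s)} \right]$$

where $\hat M_i^G(t) = I(T_i \le t,\ \Delta_i = 0) - \int_0^t I(T_i \ge s)\, d\hat\Lambda_G(s)$ and $\hat\Lambda_G(\cdot)$ is a consistent estimator of the cumulative hazard function for the censoring time variable.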
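$\beta^*$ is calculated as

$$\beta^* = \hat\beta + n\hat\Sigma \sum_{i=1}^n (V_i - 1)\, U_i(\hat\beta)$$

where $\hat\Sigma$ is the estimated variance-covariance matrix of $\hat\beta$ divided by $n$ and $U_i(\beta)$ is the contribution of the $i$th individual to the partial likelihood score function.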
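The perturbed estimate $\text{AUC}^*(t)$ is obtained by substituting in the perturbed sensitivities and specificities. Suppose $\hat\sigma^2(t)$ is the sample variance based on $M$ realizations of the perturbed $\text{AUC}^*(t)$. The $100(1-\alpha)\%$ confidence limits for $\text{AUC}(t)$ are

$$\widehat{\text{AUC}}(t) \pm z_{\alpha/2}\, \hat\sigma(t)$$

where $\widehat{\text{AUC}}(t)$ is the estimated $\text{AUC}(t)$ and $z_{\alpha/2}$ is the upper $100(\alpha/2)$ percentile of the standard normal distribution.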
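The resampling loop and the Wald-type limits can be illustrated with a deliberately simplified sketch. It perturbs only the $V_i$ weights and holds $\hat G$ and the $Y_i$ fixed. It therefore omits the $\hat G^*$ and $\beta^*$ terms of the full method and will generally understate the variability. The function names, the seed, and $M = 500$ are illustrative assumptions. For a step ROC curve, trapezoidal integration gives the same area as a weighted concordance in which tied scores count one half, so the sketch uses that shorter form.

```python
import random
from statistics import NormalDist, variance


def km_censoring(times, events):
    """Kaplan-Meier estimate G_hat(s) of P(C > s): censorings are the 'events'."""
    steps, g = [], 1.0
    for u in sorted({T for T, d in zip(times, events) if d == 0}):
        at_risk = sum(T >= u for T in times)
        n_cens = sum(T == u and d == 0 for T, d in zip(times, events))
        g *= 1.0 - n_cens / at_risk
        steps.append((u, g))

    def G(s):
        value = 1.0
        for u, g_u in steps:
            if u > s:
                break
            value = g_u
        return value

    return G


def weighted_ipcw_auc(times, events, scores, t, G, v):
    """Area under the V-weighted IPCW ROC(t) curve as a weighted concordance."""
    cases = [(vi / G(T), y) for T, d, y, vi in zip(times, events, scores, v)
             if d == 1 and T <= t]
    controls = [(vi, y) for T, y, vi in zip(times, scores, v) if T > t]
    num = sum(a * b * ((yi > yj) + 0.5 * (yi == yj))
              for a, yi in cases for b, yj in controls)
    den = sum(a for a, _ in cases) * sum(b for b, _ in controls)
    return num / den


def perturbation_ci(times, events, scores, t, m=500, alpha=0.05, seed=2007):
    """AUC_hat(t) and AUC_hat(t) -/+ z_{alpha/2} * sigma_hat(t)."""
    G = km_censoring(times, events)       # held fixed: simplification
    auc_hat = weighted_ipcw_auc(times, events, scores, t, G, [1.0] * len(times))
    rng = random.Random(seed)
    draws = []
    for _ in range(m):
        v = [rng.expovariate(1.0) for _ in times]    # V_i ~ Exp(1): mean 1, variance 1
        draws.append(weighted_ipcw_auc(times, events, scores, t, G, v))
    sigma = variance(draws) ** 0.5                  # sample SD of the M draws
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)     # upper 100(alpha/2) percentile
    return auc_hat, auc_hat - z * sigma, auc_hat + z * sigma


print(perturbation_ci([1, 2, 3, 4, 5, 6], [1, 0, 1, 1, 0, 1],
                      [0.3, 0.4, 0.9, 0.2, 0.5, 0.1], t=3.5))
```

A faithful version would recompute $\hat G^*$ and $\beta^*$ (and hence $Y_i^*$) inside the loop for every draw. The fixed-$\hat G$ shortcut is shown only because it keeps the example self-contained.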
To choose this method of computing the time-dependent ROC curve, specify METHOD=IPCW in the ROCOPTIONS option in the PROC PHREG statement.
Note: This perturbation approach of estimating the standard error of the statistic does not apply to a model that is specified by the PRED= option or SOURCE= option in an ROC statement.
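By using Bayes' theorem, the sensitivity and specificity can be written as

$$\text{SE}(c,t) = \frac{\{1 - S(t \mid Y > c)\}\,\Pr(Y > c)}{1 - S(t)}, \qquad \text{SP}(c,t) = \frac{S(t \mid Y \le c)\,\Pr(Y \le c)}{S(t)}$$

where $S(t)$ is the survivor function and $S(t \mid Y > c)$ is the conditional survivor function for the subset defined by $Y > c$. $S(t \mid Y \le c)$ is defined analogously.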
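The Bayes step, written out for both quantities, uses $\Pr(T \le t) = 1 - S(t)$ and $\Pr(T > t) = S(t)$:

$$
\begin{aligned}
\text{SE}(c,t) &= \Pr(Y > c \mid T \le t) = \frac{\Pr(T \le t \mid Y > c)\,\Pr(Y > c)}{\Pr(T \le t)} = \frac{\{1 - S(t \mid Y > c)\}\,\Pr(Y > c)}{1 - S(t)} \\
\text{SP}(c,t) &= \Pr(Y \le c \mid T > t) = \frac{\Pr(T > t \mid Y \le c)\,\Pr(Y \le c)}{\Pr(T > t)} = \frac{S(t \mid Y \le c)\,\Pr(Y \le c)}{S(t)}
\end{aligned}
$$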
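Heagerty, Lumley, and Pepe (2000) use the Kaplan-Meier method to estimate the survivor function $S(t)$ and the conditional survivor function $S(t \mid Y > c)$. The latter is estimated by using only the subjects for which the condition $Y > c$ is met. The sensitivity and the specificity are estimated by

$$\widehat{\text{SE}}(c,t) = \frac{\{1 - \hat S_{\text{KM}}(t \mid Y > c)\}\{1 - \hat F_Y(c)\}}{1 - \hat S_{\text{KM}}(t)}, \qquad \widehat{\text{SP}}(c,t) = \frac{\hat S_{\text{KM}}(t \mid Y \le c)\,\hat F_Y(c)}{\hat S_{\text{KM}}(t)}$$

where $\hat S_{\text{KM}}$ is the Kaplan-Meier estimator and $\hat F_Y(c) = n^{-1} \sum_{i=1}^n I(Y_i \le c)$.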
To choose this method of computing ROC curves, specify METHOD=KM in the ROCOPTIONS option in the PROC PHREG statement.
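The following minimal Python sketch shows the Kaplan-Meier/Bayes estimator: one Kaplan-Meier curve for everyone, and one for each of the subsets $Y > c$ and $Y \le c$. The helper names are hypothetical. Ties are handled in the usual Kaplan-Meier way, with all subjects whose $T_j \ge u$ counted as at risk at time $u$.

```python
def kaplan_meier(times, events):
    """Return s -> S_hat_KM(s) for the given (time, event) pairs."""
    steps, surv = [], 1.0
    for u in sorted({T for T, d in zip(times, events) if d == 1}):
        at_risk = sum(T >= u for T in times)
        deaths = sum(T == u and d == 1 for T, d in zip(times, events))
        surv *= 1.0 - deaths / at_risk
        steps.append((u, surv))

    def S(s):
        value = 1.0
        for u, s_u in steps:
            if u > s:
                break
            value = s_u
        return value

    return S


def km_sens_spec(times, events, scores, c, t):
    """Bayes-theorem estimates of SE(c, t) and SP(c, t) via (conditional) KM."""
    n = len(times)
    high = [i for i in range(n) if scores[i] > c]     # subjects with Y > c
    low = [i for i in range(n) if scores[i] <= c]     # subjects with Y <= c
    f_c = len(low) / n                                # F_hat_Y(c)

    def km_subset(idx):
        return kaplan_meier([times[i] for i in idx], [events[i] for i in idx])(t)

    s_all = kaplan_meier(times, events)(t)
    # An empty subset contributes nothing because its weight (1 - F or F) is 0.
    s_high = km_subset(high) if high else 1.0
    s_low = km_subset(low) if low else 1.0
    se = (1.0 - s_high) * (1.0 - f_c) / (1.0 - s_all)
    sp = s_low * f_c / s_all
    return se, sp


print(km_sens_spec([1, 2, 3, 4, 5], [1] * 5, [0.9, 0.2, 0.5, 0.1, 0.6], c=0.3, t=2.5))
```

Without censoring, every Kaplan-Meier curve reduces to an empirical proportion. The estimates then coincide with the empirical SE = 0.5 and SP = 1/3 for this toy data.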
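Following Akritas (1994), the bivariate survival function, $S(c,t) = \Pr(Y > c,\ T > t)$, can be estimated by

$$\hat S_{\lambda_n}(c,t) = \frac{1}{n} \sum_{i=1}^n \hat S_{\lambda_n}(t \mid Y = Y_i)\, I(Y_i > c)$$

where $\hat S_{\lambda_n}(t \mid Y = Y_i)$ is a smoothed estimate of the conditional survival function. Define the weighted Kaplan-Meier estimator as

$$\hat S_{\lambda_n}(t \mid Y = Y_i) = \prod_{s \le t} \left\{ 1 - \frac{\sum_{j=1}^n K_{\lambda_n}(Y_j, Y_i)\, I(T_j = s)\, \Delta_j}{\sum_{j=1}^n K_{\lambda_n}(Y_j, Y_i)\, I(T_j \ge s)} \right\}$$

where the product is taken over the distinct event times $s \le t$ and $K_{\lambda_n}(Y_j, Y_i)$ is a kernel function that depends on the parameter $\lambda_n$. Akritas (1994) uses the nearest neighbor kernel, $K_{\lambda_n}(Y_j, Y_i) = I\{-\lambda_n < \hat F_Y(Y_i) - \hat F_Y(Y_j) < \lambda_n\}$, where $\hat F_Y$ is the empirical distribution function of $Y$ and $2\lambda_n \in (0, 1)$. This effectively selects the nearest $2\lambda_n$ proportion of observations in the neighborhood. The default value for $\lambda_n$ is 0.05. You can specify a different value by using the SPAN= suboption of METHOD=NNE in the ROCOPTIONS option in the PROC PHREG statement.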
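The sensitivity and specificity can then be estimated as

$$\widehat{\text{SE}}(c,t) = \frac{1 - \hat F_Y(c) - \hat S_{\lambda_n}(c,t)}{1 - \hat S_{\lambda_n}(t)}, \qquad \widehat{\text{SP}}(c,t) = 1 - \frac{\hat S_{\lambda_n}(c,t)}{\hat S_{\lambda_n}(t)}$$

where $\hat S_{\lambda_n}(t) = \hat S_{\lambda_n}(-\infty, t)$. For more information, see Heagerty, Lumley, and Pepe (2000).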
To choose this method of computing time-dependent ROC curves, specify METHOD=NNE in the ROCOPTIONS option in the PROC PHREG statement.
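The nearest neighbor estimator can be sketched in plain Python. With a 0/1 kernel, each weighted Kaplan-Meier factor is an ordinary Kaplan-Meier factor computed among the neighbors of subject $i$. Event times with no neighbor events contribute a factor of 1 and are skipped. The helper names are hypothetical. The parameter `lam` plays the role of $\lambda_n$, and treating the documented default of 0.05 as $\lambda_n$ (rather than, say, $2\lambda_n$) is an assumption about how SPAN= maps onto the kernel.

```python
def ecdf_at_scores(scores):
    """F_hat_Y(Y_i) = (1/n) * #{j : Y_j <= Y_i} for every subject i."""
    n = len(scores)
    return [sum(yj <= yi for yj in scores) / n for yi in scores]


def nn_conditional_survival(times, events, F, i, t, lam):
    """Weighted KM estimate of S(t | Y = Y_i) with the nearest neighbor kernel
    K(Y_j, Y_i) = I(-lam < F_hat(Y_i) - F_hat(Y_j) < lam)."""
    nbrs = [j for j in range(len(times)) if abs(F[i] - F[j]) < lam]
    surv = 1.0
    for s in sorted({times[j] for j in nbrs if events[j] == 1 and times[j] <= t}):
        deaths = sum(times[j] == s and events[j] == 1 for j in nbrs)
        at_risk = sum(times[j] >= s for j in nbrs)
        surv *= 1.0 - deaths / at_risk
    return surv


def nne_sens_spec(times, events, scores, c, t, lam=0.05):
    """NNE estimates of SE(c, t) and SP(c, t); O(n^2) work per call."""
    n = len(times)
    F = ecdf_at_scores(scores)
    s_cond = [nn_conditional_survival(times, events, F, i, t, lam) for i in range(n)]
    s_ct = sum(s for s, y in zip(s_cond, scores) if y > c) / n   # S_hat(c, t)
    s_t = sum(s_cond) / n                                        # S_hat(-inf, t)
    f_c = sum(y <= c for y in scores) / n                        # F_hat_Y(c)
    se = (1.0 - f_c - s_ct) / (1.0 - s_t)
    sp = 1.0 - s_ct / s_t
    return se, sp


print(nne_sens_spec([1, 2, 3, 4, 5], [1] * 5, [0.9, 0.2, 0.5, 0.1, 0.6], 0.3, 2.5, lam=0.1))
```

Two limiting cases show the role of the span. A neighborhood that contains only the subject itself reproduces the empirical SE and SP when there is no censoring. A neighborhood that contains everyone replaces each conditional curve with the overall Kaplan-Meier curve, so SE + SP = 1 and the score carries no information.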
Chambless and Diao (2006) propose estimating time-dependent ROC curves by using a recursive approach akin to the Kaplan-Meier method. Let $t_{(1)} < t_{(2)} < \cdots < t_{(m)}$ be the distinct event times in the data. The area under the curve at time $t_{(k)}$, $\text{AUC}(t_{(k)})$, can be derived as
where $S(t)$ is the survivor function, $h(t)$ is the hazard function,
,
, and
In a recursive fashion, the sensitivity and specificity at time $t_{(k)}$ can be shown to be
Define $R_k = \{i : T_i \ge t_{(k)}\}$ to be the risk set at time $t_{(k)}$, and let $n_k$ be the number of subjects in $R_k$. Let $Z_{(k)}$ be the covariate vector for the subject whose event time is $t_{(k)}$. The unknown parameters
,
, and
can be estimated by
When there is only one event at each event time, $h(t_{(k)})$ is estimated by $1/n_k$ and $S(t_{(k)})$ is estimated by the Kaplan-Meier method as $\hat S(t_{(k)}) = \prod_{j=1}^{k} (1 - 1/n_j)$. In the case of a tie, the order of the events in the calculation is the same as the order of their appearance in the input data set.
To choose this method of computing time-dependent ROC curves, specify METHOD=RECURSIVE in the ROCOPTIONS option in the PROC PHREG statement.
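A Kaplan-Meier-style recursion for SE and SP can be illustrated with a short sketch. It is not a transcription of the Chambless and Diao (2006) expressions. Instead, it derives a recursion directly from the definitions, using $\Pr(T = t_{(k)}) = S(t_{(k-1)})\,h(t_{(k)})$, $\hat h(t_{(k)}) = 1/n_k$ (one event per step), and $\Pr(Y > c \mid T = t_{(k)})$ estimated by $I(Y_{(k)} > c)$. Tied events are processed in input-data order, as the text describes; Python's sort is stable, which preserves that order. Placing censorings tied with an event after the event, and starting the specificity numerator at $\hat F_Y(c)$, are assumptions of this sketch. The function name is hypothetical.

```python
def recursive_sens_spec(times, events, scores, c, t):
    """Recursive (KM-style) estimates of SE(c, t) and SP(c, t).

    a tracks P(Y > c, T <= t_(k)); b tracks P(Y <= c, T > t_(k)).
    Assumes at least one event at or before t and S_hat(t) > 0.
    """
    n = len(times)
    # Sort by time; at tied times, events precede censorings. The sort is
    # stable, so tied events keep their input-data order.
    order = sorted(range(n), key=lambda i: (times[i], events[i] == 0))
    s_prev = 1.0                            # S(t_(k-1)), with S(0) = 1
    a = 0.0                                 # P(Y > c, T <= 0) = 0
    b = sum(y <= c for y in scores) / n     # starts at F_hat_Y(c) (assumption)
    at_risk = n
    for i in order:
        if times[i] > t:
            break
        if events[i] == 1:
            h = 1.0 / at_risk               # one event per step: h = 1 / n_k
            mass = s_prev * h               # P(T = t_(k)) = S(t_(k-1)) h(t_(k))
            if scores[i] > c:               # I(Y_(k) > c) estimates P(Y > c | T = t_(k))
                a += mass
            else:
                b -= mass
            s_prev *= 1.0 - h               # Kaplan-Meier update to S(t_(k))
        at_risk -= 1                        # subject leaves the risk set
    return a / (1.0 - s_prev), b / s_prev


print(recursive_sens_spec([1, 2, 3, 4, 5, 6], [1, 0, 1, 1, 0, 1],
                          [0.3, 0.4, 0.9, 0.2, 0.5, 0.1], c=0.35, t=3.5))
```

Without censoring, each step adds mass $1/n$, and the recursion reproduces the empirical sensitivity and specificity exactly. That makes a convenient sanity check.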
Let $S(t \mid y)$ denote the conditional survival function of $T$ given $Y = y$. For the $i$th subject, define the weight $w_i$ to be the probability of being a nonsurvivor at time $t$. Table 15 displays how $w_i$ can be computed.
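You can estimate $S(t \mid Y = Y_i)$ by using a kernel-based Kaplan-Meier-type method:

$$\hat S_{\lambda_n}(t \mid Y = Y_i) = \prod_{s \le t} \left\{ 1 - \frac{\sum_{j=1}^n K_{\lambda_n}(Y_j, Y_i)\, I(T_j = s)\, \Delta_j}{\sum_{j=1}^n K_{\lambda_n}(Y_j, Y_i)\, I(T_j \ge s)} \right\}$$

where the product is taken over the distinct event times $s \le t$ and $K_{\lambda_n}(Y_j, Y_i)$ is a kernel function that depends on the parameter $\lambda_n$. PROC PHREG uses the nearest-neighbors kernel, $K_{\lambda_n}(Y_j, Y_i) = I\{-\lambda_n < \hat F_Y(Y_i) - \hat F_Y(Y_j) < \lambda_n\}$, where $\hat F_Y$ is the empirical distribution function of $Y$ and $2\lambda_n \in (0, 1)$. This effectively selects the nearest $2\lambda_n$ proportion of observations in the neighborhood. The default value for $\lambda_n$ is 0.05. You can specify a different value by using the SPAN= suboption of the METHOD=WKKM option, which you specify in the ROCOPTIONS option in the PROC PHREG statement.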
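At time $t$, you can estimate the sensitivity and specificity as

$$\widehat{\text{SE}}(c,t) = \frac{\sum_{i=1}^n w_i\, I(Y_i > c)}{\sum_{i=1}^n w_i}, \qquad \widehat{\text{SP}}(c,t) = \frac{\sum_{i=1}^n (1 - w_i)\, I(Y_i \le c)}{\sum_{i=1}^n (1 - w_i)}$$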
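The weighted Kaplan-Meier approach can be sketched under explicit assumptions. Table 15 is not reproduced in this section, so the weight rules below are derived from the verbal definition "probability of being a nonsurvivor at time $t$". An observed failure by $t$ gets weight 1, and a subject known to survive past $t$ gets 0. A subject censored at $T_i \le t$ gets $1 - \hat S(t \mid Y_i)/\hat S(T_i \mid Y_i)$, the estimated chance of failing in $(T_i, t]$ given survival to $T_i$. The weighted SE and SP formulas are likewise a plausible reading rather than a confirmed transcription. Helper names are hypothetical, and `lam` stands for $\lambda_n$.

```python
def ecdf_at_scores(scores):
    """F_hat_Y(Y_i) = (1/n) * #{j : Y_j <= Y_i} for every subject i."""
    n = len(scores)
    return [sum(yj <= yi for yj in scores) / n for yi in scores]


def nn_conditional_survival(times, events, F, i, t, lam):
    """Kernel-based KM estimate of S(t | Y = Y_i) with a nearest neighbor kernel."""
    nbrs = [j for j in range(len(times)) if abs(F[i] - F[j]) < lam]
    surv = 1.0
    for s in sorted({times[j] for j in nbrs if events[j] == 1 and times[j] <= t}):
        deaths = sum(times[j] == s and events[j] == 1 for j in nbrs)
        at_risk = sum(times[j] >= s for j in nbrs)
        surv *= 1.0 - deaths / at_risk
    return surv


def wkkm_weights(times, events, scores, t, lam=0.05):
    """w_i = estimated probability that subject i has failed by time t."""
    F = ecdf_at_scores(scores)
    weights = []
    for i, (T, d) in enumerate(zip(times, events)):
        if T > t:
            weights.append(0.0)          # known to survive past t
        elif d == 1:
            weights.append(1.0)          # failure observed by t
        else:                            # censored at T <= t: assumed rule
            s_ci = nn_conditional_survival(times, events, F, i, T, lam)
            s_t = nn_conditional_survival(times, events, F, i, t, lam)
            weights.append(1.0 - s_t / s_ci if s_ci > 0 else 1.0)
    return weights


def wkkm_sens_spec(times, events, scores, c, t, lam=0.05):
    """Weighted SE(c, t) and SP(c, t) built from the w_i."""
    w = wkkm_weights(times, events, scores, t, lam)
    se = sum(wi for wi, y in zip(w, scores) if y > c) / sum(w)
    sp = (sum(1.0 - wi for wi, y in zip(w, scores) if y <= c)
          / sum(1.0 - wi for wi in w))
    return se, sp


print(wkkm_weights([1, 2, 3, 4, 5, 6], [1, 0, 1, 1, 0, 1],
                   [0.3, 0.4, 0.9, 0.2, 0.5, 0.1], t=3.5, lam=2.0))
```

With `lam=2.0`, every subject is a neighbor of every other, so the conditional curves collapse to the overall Kaplan-Meier curve. The subject censored at time 2 then gets weight $1 - 0.625/(5/6) = 0.25$.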