The PHREG Procedure

Time-Dependent ROC Curves

In the context of logistic regression with binary outcomes, receiver operating characteristic (ROC) curves and AUC (area under the ROC curve) statistics are commonly used to assess the ability of the model to discriminate between the two outcomes. To adapt the concept of ROC curves to the survival setting, various definitions and estimators of time-dependent ROC curves and AUC functions have been proposed. See Blanche, Latouche, and Viallon (2013) for a comprehensive survey of different methods. Time-dependent ROC curves and AUC functions characterize how well the fitted model can distinguish subjects who experience an event by a given time from subjects who are event-free at that time.

Whereas C-statistics provide overall measures of predictive accuracy, time-dependent ROC curves and AUC functions summarize the predictive accuracy at specific times. In practice, it is common to use several time points within the support of the observed event times.

Let T denote the event-time variable, and let Y denote the continuous variable to be assessed. At time t, a binary outcome can be defined as follows:

$$D_t = I(T \le t)$$

Suppose c denotes a specific value within the support of Y. The sensitivity (SE) and specificity (SP) can be defined as

$$\mathrm{SE}_t(c) = P(Y > c \mid D_t = 1)$$

$$\mathrm{SP}_t(c) = P(Y \le c \mid D_t = 0)$$

The ROC curve at time t is defined to be

$$\mathrm{ROC}_t(p) = \mathrm{SE}_t(c_p), \quad c_p = \mathrm{SP}_t^{-1}(1 - p)$$

This definition is often referred to as the "cumulative/dynamic" ROC curve in the literature. "Cumulative" means that all events that occur at or before time t are counted as "cases." Other types of time-dependent ROC curves are available in the literature; see, for example, Heagerty and Zheng (2005).

The AUC statistic at time t is the area under the ROC curve at time t:

$$\mathrm{AUC}_t = \int \mathrm{ROC}_t(u)\,du$$
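As a minimal sketch of these definitions, the following Python code computes the empirical cumulative/dynamic ROC curve and its AUC at a single time point, assuming that every event time is observed (no censoring). The function names and the toy data are illustrative only and are not part of PROC PHREG.

```python
def cumulative_dynamic_roc(times, markers, t):
    """Empirical ROC_t points (1 - SP, SE), assuming no censoring."""
    cases = [y for T, y in zip(times, markers) if T <= t]     # D_t = 1
    controls = [y for T, y in zip(times, markers) if T > t]   # D_t = 0
    if not cases or not controls:
        raise ValueError("need at least one case and one control at time t")
    points = [(0.0, 0.0)]
    for c in sorted(set(markers), reverse=True):
        se = sum(y > c for y in cases) / len(cases)            # P(Y > c | D_t = 1)
        sp = sum(y <= c for y in controls) / len(controls)     # P(Y <= c | D_t = 0)
        points.append((1.0 - sp, se))
    points.append((1.0, 1.0))
    return sorted(points)


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as sorted (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data: a larger marker value goes with an earlier event.
times = [1, 2, 3, 4]
markers = [4, 3, 2, 1]
auc_at_2 = auc_trapezoid(cumulative_dynamic_roc(times, markers, t=2))
```

With censored data the case and control groups are not fully observed, which is why the estimators in the following sections are needed.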

Let $\boldsymbol{\beta}$ denote the vector of regression parameters. For the ith individual ($i = 1, \ldots, n$), let $X_i$, $\Delta_i$, and $\mathbf{Z}_i$ be the observed time, event indicator (1 for death and 0 for censored), and covariate vector, respectively. Let $\hat{\boldsymbol{\beta}}$ denote the maximum partial likelihood estimate of $\boldsymbol{\beta}$. The estimated linear predictor for the ith individual is $\hat{\boldsymbol{\beta}}'\mathbf{Z}_i$. PROC PHREG supports the approaches that are described in the following sections for estimating time-dependent ROC curves.

Inverse Probability of Censoring Weighting Approach

Let $\hat{G}(t)$ be the Kaplan-Meier estimate of the censoring distribution (assuming no covariates). Assuming that the censoring distribution is independent of the failure time distribution, the sensitivity and specificity under a specific threshold value c can be consistently estimated by

$$\widehat{\mathrm{SE}}_t(c) = \frac{\sum_{i=1}^{n} \Delta_i\, I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i > c,\ X_i \le t) / \hat{G}(X_i)}{\sum_{i=1}^{n} \Delta_i\, I(X_i \le t) / \hat{G}(X_i)}$$

$$\widehat{\mathrm{SP}}_t(c) = \frac{\sum_{i=1}^{n} I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i \le c,\ X_i > t)}{\sum_{i=1}^{n} I(X_i > t)}$$

$\mathrm{ROC}_t(p)$ can be estimated by substituting in these estimated sensitivities and specificities. The estimated $\mathrm{AUC}_t$ is calculated by using the trapezoidal rule to integrate the estimated ROC curve.
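The IPCW estimators and the trapezoidal AUC can be sketched in Python as follows. This is an illustration under stated assumptions rather than the PROC PHREG implementation: `scores` stands for the estimated linear predictors, event and censoring times are assumed not to tie with each other, and all function names are hypothetical.

```python
def km_censoring(times, events):
    """Kaplan-Meier estimate G(u) of P(C > u); censored records (event = 0) play the role of events."""
    steps, g = [], 1.0
    for s in sorted({x for x, d in zip(times, events) if d == 0}):
        at_risk = sum(x >= s for x in times)
        censored = sum(x == s and d == 0 for x, d in zip(times, events))
        g *= 1.0 - censored / at_risk
        steps.append((s, g))

    def G(u):
        value = 1.0
        for s, g_s in steps:
            if s > u:
                break
            value = g_s
        return value

    return G


def ipcw_se_sp(times, events, scores, t, c):
    """IPCW sensitivity and specificity at time t for threshold c."""
    G = km_censoring(times, events)
    cases = [(x, y) for x, d, y in zip(times, events, scores) if d == 1 and x <= t]
    controls = [y for x, y in zip(times, scores) if x > t]
    if not cases or not controls:
        raise ValueError("need observed events by t and subjects still at risk after t")
    se = (sum(1.0 / G(x) for x, y in cases if y > c)
          / sum(1.0 / G(x) for x, _ in cases))
    sp = sum(y <= c for y in controls) / len(controls)
    return se, sp


def ipcw_auc(times, events, scores, t):
    """Trapezoidal area under the estimated ROC_t curve."""
    points = [(0.0, 0.0)]
    for c in sorted(set(scores), reverse=True):
        se, sp = ipcw_se_sp(times, events, scores, t, c)
        points.append((1.0 - sp, se))
    points.append((1.0, 1.0))
    points.sort()
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data: subjects 2 and 4 are censored.
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 0, 1]
scores = [5, 4, 3, 2, 1]
se, sp = ipcw_se_sp(times, events, scores, t=3, c=4)
auc = ipcw_auc(times, events, scores, t=3)
```

In the toy data, the event at time 3 receives weight 1/0.75 because one of the four subjects at risk at time 2 was censored, so the earlier event carries only 3/7 of the total case weight.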

Uno et al. (2007) propose estimating the standard errors of the $\mathrm{AUC}_t$ estimator by using the perturbation-resampling method. Let $\psi_1, \ldots, \psi_n$ be a set of independent samples from an exponential distribution with mean of 1 and variance of 1. The perturbed versions of $\widehat{\mathrm{SE}}_t(c)$ and $\widehat{\mathrm{SP}}_t(c)$ are

$$\widehat{\mathrm{SE}}_t^*(c) = \frac{\sum_{i=1}^{n} \Delta_i\, I(\boldsymbol{\beta}^{*\prime}\mathbf{Z}_i > c,\ X_i \le t)\, \psi_i / \hat{G}^*(X_i)}{\sum_{i=1}^{n} \Delta_i\, I(X_i \le t)\, \psi_i / \hat{G}^*(X_i)}$$

$$\widehat{\mathrm{SP}}_t^*(c) = \frac{\sum_{i=1}^{n} I(\boldsymbol{\beta}^{*\prime}\mathbf{Z}_i \le c,\ X_i > t)\, \psi_i}{\sum_{i=1}^{n} I(X_i > t)\, \psi_i}$$

where $\hat{G}^*$ and $\boldsymbol{\beta}^*$ represent the perturbed versions of $\hat{G}$ and $\hat{\boldsymbol{\beta}}$. $\hat{G}^*(t)$ is calculated as

$$\hat{G}^*(t) = \hat{G}(t) - \hat{G}(t)\, \frac{2}{n(n-1)} \sum_{i<j} \int_0^t \frac{1}{n^{-1} \sum_{k} I(X_k \ge u)} \left[ d\hat{M}_i(u) + d\hat{M}_j(u) \right] \psi_i \psi_j / 2$$

where $\hat{M}_i(u)$ is computed from a consistent estimator of the cumulative hazard function for the censoring time variable. $\boldsymbol{\beta}^*$ is calculated as

$$\boldsymbol{\beta}^* = \hat{\boldsymbol{\beta}} + \frac{2}{n(n-1)} \sum_{i<j} \left\{ \hat{H}(\hat{\boldsymbol{\beta}}) \left[ U_i(\hat{\boldsymbol{\beta}}) + U_j(\hat{\boldsymbol{\beta}}) \right] / 2 \right\} \psi_i \psi_j$$

where $\hat{H}(\hat{\boldsymbol{\beta}})$ is the estimated variance-covariance matrix of $\hat{\boldsymbol{\beta}}$ divided by n and $U_i(\hat{\boldsymbol{\beta}})$ is the partial likelihood contribution from the ith individual.

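The perturbed estimate $\widehat{\mathrm{AUC}}_t^*$ is obtained by substituting in the perturbed sensitivities and specificities. Suppose $\hat{\sigma}^2$ is the sample variance based on M realizations of the perturbed estimate $\widehat{\mathrm{AUC}}_t^*$. The $100(1-\alpha)\%$ confidence limits for $\mathrm{AUC}_t$ are $\widehat{\mathrm{AUC}}_t \pm z_{\alpha/2}\,\hat{\sigma}$, where $\widehat{\mathrm{AUC}}_t$ is the estimated $\mathrm{AUC}_t$ and $z_{\alpha/2}$ is the upper $100(1-\alpha/2)$th percentile of the standard normal distribution.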
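The last step, turning M perturbed realizations into confidence limits, can be sketched in Python as follows. The sketch covers only that step: the perturbed AUC values are made-up placeholder numbers rather than output of the perturbation scheme, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, variance


def perturbation_confidence_limits(auc_hat, perturbed_aucs, alpha=0.05):
    """Normal-approximation limits: auc_hat +/- z * sqrt(sample variance of the realizations)."""
    sigma2 = variance(perturbed_aucs)             # sample variance, divisor M - 1
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96 when alpha = 0.05
    half_width = z * sqrt(sigma2)
    return auc_hat - half_width, auc_hat + half_width


# Placeholder realizations (illustrative only).
perturbed = [0.70, 0.72, 0.74, 0.76, 0.78]
lower, upper = perturbation_confidence_limits(0.74, perturbed)
```

Nothing in this sketch keeps the limits inside the interval [0, 1]; how PROC PHREG handles limits near the boundary is not covered here.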

To choose this method of computing the time-dependent ROC curve, specify METHOD=IPCW in the ROCOPTIONS option in the PROC PHREG statement.

Note: This perturbation approach of estimating the standard error of the AUC statistic does not apply to a model that is specified by the PRED= option or SOURCE= option in an ROC statement.

Conditional Kaplan-Meier Approach

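By using Bayes’ theorem, the sensitivity and the specificity can be written as

$$\mathrm{SE}_t(c) = \frac{\left[ 1 - S(t \mid Y > c) \right] P(Y > c)}{1 - S(t)}$$

$$\mathrm{SP}_t(c) = \frac{S(t \mid Y \le c)\, P(Y \le c)}{S(t)}$$

where $S(t) = P(T > t)$ is the survivor function and $S(t \mid Y > c)$ is the conditional survivor function for $Y > c$.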

Heagerty, Lumley, and Pepe (2000) use the Kaplan-Meier method to estimate the survivor function $S(t)$ and the conditional survivor function $S(t \mid Y > c)$. The latter is estimated by using only the subjects for which the condition $Y > c$ is met. The sensitivity and the specificity are estimated by

$$\widehat{\mathrm{SE}}_t(c) = \frac{\left[ 1 - \hat{S}_{KM}(t \mid Y > c) \right] \left[ 1 - \hat{F}_Y(c) \right]}{1 - \hat{S}_{KM}(t)}$$

$$\widehat{\mathrm{SP}}_t(c) = \frac{\hat{S}_{KM}(t \mid Y \le c)\, \hat{F}_Y(c)}{\hat{S}_{KM}(t)}$$

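where $\hat{S}_{KM}$ is the Kaplan-Meier estimator and $\hat{F}_Y(c) = n^{-1} \sum_{i=1}^{n} I(Y_i \le c)$ is the empirical distribution function of $Y$.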
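A minimal Python sketch of these two estimators follows. It assumes that $\hat{F}_Y$ is the empirical distribution function of the marker; the function names and toy data are illustrative and do not come from PROC PHREG.

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t) from (time, event) pairs; returns 1.0 for an empty sample."""
    s = 1.0
    for u in sorted({x for x, d in zip(times, events) if d == 1 and x <= t}):
        at_risk = sum(x >= u for x in times)
        deaths = sum(x == u and d == 1 for x, d in zip(times, events))
        s *= 1.0 - deaths / at_risk
    return s


def conditional_km_se_sp(times, events, markers, t, c):
    """Sensitivity and specificity at time t from Kaplan-Meier fits on the subsets Y > c and Y <= c."""
    n = len(markers)
    f_c = sum(y <= c for y in markers) / n          # empirical F_Y(c)
    high = [(x, d) for x, d, y in zip(times, events, markers) if y > c]
    low = [(x, d) for x, d, y in zip(times, events, markers) if y <= c]
    s_all = km_survival(times, events, t)
    if s_all in (0.0, 1.0):
        raise ValueError("S_KM(t) must lie strictly between 0 and 1")
    s_high = km_survival([x for x, _ in high], [d for _, d in high], t)
    s_low = km_survival([x for x, _ in low], [d for _, d in low], t)
    se = (1.0 - s_high) * (1.0 - f_c) / (1.0 - s_all)
    sp = s_low * f_c / s_all
    return se, sp


# Toy data: the second subject is censored at time 2.
times = [1, 2, 3, 4]
events = [1, 0, 1, 1]
markers = [4, 3, 2, 1]
se, sp = conditional_km_se_sp(times, events, markers, t=3, c=2)
```

Because a separate Kaplan-Meier curve is fit for every threshold c, the subsets can become small at extreme thresholds, and the resulting estimates are correspondingly unstable there.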

To choose this method of computing time-dependent ROC curves, specify METHOD=KM in the ROCOPTIONS option in the PROC PHREG statement.

Nearest Neighbors Approach

Following Akritas (1994), the bivariate survival function, $S(c, t) = P(Y > c, T > t)$, can be estimated by

$$\hat{S}_{b_n}(c, t) = \frac{1}{n} \sum_{i} \hat{S}_{b_n}(t \mid Y = Y_i)\, I(Y_i > c)$$

where $\hat{S}_{b_n}(t \mid Y = Y_i)$ is a smoothed estimate of the conditional survival function. Define the weighted Kaplan-Meier estimator as

$$\hat{S}_{b_n}(t \mid Y = Y_i) = \prod_{s \in \{X_i :\, i = 1, \ldots, n,\ \Delta_i = 1\},\ s \le t} \left[ 1 - \frac{\sum_{j} K_{b_n}(Y_i, Y_j)\, I(X_j = s)\, \Delta_j}{\sum_{j} K_{b_n}(Y_i, Y_j)\, I(X_j \ge s)} \right]$$

where $K_{b_n}$ is a kernel function that depends on the parameter $b_n$. Akritas (1994) uses the nearest neighbor kernel, $K_{b_n}(Y_i, Y_j) = I(|\hat{F}_Y(Y_i) - \hat{F}_Y(Y_j)| < b_n)$, where $2b_n \in (0, 1)$; this effectively selects the nearest $2b_n$ proportion of observations in the neighborhood. The default value for $b_n$ is 0.05. You can specify a different value by using the SPAN= suboption in METHOD=NNE in the ROCOPTIONS option in the PROC PHREG statement.

The sensitivity and specificity can then be estimated as

$$\widehat{\mathrm{SE}}_t(c) = \frac{1 - \hat{F}_Y(c) - \hat{S}_{b_n}(c, t)}{1 - \hat{S}_{b_n}(t)}$$

$$\widehat{\mathrm{SP}}_t(c) = 1 - \frac{\hat{S}_{b_n}(c, t)}{\hat{S}_{b_n}(t)}$$

where $\hat{S}_{b_n}(t) = \hat{S}_{b_n}(-\infty, t)$. For more information, see Heagerty, Lumley, and Pepe (2000).
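The nearest neighbors estimator can be sketched in Python as follows. The sketch assumes the indicator form of the nearest neighbor kernel used by Heagerty, Lumley, and Pepe (2000), in which subject j is a neighbor of subject i when their empirical distribution function values differ by less than the span. The function names and the argument name `span` are hypothetical.

```python
def nne_se_sp(times, events, markers, t, c, span=0.05):
    """Nearest-neighbor sensitivity and specificity at time t for threshold c."""
    n = len(markers)
    F = [sum(v <= y for v in markers) / n for y in markers]   # empirical F_Y at each Y_i
    event_times = sorted({x for x, d in zip(times, events) if d == 1 and x <= t})

    def conditional_survival(i):
        # Weighted Kaplan-Meier estimate of S(t | Y = Y_i) with 0/1 kernel weights.
        w = [1.0 if abs(F[i] - F[j]) < span else 0.0 for j in range(n)]
        s = 1.0
        for u in event_times:
            num = sum(w[j] for j in range(n) if times[j] == u and events[j] == 1)
            den = sum(w[j] for j in range(n) if times[j] >= u)
            if den > 0:
                s *= 1.0 - num / den
        return s

    cond = [conditional_survival(i) for i in range(n)]
    s_t = sum(cond) / n                                        # S(-inf, t)
    s_ct = sum(s for s, y in zip(cond, markers) if y > c) / n  # S(c, t)
    f_c = sum(y <= c for y in markers) / n
    if s_t in (0.0, 1.0):
        raise ValueError("the estimate of S(t) must lie strictly between 0 and 1")
    se = (1.0 - f_c - s_ct) / (1.0 - s_t)
    sp = 1.0 - s_ct / s_t
    return se, sp


# Toy data: the third subject is censored at time 3.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
markers = [4, 3, 2, 1]
se, sp = nne_se_sp(times, events, markers, t=2, c=2, span=0.3)
```

When the span is large enough that every subject is a neighbor of every other, the conditional survival estimates all collapse to the ordinary Kaplan-Meier estimate and the marker carries no information, so the sensitivity reduces to $1 - \hat{F}_Y(c)$.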

To choose this method of computing time-dependent ROC curves, specify METHOD=NNE in the ROCOPTIONS option in the PROC PHREG statement.

Recursive Approach

Chambless and Diao (2006) propose estimating time-dependent ROC curves by using a recursive approach akin to the Kaplan-Meier method. Let $t_1 < t_2 < \cdots$ be the distinct event times in the data. The area under the curve at time $t_m$ can be derived as

$$\mathrm{AUC}_{t_m} = \frac{\sum_{k=1}^{m} \gamma_k\, \lambda(t_k) \left( 1 - \lambda(t_k) \right) S(t_{k-1}) - \sum_{k=1}^{m} \tau_k\, \lambda(t_k) \left( 1 - S(t_{k-1}) \right) S(t_{k-1})}{S(t_m) \left( 1 - S(t_m) \right)}$$

where $S(t)$ is the survivor function, $\lambda(t)$ is the hazard function, $t_0 = 0$, $S(t_0) = 1$, and

$$\tau_k = P(\boldsymbol{\beta}'\mathbf{Z}_i > \boldsymbol{\beta}'\mathbf{Z}_j \mid X_i = t_k,\ \Delta_i = 1,\ X_j > t_k)$$

$$\gamma_k = P(\boldsymbol{\beta}'\mathbf{Z}_i > \boldsymbol{\beta}'\mathbf{Z}_j \mid X_i = t_{k-1},\ \Delta_i = 1,\ X_j = t_k,\ \Delta_j = 1)$$

In a recursive fashion, the sensitivity and specificity at time $t_m$ can be shown to be

$$\mathrm{SE}_{t_m}(c) = \sum_{k=1}^{m} \rho_k(c)\, \lambda(t_k)\, S(t_{k-1}) / \left[ 1 - S(t_m) \right]$$

$$\mathrm{SP}_{t_m}(c) = \frac{P(\boldsymbol{\beta}'\mathbf{Z}_i \le c) - \sum_{k=1}^{m} \left[ 1 - \rho_k(c) \right] \lambda(t_k)\, S(t_{k-1})}{S(t_m)}$$

where $\rho_k(c) = P(\boldsymbol{\beta}'\mathbf{Z}_i > c \mid X_i = t_k,\ \Delta_i = 1)$.

Define $\mathcal{R}_k$ to be the risk set at time $t_k$, and let $r_k$ be the number of subjects in $\mathcal{R}_k$. Let $\mathbf{Z}_{(k)}$ be the covariate vector for the subject whose event time is $t_k$. The unknown parameters $\tau_k$, $\gamma_k$, and $\rho_k(c)$ can be estimated by

$$\hat{\tau}_k = \frac{1}{k - 1} \sum_{i=1}^{k} I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_{(i)} > \hat{\boldsymbol{\beta}}'\mathbf{Z}_{(k)})$$

$$\hat{\gamma}_k = \frac{1}{r_k - 1} \sum_{j \in \mathcal{R}_k} I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_{(k)} > \hat{\boldsymbol{\beta}}'\mathbf{Z}_j)$$

$$\hat{\rho}_k(c) = I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_{(k)} > c)$$

When there is only one event at each event time, $\lambda(t_k)$ is estimated by $\hat{\lambda}(t_k) = 1/r_k$ and $S(t_k)$ is estimated by the Kaplan-Meier method as $\hat{S}(t_k) = \hat{S}(t_{k-1}) [1 - \hat{\lambda}(t_k)]$. In the case of a tie, the order of the events in the calculation is the same as the order of their appearance in the input data set.
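The recursion for the sensitivity and specificity can be sketched in Python as follows, assuming exactly one event at each distinct event time. Two choices in the sketch are assumptions rather than statements from the text: the unconditional probability that the linear predictor is at most c is estimated by the empirical proportion, and the function name and toy data are hypothetical.

```python
def recursive_se_sp(times, events, scores, m, c):
    """Sensitivity and specificity at the m-th ordered event time for threshold c."""
    n = len(times)
    # Indices of subjects with events, ordered by event time (assumed untied).
    event_order = sorted((i for i in range(n) if events[i] == 1),
                         key=lambda i: times[i])
    if not 1 <= m <= len(event_order):
        raise ValueError("m must index one of the observed event times")
    s_prev = 1.0        # Kaplan-Meier S(t_{k-1}), starting from S(t_0) = 1
    se_sum = 0.0        # running sum of rho_k(c) * lambda(t_k) * S(t_{k-1})
    sp_sum = 0.0        # running sum of [1 - rho_k(c)] * lambda(t_k) * S(t_{k-1})
    for k in event_order[:m]:
        r_k = sum(x >= times[k] for x in times)   # size of the risk set at t_k
        lam = 1.0 / r_k                           # hazard estimate with one event
        rho = 1.0 if scores[k] > c else 0.0
        se_sum += rho * lam * s_prev
        sp_sum += (1.0 - rho) * lam * s_prev
        s_prev *= 1.0 - lam                       # update to S(t_k)
    s_m = s_prev
    if s_m in (0.0, 1.0):
        raise ValueError("S(t_m) must lie strictly between 0 and 1")
    p_le_c = sum(y <= c for y in scores) / n      # assumption: empirical proportion
    return se_sum / (1.0 - s_m), (p_le_c - sp_sum) / s_m


# Toy data without censoring; a larger score goes with an earlier event.
times = [1, 2, 3, 4]
events = [1, 1, 1, 1]
scores = [4, 3, 2, 1]
se, sp = recursive_se_sp(times, events, scores, m=2, c=2)
```

Each pass through the loop reuses the survival estimate from the previous event time, which is the sense in which the computation is recursive.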

To choose this method of computing time-dependent ROC curves, specify METHOD=RECURSIVE in the ROCOPTIONS option in the PROC PHREG statement.

Weighted Kernel Kaplan-Meier Approach

Let $S(t \mid Y)$ denote the conditional survival function of $T$ given $Y$. For the ith subject, define the weight to be the probability of being a nonsurvivor at time $t$. Table 15 displays how the weight can be computed.

Table 15: Probability of Being a Nonsurvivor for the ith Subject

	Status
	or	0
		1
		0

You can estimate the conditional survival function by using a kernel-based Kaplan-Meier-type method:

where the kernel function depends on a span parameter. PROC PHREG uses the nearest-neighbors kernel, which effectively selects a proportion of the nearest observations in the neighborhood; the span parameter determines that proportion. The default value for the span parameter is 0.05. You can specify a different value by using the SPAN= suboption of the METHOD=WKKM option, which you specify in the ROCOPTIONS option in the PROC PHREG statement.

At time $t$, you can estimate the sensitivity and specificity as

Last updated: March 08, 2022