The PHREG Procedure

Concordance Statistics

The predictive accuracy of a statistical model can be measured by the agreement between observed and predicted outcomes. In the context of logistic regression with binary outcomes, the concordance statistic (also known as the C-statistic) is the most commonly used measure of accuracy. The idea underlying concordance is that a subject who experiences a particular outcome should have a higher predicted probability of that outcome than a subject who does not experience it.

The C-statistic can be calculated as the proportion of pairs of subjects whose observed and predicted outcomes agree (are concordant) among all possible pairs in which one subject experiences the outcome of interest and the other one does not. The higher the C-statistic, the better the model can discriminate between subjects who do experience the outcome of interest and subjects who do not.
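To illustrate the pair-counting definition, the following Python sketch computes the C-statistic for a binary outcome. The function name and data are hypothetical. Counting a tied prediction as half a concordant pair is an assumed convention here; it matches the 0.5 weight that Uno's estimator later in this section gives to tied scores.

```python
from itertools import product


def binary_c_statistic(outcomes, predicted):
    """C-statistic for a binary outcome.

    Every (event, non-event) pair is compared; the pair is concordant when the
    subject with the event has the higher predicted probability. Tied
    predictions count as half a concordant pair (an assumed convention).
    """
    with_event = [p for y, p in zip(outcomes, predicted) if y == 1]
    without_event = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not with_event or not without_event:
        raise ValueError("need at least one subject with and one without the outcome")

    concordant = 0.0
    for p_yes, p_no in product(with_event, without_event):
        if p_yes > p_no:
            concordant += 1.0
        elif p_yes == p_no:
            concordant += 0.5
    return concordant / (len(with_event) * len(without_event))


# Hypothetical data: 2 subjects with the outcome, 3 without, so 6 pairs.
outcomes = [1, 0, 1, 0, 0]
predicted = [0.9, 0.3, 0.4, 0.4, 0.1]
print(binary_c_statistic(outcomes, predicted))  # 5.5 / 6, about 0.917
```

The subject with prediction 0.4 and the outcome ties with one subject without it, which contributes the half pair.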

C-statistics can be formulated for any modeling approach that generates predicted values. In the context of survival analysis, various C-statistics have been formulated to deal with right-censored data. PROC PHREG provides the concordance statistics that were introduced by Harrell (1986) and Uno et al. (2011), which are discussed in the following subsections. In these subsections, $\boldsymbol{\beta}$ denotes the true regression parameters. For a pair of subjects whose covariate vectors are $\mathbf{Z}_1$ and $\mathbf{Z}_2$, the survival times are denoted as $T_1$ and $T_2$, and the censoring times are denoted as $D_1$ and $D_2$, respectively. For the $i$th individual ($i = 1, \ldots, n$) in a sample, let $X_i$, $\Delta_i$, and $\mathbf{Z}_i$ be the observed time, event indicator (1 for death and 0 for censored), and covariate vector, respectively. Let $\hat{\boldsymbol{\beta}}$ denote the maximum partial likelihood estimate of $\boldsymbol{\beta}$.

Harrell’s Concordance Statistic

Harrell (1986) proposes the following definition of the concordance probability:

$$C_H = \Pr\left(\boldsymbol{\beta}'\mathbf{Z}_1 > \boldsymbol{\beta}'\mathbf{Z}_2 \mid T_1 < T_2,\; T_1 < \min(D_1, D_2)\right)$$

Assuming no ties in the event times and the predictor scores, $C_H$ can be estimated by

$$\hat{C}_H = \frac{\sum_{i \neq j} \Delta_i\, I(X_i < X_j)\, I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i > \hat{\boldsymbol{\beta}}'\mathbf{Z}_j)}{\sum_{i \neq j} \Delta_i\, I(X_i < X_j)}$$
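A direct transcription of this estimator into Python might look like the following sketch. It loops over all ordered pairs, so it costs $O(n^2)$ time. The names are hypothetical, `scores` stands for the linear predictors $\hat{\boldsymbol{\beta}}'\mathbf{Z}_i$, and tied scores are counted as one half, a common convention that leaves the result unchanged when there are no ties.

```python
def harrell_c(times, events, scores):
    """Harrell's C for right-censored data.

    times[i]  : observed time X_i
    events[i] : event indicator Delta_i (1 = death, 0 = censored)
    scores[i] : predictor score, e.g. the linear predictor beta_hat' Z_i
    """
    numerator = 0.0
    denominator = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # only a subject with an observed event can be the earlier member
        for j in range(n):
            if i == j or not times[i] < times[j]:
                continue  # pair is usable only when X_i < X_j
            denominator += 1
            if scores[i] > scores[j]:
                numerator += 1.0  # earlier failure has the higher risk score
            elif scores[i] == scores[j]:
                numerator += 0.5  # tied scores: half credit (assumed convention)
    if denominator == 0:
        raise ValueError("no usable pairs")
    return numerator / denominator


# Hypothetical data: subject 1 is censored at time 4.
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
scores = [3.0, 2.0, 0.5, 1.0]
print(harrell_c(times, events, scores))  # 3 of 4 usable pairs concordant: 0.75
```

The censored subject still serves as the later member of a pair (it outlived the failure at time 2), but it never serves as the earlier member.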

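When there are ties in the predictor scores, the calculation of $\hat{C}_H$ can be adjusted to be

$$\hat{C}_H = \frac{\sum_{i \neq j} \Delta_i\, I(X_i < X_j)\left[I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i > \hat{\boldsymbol{\beta}}'\mathbf{Z}_j) + 0.5\, I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i = \hat{\boldsymbol{\beta}}'\mathbf{Z}_j)\right]}{\sum_{i \neq j} \Delta_i\, I(X_i < X_j)}$$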

Assuming that the censoring time is independent of the event time, Kang et al. (2015) derive an estimator of the standard error by using the delta method. Note that this derivation treats $\hat{\boldsymbol{\beta}}$ as fixed, so it does not account for the variability in estimating $\boldsymbol{\beta}$. To make this condition explicit, the linear predictor $\hat{\boldsymbol{\beta}}'\mathbf{Z}$ is replaced by a single variable $Y$. For a pair of subjects $i$ and $j$, define the following quantities:

$$\mathrm{sgn}(Y_i, Y_j) = I(Y_i \ge Y_j) - I(Y_i \le Y_j)$$

$$\mathrm{csgn}(X_i, \Delta_i, X_j, \Delta_j) = I(X_i \ge X_j)\,\Delta_j - I(X_i \le X_j)\,\Delta_i$$
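These two functions transcribe the definitions directly into Python (the function names are our own). For distinct times, $\mathrm{csgn}$ is nonzero only when the subject with the shorter observed time had an event, that is, only for pairs that Harrell's estimator can use. Both functions are antisymmetric in $i$ and $j$, which the checks at the end confirm on a small grid.

```python
from itertools import product


def sgn(y_i, y_j):
    """sgn(Y_i, Y_j) = I(Y_i >= Y_j) - I(Y_i <= Y_j): +1, -1, or 0 for a tie."""
    return int(y_i >= y_j) - int(y_i <= y_j)


def csgn(x_i, d_i, x_j, d_j):
    """csgn = I(X_i >= X_j) * Delta_j - I(X_i <= X_j) * Delta_i."""
    return int(x_i >= x_j) * d_j - int(x_i <= x_j) * d_i


print(csgn(2, 1, 5, 0))  # -1: subject i fails first, usable pair
print(csgn(2, 0, 5, 1))  #  0: subject i censored first, pair not usable

# Antisymmetry check over a small grid of times, indicators, and scores.
for x_i, d_i, x_j, d_j in product([1, 2], [0, 1], [1, 2], [0, 1]):
    assert csgn(x_j, d_j, x_i, d_i) == -csgn(x_i, d_i, x_j, d_j)
for y_i, y_j in product([0.0, 1.0], repeat=2):
    assert sgn(y_j, y_i) == -sgn(y_i, y_j)
```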

Let , . Further define the following quantities:

$$t_{XY} = \frac{1}{n(n-1)} \sum_{i \neq j} t_{ijXY}$$

$$t^*_{XX} = \frac{1}{n(n-1)} \sum_{i \neq j} t^*_{ijXX}$$

Harrell’s estimator can be rewritten as

$$\hat{C}_H = \frac{1}{2}\left(\frac{t_{XY}}{t^*_{XX}} + 1\right)$$

Applying the delta method to this expression, whose gradient with respect to $(t_{XY}, t^*_{XX})$ is $\frac{1}{2}\left(1/t^*_{XX},\; -t_{XY}/t^{*2}_{XX}\right)$, the variance of Harrell's C-statistic can be estimated by

$$\widehat{\mathrm{var}}(\hat{C}_H) = \frac{1}{4}
\begin{pmatrix} \dfrac{1}{t^*_{XX}} & -\dfrac{t_{XY}}{t^{*2}_{XX}} \end{pmatrix}
\begin{pmatrix} \widehat{\mathrm{var}}(t_{XY}) & \widehat{\mathrm{cov}}(t^*_{XX}, t_{XY}) \\ \widehat{\mathrm{cov}}(t^*_{XX}, t_{XY}) & \widehat{\mathrm{var}}(t^*_{XX}) \end{pmatrix}
\begin{pmatrix} \dfrac{1}{t^*_{XX}} & -\dfrac{t_{XY}}{t^{*2}_{XX}} \end{pmatrix}'$$

where

$$\widehat{\mathrm{var}}(t^*_{XX}) = \frac{4\sum_i \left(\sum_j t^*_{ijXX}\right)^2 - 2\sum_i\sum_j t^{*2}_{ijXX} - \frac{2(2n-3)}{n(n-1)}\left(\sum_i\sum_j t^*_{ijXX}\right)^2}{n(n-1)(n-2)(n-3)}$$

$$\widehat{\mathrm{var}}(t_{XY}) = \frac{4\sum_i \left(\sum_j t_{ijXY}\right)^2 - 2\sum_i\sum_j t^{2}_{ijXY} - \frac{2(2n-3)}{n(n-1)}\left(\sum_i\sum_j t_{ijXY}\right)^2}{n(n-1)(n-2)(n-3)}$$

$$\widehat{\mathrm{cov}}(t^*_{XX}, t_{XY}) = \frac{4\sum_i \left(\sum_j t^*_{ijXX}\sum_{j'} t_{ij'XY}\right) - 2\sum_i\sum_j t^*_{ijXX}\, t_{ijXY} - \frac{2(2n-3)}{n(n-1)}\left(\sum_i\sum_j t^*_{ijXX}\right)\left(\sum_i\sum_j t_{ijXY}\right)}{n(n-1)(n-2)(n-3)}$$

In these sums, terms with $j = i$ are excluded.
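The following pure-Python sketch implements these estimators and the delta-method step for generic pairwise kernels. It assumes that each kernel is stored as an $n \times n$ list of lists with a zero diagonal, that the sums run over ordered pairs $i \neq j$, and that the kernel is symmetric in $i$ and $j$. Under those assumptions the squared-kernel sum carries a coefficient of 2, which is what makes a constant kernel receive an estimated variance of exactly zero. The gradient in `harrell_c_delta_var` comes from differentiating $\tfrac{1}{2}(t_{XY}/t^*_{XX} + 1)$. All function names are hypothetical.

```python
def u_cov(a, b):
    """Estimated covariance of two pairwise-kernel averages.

    a, b: n x n lists of lists with zero diagonals; a[i][j] is the kernel for
    the ordered pair (i, j). Each average is sum_{i != j} a[i][j] / (n(n-1)).
    """
    n = len(a)
    if n < 4:
        raise ValueError("need at least 4 subjects")
    row_a = [sum(row) for row in a]  # sum_j a_ij (diagonal is zero)
    row_b = [sum(row) for row in b]
    cross_rows = sum(x * y for x, y in zip(row_a, row_b))
    cross_cells = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n) if i != j)
    total_a, total_b = sum(row_a), sum(row_b)
    numerator = (4 * cross_rows
                 - 2 * cross_cells
                 - 2 * (2 * n - 3) * total_a * total_b / (n * (n - 1)))
    return numerator / (n * (n - 1) * (n - 2) * (n - 3))


def u_var(a):
    """Estimated variance of a pairwise-kernel average."""
    return u_cov(a, a)


def harrell_c_delta_var(t_xy, t_xx, var_xy, var_xx, cov_xx_xy):
    """Delta-method variance of C = (t_xy / t_xx + 1) / 2."""
    g_xy = 1.0 / (2.0 * t_xx)          # dC / dt_xy
    g_xx = -t_xy / (2.0 * t_xx ** 2)   # dC / dt_xx
    return g_xy ** 2 * var_xy + 2.0 * g_xy * g_xx * cov_xx_xy + g_xx ** 2 * var_xx


# A constant kernel has no sampling variability, so its estimate is 0.
n = 5
ones = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
print(u_var(ones))  # 0.0
```

The closed form equals the product of the two pairwise averages minus the average of $a_{ij} b_{kl}$ over all index quadruples with $i, j, k, l$ distinct, which gives an easy brute-force cross-check on small samples.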

Uno’s Concordance Statistic

Uno et al. (2011) consider the following definition of the concordance probability:

$$C_U = \Pr\left(\boldsymbol{\beta}'\mathbf{Z}_1 > \boldsymbol{\beta}'\mathbf{Z}_2 \mid T_1 < T_2\right)$$

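If $\tau$ is a specified time point within the support of the censoring variable, Uno et al. (2011) also define a truncated version of the concordance probability as

$$C_\tau = \Pr\left(\boldsymbol{\beta}'\mathbf{Z}_1 > \boldsymbol{\beta}'\mathbf{Z}_2 \mid T_1 < T_2,\; T_1 < \tau\right)$$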

You can specify the value of $\tau$ in the TAU= option in the PROC PHREG statement. If the TAU= option is not specified, there is no truncation and $\tau$ is taken to be the largest event time.

For the $i$th individual ($i = 1, \ldots, n$), let $X_i$, $\Delta_i$, and $\mathbf{Z}_i$ be the observed time, event indicator (1 for death and 0 for censored), and covariate vector, respectively. Let $\hat{G}(\cdot)$ be the Kaplan-Meier estimate of the censoring distribution (assuming no covariates). The truncated concordance probability is consistently estimated by

$$\hat{C}_U = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \Delta_i\, \hat{G}(X_i^-)^{-2}\, I(X_i < X_j,\, X_i < \tau)\left[I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i > \hat{\boldsymbol{\beta}}'\mathbf{Z}_j) + 0.5\, I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i = \hat{\boldsymbol{\beta}}'\mathbf{Z}_j)\right]}{\sum_{i=1}^{n} \sum_{j=1}^{n} \Delta_i\, \hat{G}(X_i^-)^{-2}\, I(X_i < X_j,\, X_i < \tau)}$$
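The estimator can be sketched in Python as follows. This is an illustrative $O(n^2)$ transcription, not the procedure's own code. The function names are hypothetical, `scores` stands for $\hat{\boldsymbol{\beta}}'\mathbf{Z}_i$, $\hat{G}(X_i^-)$ is computed as a Kaplan-Meier product over distinct times strictly before $X_i$ with censoring treated as the event, and `tau=None` is taken here to mean no truncation.

```python
def censoring_km_left(times, events, t):
    """Kaplan-Meier estimate of the censoring survivor function just before t.

    Censoring (event indicator 0) plays the role of the event; the product runs
    over distinct observed times s < t, with risk set {k : X_k >= s}.
    """
    g = 1.0
    for s in sorted(set(times)):
        if s >= t:
            break
        at_risk = sum(1 for x in times if x >= s)
        censored = sum(1 for x, d in zip(times, events) if x == s and d == 0)
        g *= 1.0 - censored / at_risk
    return g


def uno_c(times, events, scores, tau=None):
    """IPCW concordance estimate with weights G_hat(X_i-)^(-2); tied scores get 0.5."""
    numerator = denominator = 0.0
    for i, (x_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1 or (tau is not None and not x_i < tau):
            continue  # only events before tau can be the earlier member of a pair
        weight = censoring_km_left(times, events, x_i) ** -2
        for j, x_j in enumerate(times):
            if not x_i < x_j:
                continue
            denominator += weight
            if scores[i] > scores[j]:
                numerator += weight
            elif scores[i] == scores[j]:
                numerator += 0.5 * weight
    if denominator == 0.0:
        raise ValueError("no usable pairs")
    return numerator / denominator


times = [1, 2, 3, 4]
events = [1, 0, 1, 1]   # subject 1 is censored at time 2
scores = [3.0, 0.5, 1.0, 2.0]
print(uno_c(times, events, scores))  # 3 / 5.25, about 0.571
```

For the same hypothetical data, Harrell's estimator counts all four usable pairs equally and gives 0.75. Here the pair anchored at time 3, which comes after the censoring at time 2, gets weight $(2/3)^{-2} = 2.25$, and because that pair is discordant the estimate drops to about 0.571.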

Let $W$ denote the centered estimation error of $\hat{C}_U$. It can be shown that $W$ is asymptotically normally distributed with mean zero. The variance of $W$ can be approximated by using the perturbation-resampling method. Specifically, let $\{\psi_i,\ i = 1, \ldots, n\}$ be a set of independent samples from an exponential distribution with mean 1 and variance 1. For large $n$, $W$ can be approximated by

$$\tilde{W} = \sum_{i<j} 0.5\left[\hat{V}_{ij}(\hat{\boldsymbol{\beta}}) + \hat{V}_{ji}(\hat{\boldsymbol{\beta}})\right]\psi_i\psi_j + \left[\hat{K}(G^*) - \hat{K}(\hat{G})\right] + \left[\hat{C}_U(\boldsymbol{\beta}^*) - \hat{C}_U(\hat{\boldsymbol{\beta}})\right]$$

where

$$\hat{K}(G) = \frac{\sum_{i<j} G(X_i^-)^{-2}\, I(X_i < X_j,\, X_i < \tau)\,\Delta_i\left[I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i > \hat{\boldsymbol{\beta}}'\mathbf{Z}_j) + 0.5\, I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i = \hat{\boldsymbol{\beta}}'\mathbf{Z}_j) - \hat{C}_\tau(\hat{\boldsymbol{\beta}})\right]}{\sum_{i<j} \hat{G}(X_i^-)^{-2}\, I(X_i < X_j,\, X_i < \tau)\,\Delta_i},$$

$$\hat{V}_{ij}(\hat{\boldsymbol{\beta}}) = \frac{\hat{G}(X_i^-)^{-2}\, I(X_i < X_j,\, X_i < \tau)\,\Delta_i\left[I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i > \hat{\boldsymbol{\beta}}'\mathbf{Z}_j) + 0.5\, I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i = \hat{\boldsymbol{\beta}}'\mathbf{Z}_j) - \hat{C}_\tau(\hat{\boldsymbol{\beta}})\right]}{\sum_{k<l} \hat{G}(X_k^-)^{-2}\, I(X_k < X_l,\, X_k < \tau)\,\Delta_k}$$

and $G^*$ and $\boldsymbol{\beta}^*$ are the perturbed versions of $\hat{G}$ and $\hat{\boldsymbol{\beta}}$. $G^*$ is calculated as

$$G^*(t) = \hat{G}(t) - \hat{G}(t)\,\frac{2}{n(n-1)}\sum_{i<j}\int_0^t \frac{1}{n^{-1}\sum_{k} I(X_k \ge u)}\left[d\hat{M}_i(u) + d\hat{M}_j(u)\right]\psi_i\psi_j/2$$

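where $\hat{M}_i(u) = I(X_i \le u,\, \Delta_i = 0) - \int_0^u I(X_i \ge s)\, d\hat{\Lambda}_C(s)$ and $\hat{\Lambda}_C(\cdot)$ is a consistent estimator of the cumulative hazard function for the censoring time variable. $\boldsymbol{\beta}^*$ is calculated as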

$$\boldsymbol{\beta}^* = \hat{\boldsymbol{\beta}} + \frac{2}{n(n-1)}\sum_{i<j}\left\{\hat{H}(\hat{\boldsymbol{\beta}})\left[U_i(\hat{\boldsymbol{\beta}}) + U_j(\hat{\boldsymbol{\beta}})\right]/2\right\}\psi_i\psi_j$$

where $\hat{H}(\hat{\boldsymbol{\beta}})$ is the estimated variance-covariance matrix of $\hat{\boldsymbol{\beta}}$ divided by $n$ and $U_i(\hat{\boldsymbol{\beta}})$ is the contribution to the partial likelihood function from the $i$th individual. The third term of the formula for $\tilde{W}$ drops out if you use the PRED= option in an ROC statement to specify a variable that contains the prediction scores.
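Two ingredients of $\tilde{W}$ are simple enough to sketch directly: the pairwise term built from $\hat{V}_{ij}$, and the perturbed coefficient $\boldsymbol{\beta}^*$. The Python sketch below assumes that the subjects are sorted by observed time, so that the sums over $i<j$ cover every pair with $X_i < X_j$. It takes a precomputed weight matrix `w[i][j]` $= \hat{G}(X_i^-)^{-2} I(X_i<X_j, X_i<\tau)\Delta_i$ and a concordance matrix `c[i][j]` holding 1, 0.5, or 0, and it omits the $\hat{K}(G^*) - \hat{K}(\hat{G})$ term. All names are hypothetical. The $\psi_i$ are drawn with `random.expovariate(1.0)`, an exponential distribution with mean 1 and variance 1.

```python
import random


def draw_psi(n, rng):
    """Independent Exp(1) perturbation weights (mean 1, variance 1)."""
    return [rng.expovariate(1.0) for _ in range(n)]


def first_term(w, c, psi):
    """sum_{i<j} 0.5 * [V_ij + V_ji] * psi_i * psi_j, with V_ij = w_ij (c_ij - C_hat) / denom."""
    n = len(w)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    denom = sum(w[i][j] for i, j in pairs)
    c_hat = sum(w[i][j] * c[i][j] for i, j in pairs) / denom  # Uno's estimate when sorted

    def v(i, j):
        return w[i][j] * (c[i][j] - c_hat) / denom

    return sum(0.5 * (v(i, j) + v(j, i)) * psi[i] * psi[j] for i, j in pairs)


def perturbed_beta(beta_hat, h, u, psi):
    """beta* = beta_hat + 2/(n(n-1)) sum_{i<j} H [U_i + U_j] / 2 * psi_i * psi_j."""
    n, p = len(u), len(beta_hat)
    shift = [0.0] * p
    for i in range(n):
        for j in range(i + 1, n):
            avg_u = [(u[i][k] + u[j][k]) / 2.0 for k in range(p)]
            for r in range(p):
                shift[r] += sum(h[r][k] * avg_u[k] for k in range(p)) * psi[i] * psi[j]
    scale = 2.0 / (n * (n - 1))
    return [b + scale * s for b, s in zip(beta_hat, shift)]


rng = random.Random(2022)
# Hypothetical sorted data for 3 subjects; only entries with i < j carry weight.
w = [[0.0, 1.0, 1.0], [0.0, 0.0, 2.25], [0.0, 0.0, 0.0]]
c = [[0.0, 1.0, 1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
psi = draw_psi(3, rng)
print(first_term(w, c, psi))
print(perturbed_beta([0.3], [[0.5]], [[1.0], [-2.0], [1.0]], psi))
```

With every $\psi_i = 1$ the pairwise term is exactly centered and vanishes, and if the $U_i$ sum to zero the perturbed coefficient equals $\hat{\boldsymbol{\beta}}$; the random draws are what turn both pieces into a resampling distribution.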

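Suppose $\hat{\sigma}^2$ is the sample variance based on $M$ realizations of $\tilde{W}$. The $100(1-\alpha)\%$ confidence limits for the concordance probability are $\hat{C}_U \pm z_{\alpha/2}\,\hat{\sigma}$, where $z_{\alpha/2}$ is the upper $100(\alpha/2)$th percentile of the standard normal distribution.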
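Given $M$ perturbation realizations of $\tilde{W}$, the confidence limits take only a few lines. This Python sketch assumes, as the formula for $\tilde{W}$ suggests, that $\tilde{W}$ is on the same scale as $\hat{C}_U$ (no $\sqrt{n}$ factor), so that the limits are $\hat{C}_U \pm z_{\alpha/2}\hat{\sigma}$. It uses the sample variance with divisor $M-1$. The function name and the draws are hypothetical.

```python
from statistics import NormalDist, stdev


def perturbation_ci(c_hat, w_draws, alpha=0.05):
    """Normal-approximation limits c_hat -/+ z_{alpha/2} * sigma_hat.

    sigma_hat is the sample standard deviation (divisor M - 1) of the M
    perturbation realizations in w_draws.
    """
    if len(w_draws) < 2:
        raise ValueError("need at least two realizations")
    sigma_hat = stdev(w_draws)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 quantile
    return c_hat - z * sigma_hat, c_hat + z * sigma_hat


lower, upper = perturbation_ci(0.70, [-0.02, 0.0, 0.02])
print(round(lower, 4), round(upper, 4))  # 0.6608 0.7392
```

In practice $M$ would be much larger than three (several hundred or more draws), because $\hat{\sigma}$ is itself estimated from the perturbation sample.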

Last updated: March 08, 2022