The PHREG Procedure

Concordance Statistics

The predictive accuracy of a statistical model can be measured by the agreement between observed and predicted outcomes. In the context of logistic regression with binary outcomes, the concordance statistic (also known as the C-statistic) is the most commonly used measure of accuracy. The concept underlying concordance is that a subject who experiences a particular outcome should have a higher predicted probability of that outcome than a subject who does not experience the outcome.

The C-statistic can be calculated as the proportion of pairs of subjects whose observed and predicted outcomes agree (are concordant) among all possible pairs in which one subject experiences the outcome of interest and the other one does not. The higher the C-statistic, the better the model can discriminate between subjects who do experience the outcome of interest and subjects who do not.
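As a minimal sketch of the pairwise definition for a binary outcome, the following Python function counts concordant pairs among all (event, non-event) pairs. The function name, the toy data, and the convention of counting tied predictions as one-half are assumptions made for illustration; they are not part of PROC PHREG.

```python
from itertools import product


def c_statistic(outcomes, predicted):
    """Proportion of concordant pairs among all (event, non-event) pairs.

    Tied predictions count as one-half (a common convention, assumed here).
    """
    events = [p for y, p in zip(outcomes, predicted) if y == 1]
    non_events = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for p_event, p_non in product(events, non_events):
        if p_event > p_non:
            concordant += 1.0
        elif p_event == p_non:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical data: 2 subjects with the outcome, 3 without.
outcomes = [1, 1, 0, 0, 0]
predicted = [0.9, 0.4, 0.5, 0.3, 0.1]
print(c_statistic(outcomes, predicted))  # 5 of the 6 pairs are concordant
```

A value of 0.5 corresponds to predictions that discriminate no better than chance, and a value of 1 to perfect discrimination.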

C-statistics can be formulated for any modeling approach that generates predicted values. In the context of survival analysis, various C-statistics have been formulated to deal with right-censored data. PROC PHREG provides concordance statistics that were introduced by Harrell (1986) and Uno et al. (2011). The following subsections discuss these statistics. In these subsections, $\boldsymbol{\beta}$ denotes the true regression parameters, and for a pair of subjects whose covariate vectors are $\mathbf{Z}_1$ and $\mathbf{Z}_2$, the survival times are denoted as $T_1$ and $T_2$ and the censoring times are denoted as $D_1$ and $D_2$, respectively. For the $i$th individual ($1 \le i \le n$) in a sample, let $X_i$, $\Delta_i$, and $\mathbf{Z}_i$ be the observed time, event indicator (1 for death and 0 for censored), and covariate vector, respectively. Let $\hat{\boldsymbol{\beta}}$ denote the maximum partial likelihood estimates of $\boldsymbol{\beta}$.

Harrell’s Concordance Statistic

Harrell (1986) proposes the following definition of the concordance probability:

\[
C_H = \Pr\left( \boldsymbol{\beta}'\mathbf{Z}_1 > \boldsymbol{\beta}'\mathbf{Z}_2 \mid T_1 < T_2,\ T_1 < \min(D_1, D_2) \right)
\]

Assuming no ties in the event times and the predictor scores, $C_H$ can be estimated by

\[
\hat{C}_H = \frac{\sum_{i \ne j} \Delta_i \, I(X_i < X_j) \, I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i > \hat{\boldsymbol{\beta}}'\mathbf{Z}_j)}{\sum_{i \ne j} \Delta_i \, I(X_i < X_j)}
\]

When there are ties in the predictor scores, the preceding calculation can be adjusted to be

\[
\hat{C}_H = \frac{\sum_{i \ne j} \Delta_i \, I(X_i < X_j) \left[ I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i > \hat{\boldsymbol{\beta}}'\mathbf{Z}_j) + 0.5 \, I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i = \hat{\boldsymbol{\beta}}'\mathbf{Z}_j) \right]}{\sum_{i \ne j} \Delta_i \, I(X_i < X_j)}
\]
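The tie-adjusted estimator above can be sketched directly as a double loop over ordered pairs. This is an illustrative O(n^2) implementation, not the algorithm that PROC PHREG uses internally; the function name and the toy data are hypothetical, and `score` stands for the estimated linear predictor.

```python
def harrell_c(time, event, score):
    """Tie-adjusted Harrell C-statistic.

    A pair (i, j) is usable when subject i has an event (event[i] == 1)
    and a strictly shorter observed time than subject j.
    """
    n = len(time)
    numerator = 0.0
    usable = 0
    for i in range(n):
        if event[i] != 1:
            continue  # censored subjects cannot be the earlier member of a pair
        for j in range(n):
            if i == j or not time[i] < time[j]:
                continue
            usable += 1
            if score[i] > score[j]:
                numerator += 1.0   # higher risk score, shorter time: concordant
            elif score[i] == score[j]:
                numerator += 0.5   # tied scores count one-half
    if usable == 0:
        raise ValueError("no usable pairs")
    return numerator / usable


# Hypothetical data: four subjects, the second one censored.
time = [1, 2, 3, 4]
event = [1, 0, 1, 1]
score = [3.0, 1.0, 0.2, 0.5]
print(harrell_c(time, event, score))  # 0.75: 3 of the 4 usable pairs are concordant
```

Pairs whose earlier member is censored do not enter either sum, which is why the estimate depends on the censoring distribution.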

Assuming that the censoring time is independent of the event time, Kang et al. (2015) derive a standard error estimator by using the delta method. Note that this derivation treats $\hat{\boldsymbol{\beta}}$ as fixed, so it does not account for the variability in estimating $\boldsymbol{\beta}$. To show this condition more explicitly, the linear predictor $\boldsymbol{\beta}'\mathbf{Z}$ is replaced by a single variable $Y$. For a pair of subjects $i$ and $j$, define the following quantities:

\[
\mathrm{sgn}(Y_i, Y_j) = I(Y_i \ge Y_j) - I(Y_i \le Y_j)
\]
\[
\mathrm{csgn}(X_i, \Delta_i, X_j, \Delta_j) = I(X_i \ge X_j)\,\Delta_j - I(X_i \le X_j)\,\Delta_i
\]

Let $t_{ijXY} = \mathrm{csgn}(X_i, \Delta_i, X_j, \Delta_j)\,\mathrm{sgn}(Y_i, Y_j)$ and $t_{ijXX}^{*} = \mathrm{csgn}(X_i, \Delta_i, X_j, \Delta_j)^2$. Further define the following quantities:

\[
t_{XY} = \frac{1}{n(n-1)} \sum_{i \ne j} t_{ijXY}
\]
\[
t_{XX}^{*} = \frac{1}{n(n-1)} \sum_{i \ne j} t_{ijXX}^{*}
\]

Harrell’s estimator can be rewritten as

\[
\hat{C}_H = \frac{1}{2} \left( \frac{t_{XY}}{t_{XX}^{*}} + 1 \right)
\]
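The rewritten form can be checked numerically with a short sketch. With the two sign functions exactly as defined above, a pair contributes positively when the observed times and Y are ordered the same way, so this example orients Y so that larger values indicate longer survival (Y is taken as the negated risk score). That orientation, the function names, and the toy data are assumptions of the example, and it also assumes no tied observed times.

```python
def sgn(y_i, y_j):
    # +1 if y_i > y_j, -1 if y_i < y_j, 0 if tied
    return int(y_i >= y_j) - int(y_i <= y_j)


def csgn(x_i, d_i, x_j, d_j):
    # Censoring-aware sign: nonzero only when the earlier time is an event
    # (with untied times, as assumed in this example)
    return int(x_i >= x_j) * d_j - int(x_i <= x_j) * d_i


def harrell_c_from_t(time, event, y):
    """Evaluate 0.5 * (t_XY / t_XX* + 1) over all ordered pairs i != j."""
    n = len(time)
    sum_xy = 0
    sum_xx = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            c = csgn(time[i], event[i], time[j], event[j])
            sum_xy += c * sgn(y[i], y[j])
            sum_xx += c * c
    t_xy = sum_xy / (n * (n - 1))
    t_xx = sum_xx / (n * (n - 1))
    return 0.5 * (t_xy / t_xx + 1.0)


# Hypothetical data: four subjects, the second one censored.
time = [1, 2, 3, 4]
event = [1, 0, 1, 1]
risk_score = [3.0, 1.0, 0.2, 0.5]
y = [-s for s in risk_score]  # assumed orientation: larger Y, longer survival
print(harrell_c_from_t(time, event, y))
```

For these data, three of the four usable pairs are concordant, and the expression evaluates to 0.75 (up to floating-point rounding), the same value that direct pair counting gives. Passing the risk score itself as Y gives 0.25, the complement.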

Applying the delta method, the variance of Harrell’s C-statistic can be estimated by

\[
\widehat{\mathrm{var}}(\hat{C}_H) =
\begin{pmatrix} \dfrac{1}{t_{XX}^{*}} & -\dfrac{t_{XY}}{t_{XX}^{*2}} \end{pmatrix}
\begin{pmatrix}
\widehat{\mathrm{var}}(t_{XY}) & \widehat{\mathrm{cov}}(t_{XX}^{*}, t_{XY}) \\
\widehat{\mathrm{cov}}(t_{XX}^{*}, t_{XY}) & \widehat{\mathrm{var}}(t_{XX}^{*})
\end{pmatrix}
\begin{pmatrix} \dfrac{1}{t_{XX}^{*}} & -\dfrac{t_{XY}}{t_{XX}^{*2}} \end{pmatrix}'
\]

where

\[
\widehat{\mathrm{var}}(t_{XX}^{*}) = \frac{4 \sum_i \left( \sum_j t_{ijXX}^{*} \right)^2 - \sum_i \sum_j t_{ijXX}^{*2} - \dfrac{2(2n-3)}{n(n-1)} \left( \sum_i \sum_j t_{ijXX}^{*2} \right)^2}{n(n-1)(n-2)(n-3)}
\]
\[
\widehat{\mathrm{var}}(t_{XY}) = \frac{4 \sum_i \left( \sum_j t_{ijXY} \right)^2 - \sum_i \sum_j t_{ijXY} - \dfrac{2(2n-3)}{n(n-1)} \left( \sum_i \sum_j t_{ijXY} \right)^2}{n(n-1)(n-2)(n-3)}
\]
\[
\widehat{\mathrm{cov}}(t_{XX}^{*}, t_{XY}) = \frac{4 \sum_i \left( \sum_j t_{ijXX}^{*} \sum_{j'} t_{ij'XY} \right) - \sum_i \sum_j t_{ijXX}^{*} \, t_{ijXY} - \dfrac{2(2n-3)}{n(n-1)} \left( \sum_i \sum_j t_{ijXX}^{*2} \right) \left( \sum_i \sum_j t_{ijXY} \right)}{n(n-1)(n-2)(n-3)}
\]

Uno’s Concordance Statistic

Uno et al. (2011) propose the following method for estimating the concordance probability:

\[
C_U = \Pr\left( \boldsymbol{\beta}'\mathbf{Z}_1 > \boldsymbol{\beta}'\mathbf{Z}_2 \mid T_1 < T_2 \right)
\]

If $\tau$ is a specified time point within the support of the censoring variable, Uno et al. (2011) also define a truncated version of the concordance probability as

\[
C_U = \Pr\left( \boldsymbol{\beta}'\mathbf{Z}_1 > \boldsymbol{\beta}'\mathbf{Z}_2 \mid T_1 < T_2,\ T_1 < \tau \right)
\]

You can specify the value of $\tau$ by using the TAU= option in the PROC PHREG statement. If the TAU= option is not specified, there is no truncation and $\tau$ is taken as the largest event time.

For the $i$th individual ($1 \le i \le n$), let $X_i$, $\Delta_i$, and $\mathbf{Z}_i$ be the observed time, event indicator (1 for death and 0 for censored), and covariate vector, respectively. Let $\hat{G}(t)$ be the Kaplan-Meier estimate of the censoring distribution (assuming no covariates). The concordance probability $C_U$ is consistently estimated by

\[
\hat{C}_U = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \Delta_i \, \hat{G}(X_i^{-})^{-2} \, I(X_i < X_j,\ X_i < \tau) \left[ I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i > \hat{\boldsymbol{\beta}}'\mathbf{Z}_j) + 0.5 \, I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i = \hat{\boldsymbol{\beta}}'\mathbf{Z}_j) \right]}{\sum_{i=1}^{n} \sum_{j=1}^{n} \Delta_i \, \hat{G}(X_i^{-})^{-2} \, I(X_i < X_j,\ X_i < \tau)}
\]
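A minimal sketch of this inverse-probability-weighted estimator follows. It is an illustrative O(n^2) implementation rather than PROC PHREG's algorithm. The function names, the toy data, the default of no truncation (`tau` set to infinity), and the assumption that censoring times are not tied with event times are all choices made for the example.

```python
import math


def censoring_km_left_limit(time, event, t):
    """Kaplan-Meier estimate of the censoring survival function just before t.

    Censored observations (event == 0) play the role of 'events' here.
    Assumes no ties between censoring times and event times.
    """
    g = 1.0
    censor_times = sorted({x for x, d in zip(time, event) if d == 0 and x < t})
    for u in censor_times:
        at_risk = sum(1 for x in time if x >= u)
        censored = sum(1 for x, d in zip(time, event) if x == u and d == 0)
        g *= 1.0 - censored / at_risk
    return g


def uno_c(time, event, score, tau=math.inf):
    """Uno-type C-statistic with weights G(X_i-)^(-2) and tied scores at 0.5."""
    n = len(time)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        if event[i] != 1 or not time[i] < tau:
            continue
        weight = censoring_km_left_limit(time, event, time[i]) ** -2
        for j in range(n):
            if not time[i] < time[j]:
                continue  # also excludes j == i
            denominator += weight
            if score[i] > score[j]:
                numerator += weight
            elif score[i] == score[j]:
                numerator += 0.5 * weight
    if denominator == 0.0:
        raise ValueError("no usable pairs before tau")
    return numerator / denominator


# Hypothetical data: four subjects, the second one censored at time 2.
time = [1, 2, 3, 4]
event = [1, 0, 1, 1]
score = [3.0, 1.0, 0.2, 0.5]
print(uno_c(time, event, score))
```

For these data the censoring survival estimate drops to 2/3 after time 2, so the one discordant pair (third and fourth subjects) receives weight 2.25 and the estimate is 3/5.25 = 4/7, compared with 0.75 from unweighted pair counting. When no observation is censored, every weight equals 1 and the estimator reduces to the tie-adjusted pairwise proportion.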

Define $W = \sqrt{n}\,(\hat{C}_U - C_U)$. It can be shown that $W$ is asymptotically distributed as a normal random variable with mean zero. The variance of $W$ can be approximated by using the perturbation-resampling method. Specifically, let $\{\psi_i,\ i = 1, \ldots, n\}$ be a set of independent samples from an exponential distribution with mean 1 and variance 1. For a large $n$, $W$ can be approximated by

\[
\tilde{W} = \sum_{i<j} 0.5 \left[ \hat{V}_{ij}(\hat{\boldsymbol{\beta}}) + \hat{V}_{ji}(\hat{\boldsymbol{\beta}}) \right] \psi_i \psi_j + \left[ \hat{K}(G^{*}) - \hat{K}(\hat{G}) \right] + \left[ \hat{C}_U(\boldsymbol{\beta}^{*}) - \hat{C}_U(\hat{\boldsymbol{\beta}}) \right]
\]

where

\[
\hat{K}(G) = \frac{\sum_{i<j} G(X_i^{-})^{-2} \, I(X_i < X_j,\ X_i < \tau) \, \Delta_i \left[ I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i > \hat{\boldsymbol{\beta}}'\mathbf{Z}_j) + 0.5 \, I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i = \hat{\boldsymbol{\beta}}'\mathbf{Z}_j) - \hat{C}_\tau(\hat{\boldsymbol{\beta}}) \right]}{\sum_{i<j} \hat{G}(X_i^{-})^{-2} \, I(X_i < X_j,\ X_i < \tau) \, \Delta_i},
\]
\[
\hat{V}_{ij}(\hat{\boldsymbol{\beta}}) = \frac{G(X_i^{-})^{-2} \, I(X_i < X_j,\ X_i < \tau) \, \Delta_i \left[ I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i > \hat{\boldsymbol{\beta}}'\mathbf{Z}_j) + 0.5 \, I(\hat{\boldsymbol{\beta}}'\mathbf{Z}_i = \hat{\boldsymbol{\beta}}'\mathbf{Z}_j) - \hat{C}_\tau(\hat{\boldsymbol{\beta}}) \right]}{\sum_{i<j} \hat{G}(X_i^{-})^{-2} \, I(X_i < X_j,\ X_i < \tau) \, \Delta_i}
\]

and $G^{*}(\cdot)$ and $\boldsymbol{\beta}^{*}$ are the perturbed versions of $\hat{G}$ and $\hat{\boldsymbol{\beta}}$. The function $G^{*}(\cdot)$ is calculated as

\[
G^{*}(t) = \hat{G}(t) - \hat{G}(t) \, \frac{2}{n(n-1)} \sum_{i<j} \int_0^t \frac{1}{n^{-1} \sum_{k} I(X_k \ge u)} \left[ d\hat{M}_i(u) + d\hat{M}_j(u) \right] \psi_i \psi_j / 2
\]

where $\hat{M}_i(t) = I(X_i \le t,\ \Delta_i = 0) - \int_0^t I(X_i \ge u) \, d\hat{\Lambda}_C(u)$ and $\hat{\Lambda}_C(\cdot)$ is a consistent estimator of the cumulative hazard function for the censoring time variable. The vector $\boldsymbol{\beta}^{*}$ is calculated as

\[
\boldsymbol{\beta}^{*} = \hat{\boldsymbol{\beta}} + \frac{2}{n(n-1)} \sum_{i<j} \left\{ \hat{H}(\hat{\boldsymbol{\beta}}) \left[ U_i(\hat{\boldsymbol{\beta}}) + U_j(\hat{\boldsymbol{\beta}}) \right] / 2 \right\} \psi_i \psi_j
\]

where $\hat{H}$ is the estimated variance-covariance matrix of $\hat{\boldsymbol{\beta}}$ divided by $n$ and $U_i$ is the contribution to the partial likelihood function from the $i$th individual. The third term of the formula for $\tilde{W}$ is dropped if you use the PRED= option in an ROC statement to specify a variable that contains the prediction scores.

Suppose $\hat{\sigma}^2$ is the sample variance based on $M$ realizations of $\tilde{W}$. The $100(1-\alpha)$% confidence limits for $C_U$ are $\hat{C}_U \pm z_{\alpha/2}\,\hat{\sigma}$, where $z_{\alpha/2}$ is the upper $100\alpha/2$ percentile of the standard normal distribution.
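The last step can be sketched as follows, taking the confidence-limit formula above literally. The function name and the three realizations are hypothetical; in practice the realizations would come from repeatedly drawing the exponential perturbation variables and evaluating the approximation, which this sketch does not attempt.

```python
from statistics import NormalDist, stdev


def uno_confidence_limits(c_hat, w_realizations, alpha=0.05):
    """Normal-approximation limits: c_hat -/+ z_{alpha/2} * sigma_hat.

    sigma_hat is the sample standard deviation (n - 1 divisor) of the
    supplied realizations, used exactly as the formula in the text states.
    """
    sigma_hat = stdev(w_realizations)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 quantile
    return c_hat - z * sigma_hat, c_hat + z * sigma_hat


# Made-up realizations, for illustration only.
lower, upper = uno_confidence_limits(0.70, [-0.02, 0.0, 0.02])
print(lower, upper)
```

With these made-up values the sample standard deviation is 0.02, so the 95% limits are roughly 0.661 and 0.739. A real analysis would use many more than three realizations.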

Last updated: March 08, 2022