The predictive accuracy of a statistical model can be measured by the agreement between observed and predicted outcomes. In the context of logistic regression with binary outcomes, the concordance statistic (also known as the C-statistic) is the most commonly used measure of accuracy. The concept underlying concordance is that a subject who experiences a particular outcome should have a higher predicted probability of that outcome than a subject who does not experience the outcome.
The C-statistic can be calculated as the proportion of pairs of subjects whose observed and predicted outcomes agree (are concordant) among all possible pairs in which one subject experiences the outcome of interest and the other one does not. The higher the C-statistic, the better the model can discriminate between subjects who do experience the outcome of interest and subjects who do not.
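The pairwise definition above can be sketched in a few lines of Python for the binary-outcome case. This is an illustrative sketch only: the function name `c_statistic` and the toy data are hypothetical, and counting a tied pair of predicted probabilities as one half is a common convention assumed here rather than something stated in the text.

```python
from itertools import product


def c_statistic(y, p):
    """Proportion of concordant (event, non-event) pairs.

    y : observed binary outcomes (1 = event, 0 = no event)
    p : predicted probabilities of the event
    Tied predictions count as 1/2 (an assumed convention).
    """
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe, pn in product(events, nonevents):
        if pe > pn:
            score += 1.0      # concordant pair
        elif pe == pn:
            score += 0.5      # tied predictions
    return score / (len(events) * len(nonevents))


# Toy data: 3 events and 2 non-events give 6 usable pairs,
# of which 5 are concordant.
y = [1, 1, 0, 0, 1]
p = [0.9, 0.4, 0.3, 0.5, 0.8]
print(c_statistic(y, p))
```

A value of 0.5 corresponds to predictions that discriminate no better than chance, and a value of 1 to perfect separation of the two groups.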
C-statistics can be formulated for any modeling approach that generates predicted values. In the context of survival analysis, various C-statistics have been formulated to deal with right-censored data. PROC PHREG provides concordance statistics that were introduced by Harrell (1986) and Uno et al. (2011). The following subsections discuss these statistics. In these subsections, $\beta$ denotes the true regression parameters, and for a pair of subjects whose covariate vectors are $Z_1$ and $Z_2$, the survival times are denoted as $T_1$ and $T_2$ and the censoring times are denoted as $D_1$ and $D_2$, respectively. For the $i$th individual ($i = 1, \ldots, n$) in a sample, let $X_i$, $\Delta_i$, and $Z_i$ be the observed time, event indicator (1 for death and 0 for censored), and covariate vector, respectively. Let $\hat\beta$ denote the maximum partial likelihood estimates of $\beta$.
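Harrell (1986) proposes the following definition of the concordance probability:

$$C_H = \Pr\left(\beta' Z_1 > \beta' Z_2 \mid T_1 < T_2,\; T_1 < \min(D_1, D_2)\right)$$

Assuming no ties in the event times and the predictor scores, $C_H$ can be estimated by

$$\hat{C}_H = \frac{\sum_{i \neq j} \Delta_i \, I(X_i < X_j) \, I(\hat\beta' Z_i > \hat\beta' Z_j)}{\sum_{i \neq j} \Delta_i \, I(X_i < X_j)}$$

When there are ties in the predictor scores, the preceding calculation can be adjusted to be

$$\hat{C}_H = \frac{\sum_{i \neq j} \Delta_i \, I(X_i < X_j) \left[ I(\hat\beta' Z_i > \hat\beta' Z_j) + 0.5 \, I(\hat\beta' Z_i = \hat\beta' Z_j) \right]}{\sum_{i \neq j} \Delta_i \, I(X_i < X_j)}$$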
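As a rough illustration of the tie-adjusted calculation, the following Python sketch loops over all ordered pairs in which the subject with the shorter observed time had an event, and gives half credit to tied predictor scores. It is a quadratic-time sketch under the no-tied-event-times assumption, not the algorithm that PROC PHREG uses; the function name `harrell_c` and the toy data are hypothetical.

```python
def harrell_c(time, event, score):
    """Harrell's C with half credit for tied predictor scores.

    time  : observed times (event or censoring time, whichever came first)
    event : event indicators (1 = event, 0 = censored)
    score : predictor scores, e.g. a fitted linear predictor
            (a higher score means a higher predicted risk)
    """
    n = len(time)
    num = 0.0
    den = 0
    for i in range(n):
        if event[i] != 1:
            continue  # a pair is usable only if its shorter time is an event
        for j in range(n):
            if time[i] < time[j]:
                den += 1
                if score[i] > score[j]:
                    num += 1.0   # concordant: earlier failure, higher score
                elif score[i] == score[j]:
                    num += 0.5   # tied scores get half credit
    if den == 0:
        raise ValueError("no usable pairs")
    return num / den


# Toy data: 7 usable pairs, 6 concordant and 1 tied in score, so C = 6.5 / 7.
time = [2, 4, 5, 7, 9]
event = [1, 0, 1, 1, 0]
score = [1.8, 0.2, 1.1, 1.1, 0.1]
print(harrell_c(time, event, score))
```

Pairs whose shorter observed time is censored never enter the denominator, which is why the quantity being estimated depends on the censoring distribution.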
Assuming that the censoring time is independent of the event time, Kang et al. (2015) derive a standard error estimator by using the delta method. Note that this derivation treats $\hat\beta$ as fixed, so it does not account for the variability in estimating $\beta$. To make this condition explicit, the linear predictor $\hat\beta' Z$ is replaced by a single variable $Y$. For a pair of subjects $i$ and $j$, define the following quantities:
Let ,
. Further define the following quantities:
Harrell’s estimator can be rewritten as
Applying the delta method, the variance of Harrell’s C-statistic can be estimated by
where
Uno et al. (2011) propose a method for estimating the following concordance probability:

$$C_U = \Pr\left(\beta' Z_1 > \beta' Z_2 \mid T_1 < T_2\right)$$

If $\tau$ is a specified time point within the support of the censoring variable, Uno et al. (2011) also define a truncated version of the concordance probability as

$$C_U(\tau) = \Pr\left(\beta' Z_1 > \beta' Z_2 \mid T_1 < T_2,\; T_1 < \tau\right)$$

You can specify a $\tau$ value in the TAU= option in the PROC PHREG statement. If the TAU= option is not specified, there is no truncation and the $\tau$ value is taken as the largest event time.
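For the $i$th individual ($i = 1, \ldots, n$), let $X_i$, $\Delta_i$, and $Z_i$ be the observed time, event indicator (1 for death and 0 for censored), and covariate vector, respectively. Let $\hat{G}(\cdot)$ be the Kaplan-Meier estimate of the censoring distribution (assuming no covariates). $C_U(\tau)$ is consistently estimated by

$$\hat{C}_U(\tau) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \Delta_i \, \{\hat{G}(X_i)\}^{-2} \, I(X_i < X_j,\; X_i < \tau) \, I(\hat\beta' Z_i > \hat\beta' Z_j)}{\sum_{i=1}^{n} \sum_{j=1}^{n} \Delta_i \, \{\hat{G}(X_i)\}^{-2} \, I(X_i < X_j,\; X_i < \tau)}$$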
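The inverse-probability-of-censoring weighting behind this estimator can be sketched in Python as follows. Each usable pair is weighted by the inverse square of the Kaplan-Meier estimate of the censoring survival function at the earlier event time. This is a minimal sketch, not the PROC PHREG implementation: the names `km_censoring` and `uno_c` are hypothetical, event and censoring times are assumed not to coincide, and `tau=None` is treated simply as no truncation.

```python
def km_censoring(time, event):
    """Kaplan-Meier estimate G(t) of P(censoring time > t).

    Censored observations (event == 0) play the role of "events" here.
    Returns a step function G.
    """
    cens_times = sorted({t for t, d in zip(time, event) if d == 0})
    steps = []
    surv = 1.0
    for s in cens_times:
        at_risk = sum(1 for t in time if t >= s)
        censored = sum(1 for t, d in zip(time, event) if t == s and d == 0)
        surv *= 1.0 - censored / at_risk
        steps.append((s, surv))

    def G(t):
        value = 1.0
        for s, g in steps:
            if s <= t:
                value = g
            else:
                break
        return value

    return G


def uno_c(time, event, score, tau=None):
    """Uno-type C: usable pairs weighted by 1 / G(X_i)^2, truncated at tau."""
    if tau is None:
        tau = float("inf")  # assumption of this sketch: no truncation
    G = km_censoring(time, event)
    n = len(time)
    num = 0.0
    den = 0.0
    for i in range(n):
        if event[i] != 1 or time[i] >= tau:
            continue  # only events before tau anchor a usable pair
        w = 1.0 / G(time[i]) ** 2
        for j in range(n):
            if time[i] < time[j]:
                den += w
                if score[i] > score[j]:
                    num += w
    if den == 0.0:
        raise ValueError("no usable pairs")
    return num / den


time = [2, 4, 5, 7, 9]
event = [1, 0, 1, 1, 0]
score = [1.8, 0.2, 0.5, 0.9, 0.1]
print(uno_c(time, event, score))          # weighted by censoring KM estimate
print(uno_c(time, event, score, tau=5))   # only the event at time 2 is used
```

In this toy data set the unweighted proportion of concordant usable pairs is 6/7, whereas the weighted version gives more influence to pairs anchored at later event times, where more censoring has already occurred.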
Define $W = \sqrt{n}\,\{\hat{C}_U(\tau) - C_U(\tau)\}$. It can be shown that $W$ is asymptotically distributed as a normal random variable with mean zero. The variance of $W$ can be approximated by using the perturbation-resampling method. Specifically, let
be a set of independent samples from an exponential distribution with mean of 1 and variance of 1. For a large n, W can be approximated by
where
and and
are the perturbed versions of
and
.
is calculated as
where and
is a consistent estimator of the cumulative hazard function for the censoring time variable.
is calculated as
where is the estimated variance-covariance matrix of
divided by n and
is the contribution to the partial likelihood function from the ith individual. The third term of the formula for
is dropped out if you use the PRED= option in an ROC statement to specify a variable that contains the prediction scores.
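Suppose $\hat\sigma^2$ is the sample variance based on $M$ realizations of the perturbed approximation to $W$. The $100(1-\alpha)$% confidence limits for $C_U(\tau)$ are $\hat{C}_U(\tau) \pm z_{\alpha/2} \, \hat\sigma / \sqrt{n}$, where $z_{\alpha/2}$ is the upper $100(\alpha/2)$ percentile of the standard normal distribution.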
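The final step, turning a set of perturbation realizations into confidence limits, can be sketched in Python as shown below. The sketch assumes that the realizations approximate $W = \sqrt{n}\,\{\hat{C}_U(\tau) - C_U(\tau)\}$, so that the standard error of the estimate is the sample standard deviation of the realizations divided by $\sqrt{n}$. The function name `perturbation_limits` and the input values are hypothetical placeholders; generating the realizations themselves is not shown.

```python
from math import sqrt
from statistics import NormalDist, variance


def perturbation_limits(c_hat, w_realizations, n, alpha=0.05):
    """Normal-approximation confidence limits for a concordance estimate.

    c_hat          : point estimate of the concordance probability
    w_realizations : M realizations approximating sqrt(n) * (c_hat - C)
                     (assumed scaling; see the lead-in)
    n              : sample size
    alpha          : 1 - confidence level
    """
    sigma2 = variance(w_realizations)        # sample variance, divisor M - 1
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 normal quantile
    half_width = z * sqrt(sigma2 / n)
    return c_hat - half_width, c_hat + half_width


# Placeholder inputs, for illustration only.
lower, upper = perturbation_limits(0.80, [-1.0, 0.0, 1.0], n=100)
print(lower, upper)
```

The limits are not constrained to the interval from 0 to 1 in this sketch.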