Competing risks arise in the analysis of time-to-event data when the event of interest can be impeded by a prior event of a different type. For example, a leukemia patient’s relapse might be unobservable because the patient dies before relapse is diagnosed. In the presence of competing risks, the Kaplan-Meier method of estimating the survivor function is biased, because you can no longer assume that a subject will experience the event of interest if the follow-up period is long enough. An increasingly common practice of assessing the probability of a failure in competing-risks analysis is to estimate the cumulative incidence function (CIF), which is the probability subdistribution function of failure from a specific cause.
PROC PHREG provides two commonly used approaches of evaluating the relationship of the covariates to the cause-specific outcome in competing-risks data. One approach models the cause-specific hazard for each cause of failure separately, by applying the Cox regression to target each cause of failure and censor all other causes. This approach can be considered to be a direct generalization of the Cox model to the competing-risks setting, because the overall hazard function for any failure is the sum of the cause-specific hazard functions from all observable causes. Estimation of the CIF for a specific cause requires a complete analysis of all failure causes and entails fitting multiple cause-specific models, one for each observable cause. This method of analyzing competing-risks data is often called the cause-specific analysis or the Cox model. To request this method, specify the EVENTCODE(COX)= option in the MODEL statement.
Alternatively, Fine and Gray (1999) propose a proportional hazards model for the cumulative incidence of a failure cause of interest. The proportional hazard assumption is imposed on the subdistribution hazard, which is the hazard of the cumulative incidence. This method is often called as the proportional subdistribution hazard regression or the Fine and Gray model. To request this method, specify the EVENTCODE(FG)= option (or simply the EVENTCODE= option) in the MODEL statement.
You can request the CIF curves for a specified set of covariates by using the BASELINE statement. To request the cumulative incidence at specific time points, use the TIMELIST= option in the BASELINE statement. The PLOTS=CIF option in the PROC PHREG statement displays the CIF curves.
For each observable cause of failure k, the cause-specific hazard for a subject that has a covariate vector Z follows the proportional hazards assumption,
where is an arbitrary and unspecified baseline cause-specific hazard function and
is the vector of regression coefficients. The regression coefficients for cause k,
, are estimated by fitting a standard Cox model where observations that have the kth cause of failure are treated as event observations and all other observations are treated as censored observations.
You request a Cox regression analysis by specifying the EVENTCODE(COX)= option in the MODEL statement. When you specify this option, the following statements and options are ignored: the ASSESS, BAYES, HAZARDRATIO, RANDOM, STRATA, and WEIGHT statements; the ATRISK and COVS options in the PROC PHREG statement; and the following options in the MODEL statement: BEST=, DETAILS, ENTRY=, HIERARCHY=, INCLUDE=, NOFIT, PLCONV=, RISKLIMITS=PL, SELECTION=, SEQUENTIAL, SLENTRY=, SLSTAY=, TYPE1, and TYPE3(LR, SCORE).
For the ith subject, , let
,
,
, and
be the observed time, event indicator, cause of failure, and covariate vector at time t, respectively. Assume that K causes of failure are observable (
). Let
Note that if , then
and
; if
, then
and
.
For the lth cause-specific Cox model, denote the maximum partial likelihood estimates as and the baseline cumulative hazard function as
. For an individual that has covariates
, the predicted cumulative hazard function estimate for cause l is
The predicted survivor function is
The predicted cumulative incidence for cause l is
Based on the counting process and martingale theory (Andersen et al. 1992), the variance estimator of is a sum of two terms,
First, consider TIES=BRESLOW. Let
Replacing by the maximum partial likelihood estimate
, let
and
where ; and when
,
if
and 0 otherwise.
The first term of the variance estimator is
where is the observed information matrix.
The second term of the variance estimator is
For TIES=EFRON, the preceding computation is modified to comply with the Efron partial likelihood. For a particular time t, let =1 if the t is an event time of the ith subject and 0 otherwise. Let
, which is the number of subjects that have an event at t. For
, let
The second term of the variance estimator becomes
Confidence intervals for the cumulative incidence are computed from
and
in the same fashion as those of the survival function. The CLTYPE option in the BASELINE statement enables you to choose the LOG transformation, the LOGLOG (log of negative log) transformation, or no transformation to compute the confidence limits. For more information, see the section Confidence Intervals for the Survivor Function.
The proportional hazards model for the subdistribution that Fine and Gray (1999) propose aims to model the cumulative incidence of an event of interest. They define a subdistribution hazard,
where is the cumulative incidence function for the failure of cause k. They also impose a proportional hazards assumption on the subdistribution hazards:
The estimation of the regression coefficients is based on modified risk sets, where subjects that experience a competing event are retained after their event. The weight of subjects that are artificially retained in the risk sets is gradually reduced according to the conditional probability of being under follow-up if the competing event had not occurred.
You use PROC PHREG to fit the Fine and Gray (1999) model by specifying the EVENTCODE= option in the MODEL statement to indicate the event of interest. Maximum likelihood estimates of the regression coefficients are obtained by the Newton-Raphson algorithm. The covariance matrix of the parameter estimator is computed as a sandwich estimate. You can request the CIF curves for a particular set of covariates by using the BASELINE statement. The PLOTS=CIF option in the PROC PHREG statement displays a plot of the curves. You can obtain Schoenfeld residuals and score residuals by using the OUTPUT statement.
To model the subdistribution hazards for clustered data (Zhou et al. 2012), you use the COVS(AGGREGATE) option in the PROC PHREG statement. You need to specify the ID statement to identify the clusters. To model the subdistribution hazards for stratified data (Zhou et al. 2011), you use the STRATA statement. PROC PHREG handles only regular stratified data that have a small number of large subject groups.
When you specify the EVENTCODE= option in the MODEL statement, the following statements and options are ignored: the ASSESS, BAYES, and RANDOM statements; the ATRISK and COVM options in the PROC PHREG statement; and the following options in the MODEL statement: BEST=, DETAILS, HIERARCHY=, INCLUDE=, NOFIT, PLCONV=, RISKLIMITS=PL, SELECTION=, SEQUENTIAL, SLENTRY=, SLSTAY=, TYPE1, and TYPE3(LR, SCORE). Profile likelihood confidence intervals for the hazard ratios are not available for the Fine and Gray competing-risks analysis.
For the ith subject, , let
,
,
, and
be the observed time, event indicator, cause of failure, and covariate vector at time t, respectively. Assume that K causes of failure are observable (
). Consider failure from cause 1 to be the failure of interest, with failures of other causes as competing events. Let
Note that if , then
and
; if
, then
and
. Let
where is the Kaplan-Meier estimate (alternatively, the Breslow estimate) of the survivor function of the censoring variable, which is calculated using
. If
, then
when
and is 0 otherwise; if
, then
. Table 12 displays the weight of a subject as a function of time.
The regression coefficients are estimated by maximizing the pseudo-likelihood
with respect to
:
The variance-covariance matrix of the maximum likelihood estimator is approximated by a sandwich estimate.
The score function and the observed information matrix
are given by
The sandwich variance estimate of is
where is the estimate of the variance-covariance matrix of
that is given by
where
You can use the OUTPUT statement to output Schoenfeld residuals and score residuals to a SAS data set.
For an individual that has covariates , the cumulative subdistribution hazard is estimated by
and the predicted cumulative incidence is
To compute the confidence interval for the cumulative incidence, consider a monotone transformation with first derivative
. Fine and Gray (1999, Section 5) give the following method to calculate pointwise confidence intervals. First, you generate B samples of normal random deviates
. You can specify the value of B by using the NORMALSAMPLE= option in the BASELINE statement. Then, you compute the estimate of var
as
where
A 100(1–)% confidence interval for
is given by
where is the
percentile of a standard normal distribution.
The CLTYPE= option in the BASELINE statement enables you to choose the LOG transformation, the LOGLOG (log of negative log) transformation, or the IDENTITY transformation. You can also output the standard error of the cumulative incidence, which is approximated by the delta method as follows:
where . Table 13 displays the variance estimator for each transformation that is available in PROC PHREG.