You can use PROC MCMC to fit failure time data that can be right-, left-, or interval-censored. To illustrate, a normal regression model is used in this example.
You can use either of two approaches to fit interval-censored data. One is to specify the marginal model, and the other is to treat the censored data as missing values.
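Assume that you have a simple regression model with no covariates,

$$y = \mu + \sigma \epsilon$$

where $y$ is a vector of response values (the failure times), $\mu$ is the grand mean, $\sigma$ is an unknown scale parameter, and $\epsilon$ are errors from the standard normal distribution. Instead of observing $y_i$ directly, you observe only a truncated value $t_i$. If the true $y_i$ occurs after the censored time $t_i$, the data are called right-censored. If $y_i$ occurs before the censored time, the data are called left-censored. A failure time $y_i$ can be censored at both ends, and these data are called interval-censored. The likelihood for $y_i$ is

$$
p(y_i \mid \mu, \sigma) =
\begin{cases}
\phi(y_i \mid \mu, \sigma) & \text{if } y_i \text{ is uncensored} \\
S(t_{l,i} \mid \mu, \sigma) & \text{if } y_i \text{ is right-censored by } t_{l,i} \\
1 - S(t_{r,i} \mid \mu, \sigma) & \text{if } y_i \text{ is left-censored by } t_{r,i} \\
S(t_{l,i} \mid \mu, \sigma) - S(t_{r,i} \mid \mu, \sigma) & \text{if } y_i \text{ is interval-censored by } t_{l,i} \text{ and } t_{r,i}
\end{cases}
$$

where $\phi(\cdot)$ is the normal density, $S(\cdot)$ is the survival function, and $S(t) = \Pr(T > t)$. When a datum is uncensored, you use a normal likelihood. When a datum is censored, you use the cumulative distribution to account for the likelihood.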
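As an illustration outside SAS, the four cases of this likelihood can be written as one small Python function. This is a minimal sketch under stated assumptions, not part of PROC MCMC: the names `norm_sdf`, `norm_logpdf`, and `censored_loglik` are hypothetical, and a missing endpoint is represented by `None`.

```python
import math

def norm_sdf(x, mu, sigma):
    """Survival function S(x) = Pr(T > x) for a normal(mu, sd=sigma) variable."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def norm_logpdf(x, mu, sigma):
    """Log density of normal(mu, sd=sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def censored_loglik(tl, tr, mu, sigma):
    """Log-likelihood contribution of one observation with endpoints (tl, tr).

    None marks a missing endpoint (an assumption of this sketch).
    """
    if tl is None and tr is None:
        raise ValueError("at least one endpoint must be observed")
    if tl is not None and tr is not None and tl == tr:
        return norm_logpdf(tl, mu, sigma)                # uncensored
    if tr is None:
        return math.log(norm_sdf(tl, mu, sigma))         # right-censored
    if tl is None:
        return math.log(1.0 - norm_sdf(tr, mu, sigma))   # left-censored
    # interval-censored: Pr(tl < y < tr) = S(tl) - S(tr)
    return math.log(norm_sdf(tl, mu, sigma) - norm_sdf(tr, mu, sigma))
```

For instance, `censored_loglik(0.0, None, 0.0, 1.0)` equals log(0.5), because half of a standard normal distribution lies above 0.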
Gentleman and Geyer (1994) use the following data on cosmetic deterioration for early breast cancer patients who are treated with radiotherapy:
title 'Normal Regression with Interval Censoring';
data cosmetic;
t = .;
label tl = 'Time to Event (Months)';
input tl tr @@;
datalines;
45 . 6 10 . 7 46 . 46 . 7 16 17 . 7 14
37 44 . 8 4 11 15 . 11 15 22 . 46 . 46 .
25 37 46 . 26 40 46 . 27 34 36 44 46 . 36 48
37 . 40 . 17 25 46 . 11 18 38 . 5 12 37 .
. 5 18 . 24 . 36 . 5 11 19 35 17 25 24 .
32 . 33 . 19 26 37 . 34 . 36 .
;
The data consist of time interval endpoints (in months). Nonmissing equal endpoints (tl = tr) indicate noncensoring; a nonmissing lower endpoint (tl ≠ .) and a missing upper endpoint (tr = .) indicate right-censoring; a missing lower endpoint (tl = .) and a nonmissing upper endpoint (tr ≠ .) indicate left-censoring; and nonmissing unequal endpoints (tl ≠ tr) indicate interval censoring. In this data set, all observations are censored (all t are missing).
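The endpoint rules above can be checked mechanically. The following Python sketch (an illustration outside SAS; `parse_pairs` and `censoring_type` are hypothetical helper names) reads the same values as the DATALINES block, treats `.` as missing, and tallies the censoring types.

```python
from collections import Counter

# Same values as the DATALINES block of the cosmetic data set; "." is a missing value.
DATALINES = """
45 . 6 10 . 7 46 . 46 . 7 16 17 . 7 14
37 44 . 8 4 11 15 . 11 15 22 . 46 . 46 .
25 37 46 . 26 40 46 . 27 34 36 44 46 . 36 48
37 . 40 . 17 25 46 . 11 18 38 . 5 12 37 .
. 5 18 . 24 . 36 . 5 11 19 35 17 25 24 .
32 . 33 . 19 26 37 . 34 . 36 .
"""

def parse_pairs(text):
    """Read consecutive (tl, tr) pairs, mapping '.' to None."""
    values = [None if tok == "." else float(tok) for tok in text.split()]
    return list(zip(values[0::2], values[1::2]))

def censoring_type(tl, tr):
    """Classify one observation from its interval endpoints."""
    if tl is not None and tr is not None:
        return "uncensored" if tl == tr else "interval"
    if tl is not None:
        return "right"
    if tr is not None:
        return "left"
    raise ValueError("both endpoints are missing")

pairs = parse_pairs(DATALINES)
counts = Counter(censoring_type(tl, tr) for tl, tr in pairs)
print(len(pairs), dict(counts))
```

For these 46 observations the tally is 25 right-censored, 18 interval-censored, 3 left-censored, and no uncensored values.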
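With this data set, you can consider using proper but diffuse priors on both $\mu$ and $\sigma$. For example,

$$
\mu \sim \text{normal}(0, \text{sd} = 1000), \qquad \sigma \sim \text{gamma}(\text{shape} = 0.001, \text{iscale} = 0.001)
$$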
The following SAS statements fit an interval-censoring model by using its marginal distribution and generate Output 80.17.1:
proc mcmc data=cosmetic outpost=postout seed=1 nmc=20000 missing=AC;
ods select PostSumInt;
parms mu 60 sigma 50;
prior mu ~ normal(0, sd=1000);
prior sigma ~ gamma(shape=0.001,iscale=0.001);
if (tl^=. and tr^=. and tl=tr) then
llike = logpdf('normal',tr,mu,sigma);
else if (tl^=. and tr=.) then
llike = logsdf('normal',tl,mu,sigma);
else if (tl=. and tr^=.) then
llike = logcdf('normal',tr,mu,sigma);
else
llike = log(sdf('normal',tl,mu,sigma) -
sdf('normal',tr,mu,sigma));
model general(llike);
run;
Because there are missing cells in the input data, you want to use the MISSING=AC option so that PROC MCMC does not delete any observations that contain missing values. The IF-ELSE statements distinguish the different censoring cases for each observation and compute the corresponding term of the likelihood. The SAS functions LOGCDF, LOGSDF, LOGPDF, and SDF are useful here. The MODEL statement declares llike as the log likelihood of the response. The Markov chain appears to have converged in this example (evidence not shown here), and the posterior estimates are shown in Output 80.17.1.
Output 80.17.1: Interval Censoring
Normal Regression with Interval Censoring

Posterior Summaries and Intervals

| Parameter | N | Mean | Standard Deviation | 95% HPD Interval (Lower) | 95% HPD Interval (Upper) |
|---|---|---|---|---|---|
| mu | 20000 | 41.7807 | 5.7882 | 31.3604 | 53.6115 |
| sigma | 20000 | 29.1122 | 6.0503 | 19.4041 | 41.6742 |
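To make the marginal model concrete outside SAS, the following Python sketch runs a plain random-walk Metropolis sampler on the same likelihood and priors. It is an illustration under stated assumptions and does not reproduce PROC MCMC's tuned sampler: the function names, the proposal standard deviations (6.0 for mu and 0.25 for log sigma), the burn-in of 2000, and the choice to propose on the log scale of sigma are all choices made for this sketch. The posterior means should land in the same neighborhood as the table above, but the digits will not match.

```python
import math
import random

# Same values as the DATALINES block of the cosmetic data set; "." is a missing value.
DATALINES = """
45 . 6 10 . 7 46 . 46 . 7 16 17 . 7 14
37 44 . 8 4 11 15 . 11 15 22 . 46 . 46 .
25 37 46 . 26 40 46 . 27 34 36 44 46 . 36 48
37 . 40 . 17 25 46 . 11 18 38 . 5 12 37 .
. 5 18 . 24 . 36 . 5 11 19 35 17 25 24 .
32 . 33 . 19 26 37 . 34 . 36 .
"""

def parse_pairs(text):
    """Read consecutive (tl, tr) pairs, mapping '.' to None."""
    values = [None if tok == "." else float(tok) for tok in text.split()]
    return list(zip(values[0::2], values[1::2]))

def norm_sdf(x, mu, sigma):
    """Survival function Pr(T > x) for normal(mu, sd=sigma)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def log_posterior(mu, sigma, pairs):
    """Unnormalized log posterior: censored log likelihood plus log priors."""
    if sigma <= 0.0:
        return -math.inf
    # mu ~ normal(0, sd=1000); sigma ~ gamma(shape=0.001, iscale=0.001), constants dropped
    lp = -0.5 * (mu / 1000.0) ** 2 + (0.001 - 1.0) * math.log(sigma) - 0.001 * sigma
    for tl, tr in pairs:
        if tl is not None and tr is not None and tl == tr:
            z = (tl - mu) / sigma
            lp += -0.5 * z * z - math.log(sigma)  # uncensored: normal density, constant dropped
            continue
        # Pr(tl < y < tr) = S(tl) - S(tr), with S = 1 at a missing tl and S = 0 at a missing tr
        upper = norm_sdf(tl, mu, sigma) if tl is not None else 1.0
        lower = norm_sdf(tr, mu, sigma) if tr is not None else 0.0
        prob = upper - lower
        if prob <= 0.0:
            return -math.inf
        lp += math.log(prob)
    return lp

def metropolis(pairs, n_iter=20000, burn_in=2000, seed=1):
    """Random-walk Metropolis on (mu, log sigma); returns draws of (mu, sigma)."""
    rng = random.Random(seed)
    mu, log_sigma = 60.0, math.log(50.0)  # starting values from the PARMS statement
    # The target on (mu, log sigma) includes the Jacobian term log_sigma.
    current = log_posterior(mu, math.exp(log_sigma), pairs) + log_sigma
    draws = []
    for i in range(n_iter + burn_in):
        mu_new = mu + rng.gauss(0.0, 6.0)
        ls_new = log_sigma + rng.gauss(0.0, 0.25)
        proposed = log_posterior(mu_new, math.exp(ls_new), pairs) + ls_new
        if rng.random() < math.exp(min(0.0, proposed - current)):
            mu, log_sigma, current = mu_new, ls_new, proposed
        if i >= burn_in:
            draws.append((mu, math.exp(log_sigma)))
    return draws

draws = metropolis(parse_pairs(DATALINES))
mu_mean = sum(d[0] for d in draws) / len(draws)
sigma_mean = sum(d[1] for d in draws) / len(draws)
print(round(mu_mean, 1), round(sigma_mean, 1))
```

Proposing on log sigma keeps every proposal positive, which is why the log of the Jacobian is added to the target.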
The marginal model approach is more efficient because the censored observations are integrated out. However, the cumulative distribution function might not be readily available for every model. A general alternative is to treat the censored responses as latent variables (or missing data). You fit the same model by imputing the would-be values of the censored observations while estimating the model parameters. The censoring information, which restricts the range of each unobserved value, is specified in the CLOWER= and CUPPER= options of the distribution in the MODEL statement.
The following SAS statements fit censored data by using the missing data approach:
proc mcmc data=cosmetic outpost=postout seed=117207154
nmc=20000 missing=ACMODELY;
ods select none;
parms mu 60 sigma 50;
prior mu ~ normal(0, sd=1000);
prior sigma ~ gamma(shape=0.001,iscale=0.001);
model t ~ normal(mu, sd=sigma, clower=tl, cupper=tr);
run;
By default, PROC MCMC discards observations that have missing values in covariate variables (a covariate is a data set variable that appears in the program but not to the left of the tilde in a MODEL statement). To keep observations that have missing tl or tr values, specify the MISSING=ACMODELY option, which keeps all observations and models the missing response variable. This approach produces estimates that are equivalent to those from the marginal model approach. The results are not shown here.
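The idea behind the missing data approach can be sketched outside SAS as follows. Given current values of mu and sigma, a censored failure time is a draw from the normal distribution restricted to the interval between its lower and upper bounds. The Python function below performs one such draw by the inverse-CDF method. It is an assumption-laden illustration, not a description of how PROC MCMC samples missing values internally: `impute_censored` is a hypothetical name, `None` marks a missing bound, and the fixed values of mu and sigma are the posterior means from Output 80.17.1, used only as plausible inputs.

```python
import random
from statistics import NormalDist

def impute_censored(tl, tr, mu, sigma, rng):
    """Draw one latent failure time from normal(mu, sd=sigma) restricted to (tl, tr).

    None marks a missing bound, so (tl, None) is right-censored and
    (None, tr) is left-censored.
    """
    dist = NormalDist(mu, sigma)
    p_lo = dist.cdf(tl) if tl is not None else 0.0
    p_hi = dist.cdf(tr) if tr is not None else 1.0
    # Uniform draw between the CDF values of the bounds, mapped back by the inverse CDF.
    u = p_lo + rng.random() * (p_hi - p_lo)
    # inv_cdf requires 0 < u < 1, so keep u strictly inside the unit interval.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)

rng = random.Random(1)
mu, sigma = 41.78, 29.11  # illustrative fixed values, taken from Output 80.17.1
for tl, tr in [(45.0, None), (None, 7.0), (6.0, 10.0)]:
    y = impute_censored(tl, tr, mu, sigma, rng)
    print(tl, tr, round(y, 2))
```

In a full sampler, draws like these would alternate with updates of mu and sigma that treat the imputed values as observed data.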