The MCMC Procedure

Example 80.17 Normal Regression with Interval Censoring

You can use PROC MCMC to fit failure time data that can be right-, left-, or interval-censored. To illustrate, a normal regression model is used in this example.

You can use either of two approaches to fit interval-censored data. One is to specify the marginal model, and the other is to treat the censored data as missing values.

Assume that you have a simple regression model with no covariates,

$$\mathbf{y} = \mu + \sigma \boldsymbol{\epsilon}$$

where $\mathbf{y}$ is a vector of response values (the failure times), $\mu$ is the grand mean, $\sigma$ is an unknown scale parameter, and $\boldsymbol{\epsilon}$ is a vector of errors from the standard normal distribution. Instead of observing $y_i$ directly, you observe only a censoring time $t_i$. If the true $y_i$ occurs after the censoring time $t_i$, the data are called right-censored. If $y_i$ occurs before the censoring time, the data are called left-censored. A failure time $y_i$ can also be censored at both ends, in which case the data are called interval-censored. The likelihood for $y_i$ is

$$
p(y_i \mid \mu) =
\begin{cases}
\phi(y_i \mid \mu, \sigma) & \text{if } y_i \text{ is uncensored} \\
S(t_{l,i} \mid \mu) & \text{if } y_i \text{ is right-censored by } t_{l,i} \\
1 - S(t_{r,i} \mid \mu) & \text{if } y_i \text{ is left-censored by } t_{r,i} \\
S(t_{l,i} \mid \mu) - S(t_{r,i} \mid \mu) & \text{if } y_i \text{ is interval-censored by } t_{l,i} \text{ and } t_{r,i}
\end{cases}
$$

where $S(\cdot)$ is the survival function and $S(t) = \Pr(T > t)$. When a datum is uncensored, you use a normal likelihood. When a datum is censored, you use the survival function (equivalently, the cumulative distribution function) to form its likelihood contribution.
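
As a worked step for the normal model, the survival function can be written in terms of the standard normal cumulative distribution function, denoted here by $\Phi$ (a symbol introduced only for this illustration). The interval-censored case then reduces to the probability that $y_i$ falls between the two endpoints:

$$
S(t \mid \mu) = 1 - \Phi\!\left(\frac{t - \mu}{\sigma}\right),
\qquad
S(t_{l,i} \mid \mu) - S(t_{r,i} \mid \mu)
= \Phi\!\left(\frac{t_{r,i} - \mu}{\sigma}\right) - \Phi\!\left(\frac{t_{l,i} - \mu}{\sigma}\right)
= \Pr(t_{l,i} < y_i \le t_{r,i})
$$

The right-censored and left-censored cases follow in the same way as $\Pr(y_i > t_{l,i})$ and $\Pr(y_i \le t_{r,i})$, respectively.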

Gentleman and Geyer (1994) use the following data on cosmetic deterioration for early breast cancer patients who are treated with radiotherapy:

title 'Normal Regression with Interval Censoring';
data cosmetic;
   t = .;
   label tl = 'Time to Event (Months)';
   input tl tr @@;
   datalines;
45  .   6 10   .  7  46  .  46  .   7 16  17  .   7 14
37 44   .  8   4 11  15  .  11 15  22  .  46  .  46  .
25 37  46  .  26 40  46  .  27 34  36 44  46  .  36 48
37  .  40  .  17 25  46  .  11 18  38  .   5 12  37  .
 .  5  18  .  24  .  36  .   5 11  19 35  17 25  24  .
32  .  33  .  19 26  37  .  34  .  36  .
;

The data consist of time interval endpoints (in months). Nonmissing equal endpoints (tl = tr) indicate noncensoring; a nonmissing lower endpoint (tl ≠ .) and a missing upper endpoint (tr = .) indicate right-censoring; a missing lower endpoint (tl = .) and a nonmissing upper endpoint (tr ≠ .) indicate left-censoring; and nonmissing unequal endpoints (tl ≠ tr) indicate interval censoring. In this data set, all observations are censored (all t are missing).
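
The endpoint rules above can be checked directly against the data. The following DATA step is a minimal sketch, not part of the original example: the data set name cosmetic_type and the variable ctype are hypothetical names chosen for illustration. It labels each observation with its censoring type and tabulates the labels.

```sas
/* Hypothetical helper step: label each observation's censoring type. */
/* cosmetic_type and ctype are illustrative names, not part of the    */
/* original example.                                                  */
data cosmetic_type;
   set cosmetic;
   length ctype $ 8;
   if tl ne . and tr ne . and tl = tr then ctype = 'none';      /* uncensored     */
   else if tl ne . and tr = .         then ctype = 'right';     /* right-censored */
   else if tl = .  and tr ne .        then ctype = 'left';      /* left-censored  */
   else                                    ctype = 'interval';  /* remaining rows */
run;

/* Count the observations of each censoring type */
proc freq data=cosmetic_type;
   tables ctype;
run;
```

The final ELSE branch would also catch an observation with both endpoints missing. No such observation appears in this data set, but a more defensive version would test for that case explicitly.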

With this data set, you can consider using proper but diffuse priors on both mu and sigma. For example,

$$
\begin{aligned}
\mu &\sim \text{normal}(0, \text{sd} = 1000) \\
\sigma &\sim \text{gamma}(0.001, \text{iscale} = 0.001)
\end{aligned}
$$
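
To see why the gamma prior is diffuse, it can help to work out its moments. This short calculation assumes the usual shape and inverse-scale (rate) parameterization, with shape $a = 0.001$ and inverse scale $b = 0.001$ (symbols introduced only for this illustration):

$$
\mathrm{E}(\sigma) = \frac{a}{b} = \frac{0.001}{0.001} = 1,
\qquad
\mathrm{Var}(\sigma) = \frac{a}{b^2} = \frac{0.001}{0.001^2} = 1000
$$

The prior is proper, but its variance is large relative to its mean, so it places little restriction on $\sigma$.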

The following SAS statements fit an interval-censoring model by using its marginal distribution and generate Output 80.17.1:

proc mcmc data=cosmetic outpost=postout seed=1 nmc=20000 missing=AC;
   ods select PostSumInt;
   parms mu 60 sigma 50;

   prior mu ~ normal(0, sd=1000);
   prior sigma ~ gamma(shape=0.001,iscale=0.001);

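   /* uncensored: normal log density */
   if (tl^=. and tr^=. and tl=tr) then
      llike = logpdf('normal',tr,mu,sigma);
   /* right-censored: log survival function at the lower endpoint */
   else if (tl^=. and tr=.) then
      llike = logsdf('normal',tl,mu,sigma);
   /* left-censored: log CDF at the upper endpoint */
   else if (tl=. and tr^=.) then
      llike = logcdf('normal',tr,mu,sigma);
   /* interval-censored: log of the difference of survival functions */
   else
      llike = log(sdf('normal',tl,mu,sigma) -
         sdf('normal',tr,mu,sigma));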

   model general(llike);
run;

Because there are missing cells in the input data, you want to use the MISSING=AC option so that PROC MCMC does not delete any observations that contain missing values. The IF-ELSE statements distinguish different censoring cases for $y_i$ according to the likelihood. The SAS functions LOGCDF, LOGSDF, LOGPDF, and SDF are useful here. The MODEL statement assigns llike as the log likelihood to the response. The Markov chain appears to have converged in this example (evidence not shown here), and the posterior estimates are shown in Output 80.17.1.

Output 80.17.1: Interval Censoring

Normal Regression with Interval Censoring

The MCMC Procedure

Posterior Summaries and Intervals

Parameter       N      Mean   Standard Deviation     95% HPD Interval
mu          20000   41.7807               5.7882    31.3604   53.6115
sigma       20000   29.1122               6.0503    19.4041   41.6742
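
The convergence evidence for the marginal model is not shown in this example. One way to produce it, sketched here under the assumption that ODS Graphics is available in your session, is to rerun the same program with the DIAGNOSTICS= and PLOTS= options of PROC MCMC and without the ODS SELECT statement. The output data set name postdiag is a hypothetical choice that avoids overwriting postout.

```sas
/* Sketch: same marginal model, with convergence diagnostics and plots. */
/* postdiag is a hypothetical output data set name.                     */
ods graphics on;
proc mcmc data=cosmetic outpost=postdiag seed=1 nmc=20000 missing=AC
          diagnostics=all plots=all;
   parms mu 60 sigma 50;

   prior mu ~ normal(0, sd=1000);
   prior sigma ~ gamma(shape=0.001,iscale=0.001);

   /* same censoring-specific log likelihood as in the original program */
   if (tl^=. and tr^=. and tl=tr) then
      llike = logpdf('normal',tr,mu,sigma);
   else if (tl^=. and tr=.) then
      llike = logsdf('normal',tl,mu,sigma);
   else if (tl=. and tr^=.) then
      llike = logcdf('normal',tr,mu,sigma);
   else
      llike = log(sdf('normal',tl,mu,sigma) -
         sdf('normal',tr,mu,sigma));

   model general(llike);
run;
ods graphics off;
```

Because the seed and settings match the original run, the posterior summaries should agree with Output 80.17.1; the additional tables and plots are what you would inspect to judge convergence.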


The marginal model approach is more efficient because the censored observations are integrated out. However, the cumulative distribution function is not always readily available. A more general alternative is to treat all censored observations as latent variables (or missing data). You fit the same model by imputing the would-be values of the censored observations while estimating the model parameters. The censoring information, which restricts the range of the unobserved values, is specified in the CLOWER= and CUPPER= options.

The following SAS statements fit censored data by using the missing data approach:

proc mcmc data=cosmetic outpost=postout seed=117207154
   nmc=20000 missing=ACMODELY;
   ods select none;
   parms mu 60 sigma 50;

   prior mu ~ normal(0, sd=1000);
   prior sigma ~ gamma(shape=0.001,iscale=0.001);

   model t ~ normal(mu, sd=sigma, clower=tl, cupper=tr);
run;

By default, PROC MCMC discards observations that have missing values in covariate variables (a covariate is a data set variable that appears in the program but not to the left of the tilde in a MODEL statement). To keep observations that have missing tl or tr values, specify the MISSING=ACMODELY option, which keeps all observations and models the missing response variable. This approach produces estimates that are equivalent to those from the marginal model approach. The results are not shown here.
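
Because the program above suppresses its displayed output, one simple way to inspect its estimates is to summarize the posterior draws stored in the OUTPOST= data set. The following is a minimal sketch that assumes the missing data program has just been run, so that postout holds its draws. PROC MEANS reports the posterior mean and standard deviation of mu and sigma, which you can compare with Output 80.17.1; it does not compute HPD intervals.

```sas
/* Restore ODS output in case the earlier ODS SELECT NONE is still in effect */
ods select all;

/* Posterior mean and standard deviation of mu and sigma from the */
/* OUTPOST= data set written by the missing data run              */
proc means data=postout n mean std;
   var mu sigma;
run;
```

Small differences from the marginal model results are expected, because the two runs use different seeds and sampling schemes.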

Last updated: December 09, 2022