The LIFEREG Procedure

Bayesian Analysis

Gibbs Sampling

This section provides details about Bayesian analysis by Gibbs sampling in the location-scale models for survival data available in PROC LIFEREG. See the section Gibbs Sampler in Chapter 8, Introduction to Bayesian Analysis Procedures, for a general discussion of Gibbs sampling. PROC LIFEREG fits parametric location-scale survival models. That is, the probability density of the response Y can expressed in the general form

f left-parenthesis y right-parenthesis equals g left-parenthesis StartFraction y minus mu Over sigma EndFraction right-parenthesis

where upper Y equals log left-parenthesis upper T right-parenthesis for lifetimes T. The function g determines the specific distribution. The location parameter mu Subscript i is modeled through regression parameters as mu Subscript i Baseline equals bold x prime Subscript i Baseline bold-italic beta. The LIFEREG procedure can provide Bayesian estimates of the regression parameters and sigma. The OUTPUT and PROBPLOT statements, if specified, are ignored. The PLOTS=PROBPLOT option in the PROC LIFEREG statement and the CORRB and COVB options in the MODEL statement are also ignored.

For the Weibull distribution, you can specify that Gibbs sampling be performed on the Weibull shape parameter beta equals sigma Superscript negative 1 instead of the scale parameter sigma by specifying a prior distribution for the shape parameter with the WEIBULLSHAPEPRIOR= option. In addition, if there are no covariates in the model, you can specify Gibbs sampling on the Weibull scale parameter alpha equals exp left-parenthesis mu right-parenthesis, where mu is the intercept term, with the WEIBULLSCALEPRIOR= option.

In the case of the exponential distribution with no covariates, you can specify Gibbs sampling on the exponential scale parameter alpha equals exp left-parenthesis mu right-parenthesis, where mu is the intercept term, with the EXPSCALEPRIOR= option.

Let bold-italic theta equals left-parenthesis theta 1 comma ellipsis comma theta Subscript k Baseline right-parenthesis prime be the parameter vector. For location-scale models, the theta Subscript i’s are the regression coefficients beta Subscript i’s and the scale parameter sigma. In the case of the three-parameter gamma distribution, there is an additional gamma shape parameter tau. Let upper L left-parenthesis upper D vertical-bar bold-italic theta right-parenthesis be the likelihood function, where D is the observed data. Let pi left-parenthesis bold-italic theta right-parenthesis be the prior distribution. The full conditional distribution of left-bracket theta Subscript i Baseline vertical-bar theta Subscript j Baseline comma i not-equals j right-bracket is proportional to the joint distribution; that is,

pi left-parenthesis theta Subscript i Baseline vertical-bar theta Subscript j Baseline comma i not-equals j comma upper D right-parenthesis proportional-to upper L left-parenthesis upper D vertical-bar bold-italic theta right-parenthesis p left-parenthesis bold-italic theta right-parenthesis

For instance, the one-dimensional conditional distribution of theta 1 given theta Subscript j Baseline equals theta Subscript j Superscript asterisk Baseline comma 2 less-than-or-equal-to j less-than-or-equal-to k, is computed as

pi left-parenthesis theta 1 vertical-bar theta Subscript j Baseline equals theta Subscript j Superscript asterisk Baseline comma 2 less-than-or-equal-to j less-than-or-equal-to k comma upper D right-parenthesis equals upper L left-parenthesis upper D vertical-bar left-parenthesis bold-italic theta equals left-parenthesis theta 1 comma theta 2 Superscript asterisk Baseline comma ellipsis comma theta Subscript k Superscript asterisk Baseline right-parenthesis Superscript prime Baseline right-parenthesis p left-parenthesis bold-italic theta equals left-parenthesis theta 1 comma theta 2 Superscript asterisk Baseline comma ellipsis comma theta Subscript k Superscript asterisk Baseline right-parenthesis prime right-parenthesis

Suppose you have a set of arbitrary starting values StartSet theta 1 Superscript left-parenthesis 0 right-parenthesis Baseline comma ellipsis comma theta Subscript k Superscript left-parenthesis 0 right-parenthesis Baseline EndSet. Using the ARMS (adaptive rejection Metropolis sampling) algorithm of Gilks and Wild (1992) and Gilks, Best, and Tan (1995), you can do the following:

  • draw theta 1 Superscript left-parenthesis 1 right-parenthesis from left-bracket theta 1 vertical-bar theta 2 Superscript left-parenthesis 0 right-parenthesis Baseline comma ellipsis comma theta Subscript k Superscript left-parenthesis 0 right-parenthesis Baseline right-bracket

  • draw theta 2 Superscript left-parenthesis 1 right-parenthesis from left-bracket theta 2 vertical-bar theta 1 Superscript left-parenthesis 1 right-parenthesis Baseline comma theta 3 Superscript left-parenthesis 0 right-parenthesis Baseline comma ellipsis comma theta Subscript k Superscript left-parenthesis 0 right-parenthesis Baseline right-bracket

  • ellipsis

  • draw theta Subscript k Superscript left-parenthesis 1 right-parenthesis from left-bracket theta Subscript k Baseline vertical-bar theta 1 Superscript left-parenthesis 1 right-parenthesis Baseline comma ellipsis comma theta Subscript k minus 1 Superscript left-parenthesis 1 right-parenthesis Baseline right-bracket

This completes one iteration of the Gibbs sampler. After one iteration, you have StartSet theta 1 Superscript left-parenthesis 1 right-parenthesis Baseline comma ellipsis comma theta Subscript k Superscript left-parenthesis 1 right-parenthesis Baseline EndSet. After n iterations, you have StartSet theta 1 Superscript left-parenthesis n right-parenthesis Baseline comma ellipsis comma theta Subscript k Superscript left-parenthesis n right-parenthesis Baseline EndSet. PROC LIFEREG implements the ARMS algorithm based on a program provided by Gilks (2003) to draw a sample from a full conditional distribution. See the section Assessing Markov Chain Convergence in Chapter 8, Introduction to Bayesian Analysis Procedures, for information about assessing the convergence of the chain of posterior samples.

You can output these posterior samples into a SAS data set. The following option in the BAYES statement outputs the posterior samples into the SAS data set Post: OUTPOST=Post. The data set also includes the variables LogPost and LogLike, which represent the log of the posterior distribution and the log of the likelihood, respectively.

Priors for Model Parameters

The model parameters are the regression coefficients and the dispersion parameter (or the precision or scale), if the model has one. The priors for the dispersion parameter and the priors for the regression coefficients are assumed to be independent, while you can have a joint multivariate normal prior for the regression coefficients.

Scale and Shape Parameters
Gamma Prior

The gamma distribution upper G left-parenthesis a comma b right-parenthesis has a PDF

f Subscript a comma b Baseline left-parenthesis u right-parenthesis equals StartFraction b left-parenthesis b u right-parenthesis Superscript a minus 1 Baseline normal e Superscript minus b u Baseline Over normal upper Gamma left-parenthesis a right-parenthesis EndFraction comma u greater-than 0

where a is the shape parameter and b is the inverse-scale parameter. The mean is StartFraction a Over b EndFraction and the variance is StartFraction a Over b squared EndFraction.

Improper Prior

The joint prior density is given by

p left-parenthesis u right-parenthesis proportional-to u Superscript negative 1 Baseline comma u greater-than 0
Regression Coefficients

Let bold-italic beta be the regression coefficients.

Normal Prior

Assume bold-italic beta has a multivariate normal prior with mean vector bold-italic beta 0 and covariance matrix bold upper Sigma 0. The joint prior density is given by

p left-parenthesis bold-italic beta right-parenthesis proportional-to normal e Superscript minus one-half left-parenthesis bold-italic beta minus bold-italic beta 0 right-parenthesis prime bold upper Sigma 0 Super Superscript negative 1 Superscript left-parenthesis bold-italic beta minus bold-italic beta 0 right-parenthesis
Uniform Prior

The joint prior density is given by

p left-parenthesis bold-italic beta right-parenthesis proportional-to 1

Posterior Distribution

Denote the observed data by D.

The posterior distribution is

pi left-parenthesis bold-italic theta vertical-bar upper D right-parenthesis proportional-to upper L Subscript upper P Baseline left-parenthesis upper D vertical-bar bold-italic theta right-parenthesis p left-parenthesis bold-italic theta right-parenthesis

where upper L Subscript upper P Baseline left-parenthesis upper D vertical-bar bold-italic theta right-parenthesis is the likelihood function with regression coefficients and any additional parameters, such as scale or shape, bold-italic theta as parameters; and p left-parenthesis bold-italic theta right-parenthesis is the joint prior distribution of the parameters.

Deviance Information Criterion

Let bold-italic theta Subscript i be the model parameters at iteration i of the Gibbs sampler, and let LL(bold-italic theta Subscript i) be the corresponding model log likelihood. PROC LIFEREG computes the following fit statistics defined by Spiegelhalter et al. (2002):

  • effective number of parameters:

    p Subscript upper D Baseline equals ModifyingAbove normal upper L normal upper L left-parenthesis bold-italic theta right-parenthesis With bar minus normal upper L normal upper L left-parenthesis bold-italic theta overbar right-parenthesis
  • deviance information criterion (DIC):

    normal upper D normal upper I normal upper C equals ModifyingAbove normal upper L normal upper L left-parenthesis bold-italic theta right-parenthesis With bar plus p Subscript upper D

where

ModifyingAbove normal upper L normal upper L left-parenthesis bold-italic theta right-parenthesis With bar equals StartFraction 1 Over n EndFraction sigma-summation Underscript i equals 1 Overscript n Endscripts normal upper L normal upper L left-parenthesis bold-italic theta Subscript i Baseline right-parenthesis
bold-italic theta overbar equals StartFraction 1 Over n EndFraction sigma-summation Underscript i equals 1 Overscript n Endscripts bold-italic theta Subscript i

and n is the number of Gibbs samples.

Starting Values of the Markov Chains

When the BAYES statement is specified, PROC LIFEREG generates one Markov chain containing the approximate posterior samples of the model parameters. Additional chains are produced when the Gelman-Rubin diagnostics are requested. Starting values (or initial values) can be specified in the INITIAL= data set in the BAYES statement. If INITIAL= option is not specified, PROC LIFEREG picks its own initial values for the chains.

Denote left-bracket x right-bracket as the integral value of x. Denote ModifyingAbove s With caret left-parenthesis upper X right-parenthesis as the estimated standard error of the estimator X.

Regression Coefficients and Gamma Shape Parameter

For the first chain that the summary statistics and regression diagnostics are based on, the default initial values are estimates of the mode of the posterior distribution. If the INITIALMLE option is specified, the initial values are the maximum likelihood estimates; that is,

beta Subscript i Superscript left-parenthesis 0 right-parenthesis Baseline equals ModifyingAbove beta With caret Subscript i

Initial values for the rth chain (r greater-than-or-equal-to 2) are given by

beta Subscript i Superscript left-parenthesis 0 right-parenthesis Baseline equals ModifyingAbove beta With caret Subscript i Baseline plus-or-minus left-parenthesis 2 plus left-bracket StartFraction r Over 2 EndFraction right-bracket right-parenthesis ModifyingAbove s With caret left-parenthesis ModifyingAbove beta With caret Subscript i Baseline right-parenthesis

with the plus sign for odd r and minus sign for even r.

Scale, Exponential Scale, Weibull Scale, or Weibull Shape Parameter lamda

Let lamda be the parameter sampled.

For the first chain that the summary statistics and diagnostics are based on, the initial values are estimates of the mode of the posterior distribution; or the maximum likelihood estimates if the INITIALMLE option is specified; that is,

lamda Superscript left-parenthesis 0 right-parenthesis Baseline equals ModifyingAbove lamda With caret

The initial values of the rth chain (r greater-than-or-equal-to 2) are given by

lamda Superscript left-parenthesis 0 right-parenthesis Baseline equals ModifyingAbove lamda With caret normal e Superscript plus-or-minus left-parenthesis left-bracket StartFraction r Over 2 EndFraction right-bracket plus 2 right-parenthesis ModifyingAbove s With caret left-parenthesis ModifyingAbove lamda With caret right-parenthesis

with the plus sign for odd r and minus sign for even r.

OUTPOST= Output Data Set

The OUTPOST= data set contains the generated posterior samples. There are 2+n variables, where n is the number of model parameters. The variable Iteration represents the iteration number and the variable LogPost contains the log posterior likelihood values. The other n variables represent the draws of the Markov chain for the model parameters.

Last updated: December 09, 2022