The MCMC Procedure

PREDDIST Statement

  • PREDDIST <'label'> OUTPRED=SAS-data-set <NSIM=n> <COVARIATES=SAS-data-set> <STATISTICS=options>;

The PREDDIST statement creates a new SAS data set that contains random samples from the posterior predictive distribution of the response variable. The posterior predictive distribution is the distribution of unobserved observations (predictions) conditional on the observed data. Let $\mathbf{y}$ be the observed data, $\mathbf{X}$ be the covariates, $\theta$ be the parameter, and $\mathbf{y}_{\text{pred}}$ be the unobserved data. The posterior predictive distribution is defined as follows:

$$
\begin{aligned}
p(\mathbf{y}_{\text{pred}} \mid \mathbf{y}, \mathbf{X}) &= \int p(\mathbf{y}_{\text{pred}}, \theta \mid \mathbf{y}, \mathbf{X}) \, d\theta \\
&= \int p(\mathbf{y}_{\text{pred}} \mid \theta, \mathbf{y}, \mathbf{X}) \, p(\theta \mid \mathbf{y}, \mathbf{X}) \, d\theta
\end{aligned}
$$

Under the assumption that the observed and unobserved data are conditionally independent given $\theta$, the posterior predictive distribution simplifies to the following:

$$
p(\mathbf{y}_{\text{pred}} \mid \mathbf{y}, \mathbf{X}) = \int p(\mathbf{y}_{\text{pred}} \mid \theta) \, p(\theta \mid \mathbf{y}, \mathbf{X}) \, d\theta
$$

The posterior predictive distribution is the integral of the likelihood function $p(\mathbf{y}_{\text{pred}} \mid \theta)$ with respect to the posterior distribution $p(\theta \mid \mathbf{y}, \mathbf{X})$. The PREDDIST statement generates samples from the posterior predictive distribution based on draws from the posterior distribution of $\theta$.
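The integral above is usually approximated by simulation: for each posterior draw of $\theta$, one value of $\mathbf{y}_{\text{pred}}$ is simulated from the likelihood. The following Python sketch illustrates that idea only; it is not PROC MCMC's implementation. The normal posterior, the normal likelihood, and every name in it (`theta_draws`, `normal_likelihood`, `posterior_predictive_draws`) are assumptions made for illustration.

```python
import random


def posterior_predictive_draws(theta_draws, sample_likelihood, rng):
    """Simulate one y_pred ~ p(y_pred | theta) for each posterior draw theta."""
    return [sample_likelihood(theta, rng) for theta in theta_draws]


rng = random.Random(12)

# Stand-in for MCMC output: draws of theta from p(theta | y, X).
# Assumption: they are simulated directly from a Normal(1.0, sd=0.2) posterior.
theta_draws = [rng.gauss(1.0, 0.2) for _ in range(5000)]


# Assumed likelihood: y_pred | theta ~ Normal(theta, sd=0.5).
def normal_likelihood(theta, rng):
    return rng.gauss(theta, 0.5)


y_pred = posterior_predictive_draws(theta_draws, normal_likelihood, rng)

mean = sum(y_pred) / len(y_pred)
var = sum((v - mean) ** 2 for v in y_pred) / (len(y_pred) - 1)
print(f"predictive mean = {mean:.3f}, predictive variance = {var:.3f}")
```

In this toy setup the predictive variance should be close to 0.2² + 0.5² = 0.29, which is larger than the likelihood variance alone because the draws also carry the posterior uncertainty in $\theta$.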

The PREDDIST statement works only on response variables that have standard distributions, and it does not support either the GENERAL or DGENERAL functions. Multiple PREDDIST statements can be specified, and an optional label (specified as a quoted string) helps identify the output.

The following list explains specifications in the PREDDIST statement:

COVARIATES=SAS-data-set

names the SAS data set that contains the sets of explanatory variable values for which predictions are made. This data set must contain variables with the same names as those used in the likelihood function. If you omit the COVARIATES= option, the DATA= data set specified in the PROC MCMC statement is used instead.

NSIM=n

specifies the number of simulated predicted values. By default, NSIM= uses the NMC= option value specified in the PROC MCMC statement.

OUTPRED=SAS-data-set

creates an output data set to contain the samples from the posterior predictive distribution. The output variables are named resp_1 through resp_m, where resp is the name of the response variable and m is the number of observations in the COVARIATES= data set in the PREDDIST statement. If the COVARIATES= data set is not specified, m is the number of observations in the DATA= data set specified in the PROC MCMC statement.
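The naming rule can be pictured with a small Python sketch. It mimics only the layout described above, one column per covariate observation, and assumes one row per simulated draw. The response name `weight`, the covariate `height`, the linear mean, and the stand-in posterior draws are all hypothetical.

```python
import random

rng = random.Random(7)

response = "weight"  # hypothetical response variable name
covariates = [{"height": 60.0}, {"height": 65.0}, {"height": 70.0}]  # m = 3
nsim = 4  # number of predictive draws to keep

# Stand-in posterior draws of (intercept, slope, sd). A real run would take
# these from the MCMC sample instead of simulating them like this.
posterior = [
    (rng.gauss(-140.0, 5.0), rng.gauss(3.9, 0.1), 10.0) for _ in range(nsim)
]

# One output column per covariate observation: resp_1, ..., resp_m.
columns = [f"{response}_{i}" for i in range(1, len(covariates) + 1)]

outpred = []
for b0, b1, sd in posterior:
    row = {}
    for name, cov in zip(columns, covariates):
        mu = b0 + b1 * cov["height"]  # assumed linear mean function
        row[name] = rng.gauss(mu, sd)  # one predictive draw for this row
    outpred.append(row)

print(columns)  # ['weight_1', 'weight_2', 'weight_3']
```

Each element of `outpred` plays the role of one observation in the output data set, so `len(outpred)` equals `nsim` and every row has the same m columns.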

SAVEPARM

writes the sampled parameter values that are used in each predictive draw to the OUTPRED= data set.

STATISTICS<(global-options)> = NONE | ALL | stats-request
STATS<(global-options)> = NONE | ALL | stats-request

specifies options for calculating posterior statistics. This option works identically to the STATISTICS= option in the PROC statement. By default, this option takes the specification of the STATISTICS= option in the PROC MCMC statement.
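As a rough picture of what summarizing predictive draws involves, the Python sketch below computes a mean, a standard deviation, and an equal-tail 95% interval for each predictive column. It does not reproduce the procedure's output tables or its exact quantile definitions. The column names, the simulated draws, and the `summarize` helper are hypothetical.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical predictive draws for two observations (columns resp_1, resp_2).
draws = {
    "resp_1": [rng.gauss(10.0, 2.0) for _ in range(2000)],
    "resp_2": [rng.gauss(15.0, 3.0) for _ in range(2000)],
}


def summarize(values):
    """Return the mean, standard deviation, and equal-tail 95% interval."""
    # n=40 puts cut points at 2.5%, 5%, ..., 97.5%; the first and last
    # cut points bound the central 95% of the draws.
    cuts = statistics.quantiles(values, n=40)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "lower": cuts[0],
        "upper": cuts[-1],
    }


summary = {name: summarize(values) for name, values in draws.items()}

for name, stats in summary.items():
    print(
        f"{name}: mean={stats['mean']:.2f} sd={stats['sd']:.2f} "
        f"95% interval=({stats['lower']:.2f}, {stats['upper']:.2f})"
    )
```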

For an example that uses the PREDDIST statement, see the section Posterior Predictive Distribution.

Last updated: December 09, 2022