The BGLIMM Procedure

PREDDIST Statement

PREDDIST <'label'> OUTPRED=SAS-data-set <NSIM=n> <COVARIATES=SAS-data-set> <STATISTICS=options>;

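The PREDDIST statement creates a new SAS data set that contains random samples from the posterior predictive distribution of the response variable. The posterior predictive distribution is the distribution of unobserved observations (predictions) conditional on the observed data. Let $\mathbf{y}$ be the observed data, $\mathbf{X}$ be the covariates, $\theta$ be the parameter, and $\mathbf{y}_{\text{pred}}$ be the unobserved data. The posterior predictive distribution is defined as

$$p(\mathbf{y}_{\text{pred}} \mid \mathbf{y}, \mathbf{X}) = \int p(\mathbf{y}_{\text{pred}}, \theta \mid \mathbf{y}, \mathbf{X}) \, d\theta = \int p(\mathbf{y}_{\text{pred}} \mid \theta, \mathbf{y}, \mathbf{X}) \, p(\theta \mid \mathbf{y}, \mathbf{X}) \, d\theta$$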

Given the assumption that the observed and unobserved data are conditionally independent given $\theta$, the posterior predictive distribution can be further simplified as

$$p(\mathbf{y}_{\text{pred}} \mid \mathbf{y}, \mathbf{X}) = \int p(\mathbf{y}_{\text{pred}} \mid \theta) \, p(\theta \mid \mathbf{y}, \mathbf{X}) \, d\theta$$

The posterior predictive distribution is an integral of the likelihood function $p(\mathbf{y}_{\text{pred}} \mid \theta)$ with respect to the posterior distribution $p(\theta \mid \mathbf{y}, \mathbf{X})$. The PREDDIST statement generates samples from a posterior predictive distribution on the basis of draws from the posterior distribution of $\theta$.
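The sampling idea described above can be illustrated outside SAS. The following Python sketch is not the procedure's implementation; it assumes a simple normal regression with one covariate, and the posterior draws are simulated stand-ins for MCMC output. The function name `posterior_predictive_draws` and all numeric values are hypothetical. For each posterior draw of the parameter, one value of the response is simulated from the likelihood, which yields a sample from the posterior predictive distribution.

```python
import random


def posterior_predictive_draws(theta_draws, x_new, rng):
    """Simulate one predicted response per posterior draw.

    Each element of theta_draws is a tuple (beta0, beta1, sigma) that
    stands for one draw from the posterior distribution of the parameter.
    """
    samples = []
    for beta0, beta1, sigma in theta_draws:
        mean = beta0 + beta1 * x_new            # linear predictor for this draw
        samples.append(rng.gauss(mean, sigma))  # one draw from the likelihood
    return samples


rng = random.Random(2022)

# Hypothetical stand-in for MCMC output: 1000 draws of (beta0, beta1, sigma).
theta_draws = [
    (rng.gauss(1.0, 0.1), rng.gauss(2.0, 0.1), abs(rng.gauss(0.5, 0.05)))
    for _ in range(1000)
]

y_pred = posterior_predictive_draws(theta_draws, x_new=3.0, rng=rng)
pred_mean = sum(y_pred) / len(y_pred)
print(len(y_pred), round(pred_mean, 2))
```

Because every predicted value is generated from a different parameter draw, the spread of `y_pred` reflects both the sampling variability of the response and the posterior uncertainty about the parameter.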

You can specify the following options:

COVARIATES=SAS-data-set

names the SAS data set that contains the sets of explanatory variable values for which the predictions are generated. The variables in this data set must have the same names as the variables that are used in the likelihood function. If you omit this option, the DATA= data set that you specify in the PROC BGLIMM statement is used instead.

ILINK

outputs the inverse link function of the linear predictor for each observation.

LINP

outputs the linear predictors.

MILINK

outputs the inverse link function of the marginal linear predictor for each observation.

MLINP

outputs the marginal linear predictor for each observation.

NSIM=n

specifies the number of simulated predicted values. By default, n is the same as the NMC= option value that you specify in the PROC BGLIMM statement.

OUTPRED=SAS-data-set

creates an output data set to contain the samples from the posterior predictive distribution. The output variable names are listed as resp_1–resp_m, where resp is the name of the response variable and m is the number of observations in the COVARIATES= data set in the PREDDIST statement. If the COVARIATES= data set is not specified, m is the number of observations in the DATA= data set that you specify in the PROC BGLIMM statement.
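The naming rule above can be sketched as follows. This Python fragment is only an illustration: the helper `outpred_layout` is hypothetical, and it assumes that the data set holds one row per simulated draw and ignores any additional bookkeeping variables that the procedure might write.

```python
def outpred_layout(response, n_covariate_rows, nsim):
    """Return the predictive variable names and the assumed table shape."""
    # One variable per observation in the covariates data: resp_1 ... resp_m.
    columns = [f"{response}_{i}" for i in range(1, n_covariate_rows + 1)]
    # Assumption: one row per simulated draw, so nsim rows by m columns.
    return columns, (nsim, len(columns))


columns, shape = outpred_layout("y", n_covariate_rows=3, nsim=1000)
print(columns)  # ['y_1', 'y_2', 'y_3']
print(shape)    # (1000, 3)
```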

Table 8 displays the keywords for the variables to be included in the OUTPRED= data set.

Table 8: Keywords for Variables in OUTPRED= Data Set

Keyword	Description	Variable Name
ILINK	Mean using inverse link	ILink
LINP	Linear predictor	Linp
MILINK	Marginal mean using inverse link	MILnk
MLINP	Marginal linear predictor	MLinp
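The LINP and ILINK keywords are related through the inverse link function: the ILINK value is the inverse link applied to the linear predictor. The following Python sketch shows that relationship for a few common link functions. The dictionary `INVERSE_LINKS` and the function `ilink` are hypothetical names, and the set of links is an illustrative subset rather than the list that the procedure supports.

```python
import math

# Inverse link functions for a few common links (illustrative subset only).
INVERSE_LINKS = {
    "identity": lambda eta: eta,
    "log": math.exp,
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
}


def ilink(linp, link):
    """Map a linear predictor value to the mean scale through the inverse link."""
    return INVERSE_LINKS[link](linp)


print(ilink(0.0, "logit"))     # 0.5
print(ilink(0.0, "log"))       # 1.0
print(ilink(2.5, "identity"))  # 2.5
```

The marginal keywords (MLINP and MILINK) follow the same pattern, with the marginal linear predictor in place of the linear predictor.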

STATISTICS<(global-options)> = NONE | ALL | stats-request
STATS<(global-options)> = NONE | ALL | stats-request

specifies options for calculating posterior statistics. This option works in exactly the same way as the STATISTICS= option in the PROC BGLIMM statement. By default, the STATS option takes the same value that you specify in the STATISTICS= option in the PROC BGLIMM statement.

Last updated: December 09, 2022