The BCHOICE Procedure

PREDDIST Statement

  • PREDDIST OUTPRED=SAS-data-set <COVARIATES=SAS-data-set> <NALTER=n>;

The PREDDIST statement creates a new SAS data set that contains random samples from the posterior predictive distribution of the choice probabilities. It enables you to get the expected choice probabilities of all the alternatives in a choice set.

The posterior predictive distribution is the distribution of unobserved data (predictions) conditional on the observed data. Let $\mathbf{Y}$ be the observed data, $\mathbf{X}$ be the covariates, $\boldsymbol{\theta}$ be the parameter, and $\mathbf{Y}_{\text{pred}}$ be the unobserved data. The posterior predictive distribution is defined as follows:

$$
\begin{aligned}
p(\mathbf{Y}_{\text{pred}} \mid \mathbf{Y}, \mathbf{X}) &= \int p(\mathbf{Y}_{\text{pred}}, \boldsymbol{\theta} \mid \mathbf{Y}, \mathbf{X}) \, d\boldsymbol{\theta} \\
&= \int p(\mathbf{Y}_{\text{pred}} \mid \boldsymbol{\theta}, \mathbf{Y}, \mathbf{X}) \, p(\boldsymbol{\theta} \mid \mathbf{Y}, \mathbf{X}) \, d\boldsymbol{\theta}
\end{aligned}
$$

Assuming that the observed and unobserved data are conditionally independent given $\boldsymbol{\theta}$, the posterior predictive distribution can be further simplified as follows:

$$
p(\mathbf{Y}_{\text{pred}} \mid \mathbf{Y}, \mathbf{X}) = \int p(\mathbf{Y}_{\text{pred}} \mid \boldsymbol{\theta}) \, p(\boldsymbol{\theta} \mid \mathbf{Y}, \mathbf{X}) \, d\boldsymbol{\theta}
$$

The posterior predictive distribution is an integral of the likelihood function $p(\mathbf{Y}_{\text{pred}} \mid \boldsymbol{\theta})$ with respect to the posterior distribution $p(\boldsymbol{\theta} \mid \mathbf{Y}, \mathbf{X})$. The PREDDIST statement generates samples from the posterior predictive distribution based on draws from the posterior distribution of $\boldsymbol{\theta}$.
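The integral above is typically approximated by Monte Carlo: each posterior draw of the parameter yields one sample of the predicted choice probabilities, and averaging over draws gives the expected choice probabilities. The following Python sketch illustrates that idea only. It assumes a multinomial logit model, and the posterior draws, the choice set, and all function names are hypothetical stand-ins; it is not the procedure's implementation.

```python
import math
import random


def choice_probabilities(theta, alternatives):
    """Multinomial logit probabilities for one choice set (assumed model).

    theta: list of coefficients; alternatives: one covariate row per alternative.
    """
    utilities = [sum(t * x for t, x in zip(theta, row)) for row in alternatives]
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def predictive_samples(theta_draws, alternatives):
    """Return one vector of choice probabilities per posterior draw of theta."""
    return [choice_probabilities(theta, alternatives) for theta in theta_draws]


def expected_probabilities(samples):
    """Average the per-draw probabilities over all posterior draws."""
    n_draws = len(samples)
    n_alt = len(samples[0])
    return [sum(s[j] for s in samples) / n_draws for j in range(n_alt)]


# Hypothetical posterior draws of theta; in practice these come from MCMC.
rng = random.Random(1)
theta_draws = [[rng.gauss(1.0, 0.2), rng.gauss(-0.5, 0.2)] for _ in range(2000)]

# Hypothetical choice set: three alternatives, two covariates each.
choice_set = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

samples = predictive_samples(theta_draws, choice_set)
expected = expected_probabilities(samples)
print(expected)
```

Each element of `samples` plays the role of one posterior predictive sample of the choice probabilities for the choice set, and `expected` is the Monte Carlo estimate of the expected choice probability of each alternative.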

You can specify the following options:

COVARIATES=SAS-data-set

names the SAS data set that contains the sets of explanatory variable values for which predictions are made. This data set must contain the same variables that are used in the model. If you omit the COVARIATES= option, the DATA= data set that is specified in the PROC BCHOICE statement is used instead.

NALTER=n
NALTERNATIVE=n

specifies the number of alternatives in a choice set in the COVARIATES= data set. All choice sets in the data must have the same number of alternatives. You must specify this option if a COVARIATES= data set is given.

OUTPRED=SAS-data-set

creates an output data set that contains the samples from the posterior predictive distribution of the probability that each alternative is chosen from a choice set. The observations in the output data set are in the order of either the COVARIATES= data set or the DATA= data set that is specified in the PROC BCHOICE statement. Multithreading and data deletion might cause the order to change.

Last updated: December 09, 2022