The BGLIMM Procedure

PROC BGLIMM Statement

  • PROC BGLIMM <options>;

The PROC BGLIMM statement invokes the procedure. Table 1 summarizes the available options in the PROC BGLIMM statement by function. The options are then described fully in alphabetical order.

Table 1: PROC BGLIMM Statement Options

Option Description
Basic Options
DATA= Specifies the SAS input data set
NBI= Specifies the number of burn-in iterations
NMC= Specifies the number of iterations, excluding the burn-in iterations
NTHREADS= Specifies the number of threads to use
OUTPOST= Names the output data set to contain posterior samples of parameters
SAMEBYSEED= Uses the same seed for each BY group
SEED= Sets the seed for pseudorandom number generation
THIN= Specifies the thinning rate
Display Options
NOCLPRINT Limits or suppresses the display of classification variable levels
MAXRESUBPRT Suppresses the display of subject levels in the "Random Effect Information" table
Summary, Diagnostics, and Plotting Options
DIAG= Controls the convergence diagnostics
DIC Computes the deviance information criterion (DIC)
PLOTS= Controls plotting
STATS= Controls posterior statistics
WAIC Computes the Watanabe-Akaike information criterion (WAIC)
Other Options
LOGPOST Calculates the logarithm of the posterior density and likelihood
MISSING= Indicates how to handle missing values
SINGCHOL= Tunes the singularity criterion for Cholesky decomposition
SINGULAR= Tunes the general singularity criterion


You can specify the following options in the PROC BGLIMM statement.

DATA=SAS-data-set

names the input data set for PROC BGLIMM to use. The default is the most recently created data set. Observations in this data set are used to compute the log-likelihood function.

DIAGNOSTICS=NONE | (keyword-list)
DIAG=NONE | (keyword-list)

specifies options for convergence diagnostics. By default, PROC BGLIMM computes the effective sample sizes. The sample autocorrelations, Monte Carlo errors, Geweke test, Raftery-Lewis test, and Heidelberger-Welch test are also available. You can request all the diagnostic tests by specifying DIAGNOSTICS=ALL. You can suppress all the diagnostic tests by specifying DIAGNOSTICS=NONE.

You can specify one or more of the following keyword-list options:

ALL

computes all diagnostic tests and statistics. You can combine this option with any other specific tests to modify test options. For example, DIAGNOSTICS=(ALL AUTOCORR(LAGS=(1 5 35))) computes all tests by using default settings and autocorrelations at lags 1, 5, and 35.

AUTOCORR <(autocorrelation-options)>
AC <(autocorrelation-options)>

computes default autocorrelations at lags 1, 5, 10, and 50 for each variable. You can choose other lags by using the following autocorrelation-option:

LAGS=(numeric-list)

specifies autocorrelation lags. The numeric-list takes only positive integer values.

ESS

computes the effective sample sizes (Kass et al. 1998) of the posterior samples of each parameter. It also computes the correlation time and the efficiency of the chain for each parameter. Small values of ESS might indicate a lack of convergence.

GEWEKE <(Geweke-options)>

computes the Geweke spectral density diagnostics; this is a two-sample t test between the first f 1 portion (as specified by the FRAC1= option) and the last f 2 portion (as specified by the FRAC2= option) of the chain. By default, FRAC1=0.1 and FRAC2=0.5, but you can choose other fractions by using the following Geweke-options:

FRAC1=value
F1=value

specifies the beginning proportion of the Markov chain. By default, FRAC1=0.1.

FRAC2=value
F2=value

specifies the end proportion of the Markov chain. By default, FRAC2=0.5.

HEIDELBERGER <(Heidel-options)>
HEIDEL <(Heidel-options)>

computes the Heidelberger-Welch diagnostic (which consists of a stationarity test and a halfwidth test) for each variable. The stationary diagnostic test tests the null hypothesis that the posterior samples are generated from a stationary process. If the stationarity test is passed, a halfwidth test is then carried out.

You can also specify the following Heidel-options, such as DIAGNOSTICS=HEIDELBERGER(EPS=0.05):

EPS=value

specifies a small positive number epsilon such that if the halfwidth is less than epsilon times the sample mean of the retaining iterations, the halfwidth test is passed. By default, EPS=0.1.

HALPHA=value

specifies the alpha level left-parenthesis 0 less-than alpha less-than 1 right-parenthesis for the halfwidth test. By default, HALPHA=0.05.

SALPHA=value

specifies the alpha level left-parenthesis 0 less-than alpha less-than 1 right-parenthesis for the stationarity test. By default, SALPHA=0.05.

MAXLAG=number

specifies the maximum number of autocorrelation lags to use to compute the effective sample size. The value of number is also used in the calculation of the Monte Carlo standard error. By default, MAXLAG=MIN(500, MCsample/4), where MCsample is the Markov chain sample size that is kept after thinning—that is, MCsample equals left-bracket StartFraction NMC Over NTHIN EndFraction right-bracket. If number is too low, you might observe significant lags, and the effective sample size cannot be calculated accurately. A warning message appears in the SAS log, and you can increase the value of either the MAXLAG= option or the NMC= option accordingly. Specifying this option implies the ESS and MCSE options.

MCSE
MCERROR

computes the Monte Carlo standard error for the posterior samples of each parameter.

NONE

suppresses all the diagnostic tests and statistics. This option is not recommended.

RAFTERY <(Raftery-options)>
RL <(Raftery-options)>

computes the Raftery-Lewis diagnostic, which evaluates the accuracy of the estimated quantile (ModifyingAbove theta With caret Subscript upper Q for a given Q element-of left-parenthesis 0 comma 1 right-parenthesis) of a chain. ModifyingAbove theta With caret Subscript upper Q can achieve any degree of accuracy when the chain is allowed to run for a long time. The algorithm stops when the estimated probability ModifyingAbove upper P With caret Subscript upper Q Baseline equals normal upper P normal r left-parenthesis theta less-than-or-equal-to ModifyingAbove theta With caret Subscript upper Q Baseline right-parenthesis reaches within plus-or-minus upper R of the value Q with probability S; that is, normal upper P normal r left-parenthesis upper Q minus upper R less-than-or-equal-to ModifyingAbove upper P With caret Subscript upper Q Baseline less-than-or-equal-to upper Q plus upper R right-parenthesis equals upper S.

You can specify Q, R, S, and a precision level epsilon for a stationarity test by specifying the following Raftery-options—for example, DIAGNOSTICS=RAFTERY(QUANTILE=0.05):

ACCURACY=value
R=value

specifies a small positive number as the margin of error for measuring the accuracy of estimation of the quantile. By default, ACCURACY=0.005.

EPS=value

specifies the tolerance level (a small positive number) for the stationarity test. By default, EPS=0.001.

PROB=value
S=value

specifies the probability of attaining the accuracy of the estimation of the quantile. By default, PROB=0.95.

QUANTILE=value
Q=value

specifies the order (a value between 0 and 1) of the quantile of interest. By default, QUANTILE=0.025.

DIC<(INCLUDE=variable)>

computes the deviance information criterion (DIC). DIC is calculated by using the posterior mean estimates of the parameters.

The INCLUDE=variable specification selects observations that should be included for computing DIC. The variable is binary with values being either 0 or 1. The observations for which variable=1 are included to obtain DIC, whereas those observations for which variable=0 are not included.

LOGPOST

computes the logarithm of the posterior density of the parameters and the likelihood at each iteration. The LogLike and LogPost variables are saved in the OUTPOST= data set.

MAXRESUBPRT<=number>

limits the display of subject levels in the "Random Effect Information" table. If you specify a number, the values of the subject effect are displayed up to that many characters. This option may help reduce the size of the "Subject Values" column in the "Random Effect Information" table if the subject of a RANDOM statement has a long list of levels. By default, MAXRESUBPRT = 200.

MISSING=keyword
MISS=keyword

specifies how to handle missing values. For more information, see the section Missing Data. PROC BGLIMM models missing response variables and discards observations that have missing covariates. You can specify the following keywords:

CC
COMPLETECASE

assumes a complete case analysis, so all observations that have missing variable values are discarded before the simulation.

CCMODELY

models the missing response variables and discards observations that have missing covariates.

By default, MISSING=CCMODELY.

NBI=number

specifies the number of burn-in iterations to perform before saving parameter estimate chains. By default, NBI=500.

NMC=number

specifies the number of iterations in the main simulation loop. If you specify a data set in the OUTPOST= option, number is the number of posterior samples that are saved for each parameter. This is the MCMC sample size if THIN=1. By default, NMC=5000.

NOCLPRINT<=number>

suppresses the display of the "Class Level Information" table if you do not specify number. If you specify number, the values of the classification variables are displayed for only those variables whose list of levels is less than number characters. Specifying number helps reduce the size of the "Class Level Information" table if some classification variables have a large number of levels. By default, NOCLPRINT = 200.

NTHREADS=number
NTHREAD=number

specifies the number of threads (CPUs) on which to run analytic computations and simulations simultaneously. Multithreading is the use of more than one thread to perform computations concurrently. When multithreading is possible, you can realize substantial performance gains compared to the performance that you get from sequential (single-threaded) execution. The more threads there are, the faster the computation runs. But do not specify a number greater than the number of CPUs on the host where the analytic computations are performed.

PROC BGLIMM performs two types of threading. In sampling fixed-effects parameters, the procedure allocates data to different threads and accumulates values from each thread; in sampling of random-effects parameters, each thread generates a subset of these parameters simultaneously at each iteration. Most sampling algorithms are threaded. NTHREADS=–1 sets the number of available threads to the number of hyperthreaded cores available on the system. By default, NTHREADS=1.

OUTPOST=SAS-data-set

specifies an output data set to contain the posterior samples of all parameters and the iteration numbers. It contains the log of the posterior density (LOGPOST) and the log likelihood (LOGLIKE) if you specify the LOGPOST option. By default, no OUTPOST= data set is created.

PLOTS <(global-plot-option)> <= plot-requests <(option)>>

controls the display of diagnostic plots. You can request three types of plots: trace plots, autocorrelation function plots, and kernel density plots. By default, the plots are displayed in panels unless you specify the global-plot-option UNPACK. Also, when you specify more than one type of plot, the plots are grouped by parameter unless you specify the global-plot-option GROUPBY=TYPE. When you specify only one plot-request, you can omit the parentheses around it, as shown in the following example:

   plots=none
   plots(unpack)=trace
   plots=(trace density)

If ODS Graphics is enabled and you specify PLOTS=ALL, then PROC BGLIMM produces, for each parameter, a panel that contains the trace plot, the autocorrelation function plot, and the density plot. This is equivalent to specifying PLOTS=(TRACE AUTOCORR DENSITY).

You can specify the following global-plot-options:

FRINGE

adds a fringe plot to the horizontal axis of the density plot.

GROUPBY=PARAMETER | TYPE
GROUP=PARAMETER | TYPE

specifies how the plots are grouped when there is more than one type of plot. By default, GROUPBY=PARAMETER. You can specify the following values:

PARAMETER

groups the plots by parameter.

TYPE

groups the plots by type.

LAGS=number

specifies the number of autocorrelation lags to use in plotting the ACF graph. By default, LAGS=50.

SMOOTH

smooths the trace plot by using a fitted penalized B-spline curve (Eilers and Marx 1996).

UNPACKPANEL
UNPACK

unpacks all paneled plots so that each plot in a panel is displayed separately.

You can specify the following plot-requests:

ALL

requests all types of plots. PLOTS=ALL is equivalent to specifying PLOTS=(TRACE AUTOCORR DENSITY).

AUTOCORR
ACF

displays the autocorrelation function plots for the parameters.

DENSITY
D
KERNEL
K

displays the kernel density plots for the parameters.

NONE

suppresses the display of all plots.

TRACE
T

displays the trace plots for the parameters.

Consider a model that has four parameters, X1–X4. The following list shows which plots are produced for various option settings:

  • PLOTS=(TRACE AUTOCORR) displays the trace and autocorrelation plots for each parameter side by side, with two parameters per panel:

    Display 1 Trace(X1) Autocorr(X1)
    Trace(X2) Autocorr(X2)
    Display 2 Trace(X3) Autocorr(X3)
    Trace(X4) Autocorr(X4)

  • PLOTS(GROUPBY=TYPE)=(TRACE AUTOCORR) displays all the paneled trace plots, followed by panels of autocorrelation plots:

    Display 1 Trace(X1)
    Trace(X2)
    Display 2 Trace(X3)
    Trace(X4)
    Display 3 Autocorr(X1) Autocorr(X2)
    Autocorr(X3) Autocorr(X4)

  • PLOTS(UNPACK)=(TRACE AUTOCORR) displays a separate trace plot and a separate correlation plot, parameter by parameter:

    Display 1 Trace(X1)
    Display 2 Autocorr(X1)
    Display 3 Trace(X2)
    Display 4 Autocorr(X2)
    Display 5 Trace(X3)
    Display 6 Autocorr(X3)
    Display 7 Trace(X4)
    Display 8 Autocorr(X4)

  • PLOTS(UNPACK GROUPBY=TYPE)=(TRACE AUTOCORR) displays all the separate trace plots, followed by the separate autocorrelation plots:

    Display 1 Trace(X1)
    Display 2 Trace(X2)
    Display 3 Trace(X3)
    Display 4 Trace(X4)
    Display 5 Autocorr(X1)
    Display 6 Autocorr(X2)
    Display 7 Autocorr(X3)
    Display 8 Autocorr(X4)

SAMEBYSEED

uses the same seed that you specify in the SEED= option to start the pseudorandom number generator in each BY group. If you omit this option, the initial seed for the next BY group is the one that is generated at the end of the previous BY group.

SEED=number

specifies an integer that is used to start the pseudorandom number generator. If you omit this option or if numberless-than-or-equal-to 0, the seed is generated from the time of day, which is read from the computer’s clock.

SINGCHOL=number

tunes the singularity criterion in Cholesky decomposition and matrix inversion operations. The default is 1E4 times the machine epsilon, or approximately 1E–12 on most computers.

SINGULAR=number

tunes the general singularity criterion applied by the procedure in divisions and inversions. The default is 1E4 times the machine epsilon, or approximately 1E–12 on most computers.

STATISTICS <(global-stats-options)> =  NONE | ALL | stats-request
STATS <(global-stats-options)> =  NONE | ALL | stats-request

specifies options for posterior statistics. By default, PROC BGLIMM computes the posterior mean, standard deviation, quantiles, and two 95% credible intervals: equal-tail and highest posterior density (HPD). Other available statistics include the posterior correlation and covariance. You can request all the posterior statistics by specifying STATS=ALL. You can suppress all the calculations by specifying STATS=NONE.

You can specify the following global-stats-options:

ALPHA=numeric-list

specifies the alpha level for the equal-tail and HPD intervals. The value of alpha must be between 0 and 0.5. By default, ALPHA=0.05.

PERCENT=numeric-list
PERCENTAGE=numeric-list

calculates the posterior percentages. The numeric-list contains values between 0 and 100, separated by spaces. By default, PERCENTAGE=(25 50 75).

You can specify the following stats-requests:

ALL

computes all posterior statistics. You can combine the ALL option with any other options. For example, STATS(ALPHA=(0.02 0.05 0.1))=ALL computes all statistics by using the default settings and intervals at alpha levels of 0.02, 0.05, and 0.1.

BRIEF

computes the posterior means, standard deviations, and 100 left-parenthesis 1 minus alpha right-parenthesis percent-sign HPD credible interval for each variable. By default, ALPHA=0.05, but you can use the global global-stats-option ALPHA= to specify other values. This is the default output for posterior statistics.

CORR

computes the posterior correlation matrix.

COV

computes the posterior covariance matrix.

INTERVAL
INT

computes the 100 left-parenthesis 1 minus alpha right-parenthesis percent-sign equal-tail and HPD credible intervals for each variable. By default, ALPHA=0.05, but you can use the global-stats-option ALPHA= to specify other intervals of any probabilities.

NONE

suppresses all the statistics.

SUMMARY
SUM

computes the posterior means, standard deviations, and percentile points for each variable. By default, the 25th, 50th, and 75th percentile points are produced, but you can use the global-stats-option PERCENT= to request specific percentile points.

THIN=number
NTHIN=number

controls the thinning rate of the simulation. PROC BGLIMM keeps every nth simulation sample and discards the rest. All posterior statistics and diagnostics are calculated by using the thinned samples. By default, THIN=1.

WAIC

computes the Watanabe-Akaike information criterion (WAIC), also known as the widely applicable information criterion. WAIC is proposed as an approximation to n-fold leave-one-out cross validation, which computes the posterior mean and variance of the likelihood and the log likelihood.

Last updated: December 09, 2022