The MCMC Procedure

Blocking of Parameters

In a multivariate parameter model, if all k parameters are proposed with one joint distribution q left-parenthesis dot vertical-bar dot right-parenthesis, acceptance or rejection would occur for all of them. This can be rather inefficient, especially when parameters have vastly different scales. A way to avoid this difficulty is to allocate the k parameters into d blocks and update them separately. The PARMS statement puts model parameters in separate blocks, and each block of parameters is updated sequentially in the procedure.

Suppose you want to sample from a multivariate distribution with probability density function p left-parenthesis theta vertical-bar bold y right-parenthesis where theta equals StartSet theta 1 comma theta 2 comma ellipsis comma theta Subscript k Baseline EndSet period Now suppose that these k parameters are separated into d blocks—for example, p left-parenthesis theta vertical-bar bold x right-parenthesis equals f Subscript d Baseline left-parenthesis z right-parenthesis where z equals StartSet z 1 comma z 2 comma ellipsis comma z Subscript d Baseline EndSet, where each z Subscript j contains a nonempty subset of the StartSet theta Subscript i Baseline EndSet, and where each theta Subscript i is contained in one and only one z Subscript j. In the MCMC context, the z’s are blocks of parameters. In the blocked algorithm, a proposal consists of several parts. Instead of proposing a simultaneous move for all the theta’s, a proposal is made for the theta Subscript i’s in z 1 only, then for the theta Subscript i’s in z 2, and so on for d subproposals. Any accepted proposal can involve any number of the blocks moving. The parameters do not necessarily all move at once as in the all-at-once Metropolis algorithm.

Formally, the blocked Metropolis algorithm is as follows. Let w Subscript j be the collection of theta Subscript i that are in block z Subscript j, and let q Subscript j Baseline left-parenthesis dot vertical-bar w Subscript j Baseline right-parenthesis be a symmetric multivariate distribution that is centered at the current values of w Subscript j.

  1. Let t equals 0. Choose points for all w Subscript j Superscript t. A point can be an arbitrary point as long as p left-parenthesis w Subscript j Superscript t Baseline vertical-bar bold y right-parenthesis greater-than 0.

  2. For j equals 1 comma ellipsis comma d:

    1. Generate a new sample, w Subscript j comma new, using the proposal distribution q Subscript j Baseline left-parenthesis dot vertical-bar w Subscript j Superscript t Baseline right-parenthesis.

    2. Calculate the following quantity:

      r equals min left-brace StartFraction p left-parenthesis w Subscript j comma new Baseline vertical-bar w 1 Superscript t Baseline comma ellipsis comma w Subscript j minus 1 Superscript t Baseline comma w Subscript j plus 1 Superscript t minus 1 Baseline comma ellipsis comma w Subscript d Superscript t minus 1 Baseline comma bold y right-parenthesis Over p left-parenthesis w Subscript j Superscript t Baseline vertical-bar w 1 Superscript t Baseline comma ellipsis comma w Subscript j minus 1 Superscript t Baseline comma w Subscript j plus 1 Superscript t minus 1 Baseline comma ellipsis comma w Subscript d Superscript t minus 1 Baseline comma bold y right-parenthesis EndFraction comma 1 right-brace period
    3. Sample u from the uniform distribution upper U left-parenthesis 0 comma 1 right-parenthesis.

    4. Set w Subscript j Superscript t plus 1 Baseline equals w Subscript j comma new if r less-than u; w Subscript j Superscript t plus 1 Baseline equals w Subscript j Superscript t otherwise.

  3. Set t equals t plus 1. If t less-than upper T, the number of desired samples, go back to Step 2; otherwise, stop.

With PROC MCMC, you can sample all parameters simultaneously by putting them all in a single PARMS statement, you can sample parameters individually by putting each parameter in its own PARMS statement, or you can sample certain subsets of parameters together by grouping each subset in its own PARMS statements. For example, if the model you are interested in has five parameters, alpha, beta, gamma, phi, sigma, the all-at-once strategy is as follows:

parms alpha beta gamma phi sigma;

The one-at-a-time strategy is as follows:

parms alpha;
parms beta;
parms gamma;
parms phi;
parms sigma;

A two-block strategy could be as follows:

parms alpha beta gamma;
parms phi sigma;

The exceptions to the previously described blocking strategies are parameters that are sampled directly (either from their full conditional or marginal distributions) and parameters that are array-based (with multivariate prior distributions). In these cases, the parameters are taken out of an existing block and are updated individually. You can use the sampling options in the PARMS statement to override the default behavior.

One of the greatest challenges in MCMC sampling is achieving good mixing of the chains—the chains should quickly traverse the support of the stationary distribution. A number of factors determine the behavior of a Metropolis sampler; blocking is one of them, so you want to be extremely careful when you choose a good design. Generally speaking, forming blocks of parameters has its advantages, but it is not true that the larger the block the faster the convergence.

When simultaneously sampling a large number of parameters, the algorithm might find it difficult to achieve good mixing. As the number of parameters gets large, it is much more likely to have (proposal) samples that fall well into the tails of the target distribution, producing too small a test ratio. As a result, few proposed values are accepted and convergence is slow. On the other hand, when the algorithm samples each parameter individually, the computational cost increases linearly. Each block of Metropolis parameters requires one additional pass through the data set, so a five-block updating strategy could take five times longer than a single-block updating strategy. In addition, there is a chance that the chain might mix far too slowly because the conditional distributions (of theta Subscript i given all other theta’s) might be very "narrow," as a result of posterior correlation among the parameters. When that happens, it takes a long time for the chain to fully explore that dimension alone. There are no theoretical results that can help determine an optimal "blocking" for an arbitrary parametric model. A rule followed in practice is to form small groups of correlated parameters that belong to the same context in the formulation of the model. The best mixing is usually obtained with a blocking strategy somewhere between the all-at-once and one-at-a-time strategies.

Last updated: December 09, 2022