Introduction to Bayesian Analysis Procedures

Assessing Markov Chain Convergence

Simulation-based Bayesian inference requires using simulated draws to summarize the posterior distribution or calculate any relevant quantities of interest. You need to treat the simulation draws with care. There are usually two issues. First, you have to decide whether the Markov chain has reached its stationary, or the desired posterior, distribution. Second, you have to determine the number of iterations to keep after the Markov chain has reached stationarity. Convergence diagnostics help to resolve these issues. Note that many diagnostic tools are designed to verify a necessary but not sufficient condition for convergence. There are no conclusive tests that can tell you when the Markov chain has converged to its stationary distribution. You should proceed with caution. Also, note that you should check the convergence of all parameters, and not just those of interest, before proceeding to make any inference. With some models, certain parameters can appear to have very good convergence behavior, but that could be misleading due to the slow convergence of other parameters. If some of the parameters have bad mixing, you cannot get accurate posterior inference for parameters that appear to have good mixing. See Cowles and Carlin (1996) and Brooks and Roberts (1998) for discussions about convergence diagnostics.

Visual Analysis via Trace Plots

Trace plots of samples versus the simulation index can be very useful in assessing convergence. The trace tells you if the chain has not yet converged to its stationary distribution—that is, if it needs a longer burn-in period. A trace can also tell you whether the chain is mixing well. A chain might have reached stationarity if the distribution of points is not changing as the chain progresses. The aspects of stationarity that are most recognizable from a trace plot are a relatively constant mean and variance. A chain that mixes well traverses its posterior space rapidly, and it can jump from one remote region of the posterior to another in relatively few steps. Figure 1 through Figure 4 display some typical features that you might see in trace plots. The traces are for a parameter called gamma.

Figure 1: Essentially Perfect Trace for γ


Figure 1 displays a "perfect" trace plot. Note that the center of the chain appears to be around the value 3, with very small fluctuations. This indicates that the chain could have reached the right distribution. The chain is mixing well; it is exploring the distribution by traversing to areas where its density is very low. You can conclude that the mixing is quite good here.

Figure 2: Initial Samples of γ Need to Be Discarded


Figure 2 displays a trace plot for a chain that starts at a very remote initial value and makes its way to the target distribution. The first few hundred observations should be discarded. This chain appears to be mixing very well locally. It travels relatively quickly to the target distribution, reaching it in a few hundred iterations. If you have a chain that looks like this, you would want to increase the burn-in sample size. If you need to use this sample to make inferences, you would want to use only the samples toward the end of the chain.

Figure 3: Marginal Mixing for γ


Figure 3 demonstrates marginal mixing. The chain is taking only small steps and does not traverse its distribution quickly. This type of trace plot is typically associated with high autocorrelation among the samples. To obtain a few thousand independent samples, you need to run the chain for much longer.

Figure 4: Bad Mixing, Nonconvergence of γ


The trace plot shown in Figure 4 depicts a chain with serious problems. It is mixing very slowly, and it offers no evidence of convergence. You would want to try to improve the mixing of this chain. For example, you might consider reparameterizing your model on the log scale. Run the Markov chain for a long time to see where it goes. This type of chain is entirely unsuitable for making parameter inferences.

Statistical Diagnostic Tests

The Bayesian procedures include several statistical diagnostic tests that can help you assess Markov chain convergence. For a detailed description of each of the diagnostic tests, see the following subsections. Table 1 provides a summary of the diagnostic tests and their interpretations.

Table 1: Convergence Diagnostic Tests Available in the Bayesian Procedures

Name | Description | Interpretation of the Test
Gelman-Rubin | Uses parallel chains with dispersed initial values to test whether they all converge to the same target distribution. Failure could indicate the presence of a multi-mode posterior distribution (different chains converge to different local modes) or the need to run a longer chain (burn-in is yet to be completed). | One-sided test based on a variance ratio test statistic. Large $\hat{R}_c$ values indicate rejection.
Geweke | Tests whether the mean estimates have converged by comparing means from the early and latter part of the Markov chain. | Two-sided test based on a z-score statistic. Large absolute z values indicate rejection.
Heidelberger-Welch (stationarity test) | Tests whether the Markov chain is a covariance (or weakly) stationary process. Failure could indicate that a longer Markov chain is needed. | One-sided test based on a Cramér–von Mises statistic. Small p-values indicate rejection.
Heidelberger-Welch (half-width test) | Reports whether the sample size is adequate to meet the required accuracy for the mean estimate. Failure could indicate that a longer Markov chain is needed. | If a relative half-width statistic is greater than a predetermined accuracy measure, this indicates rejection.
Raftery-Lewis | Evaluates the accuracy of the estimated (desired) percentiles by reporting the number of samples needed to reach the desired accuracy of the percentiles. Failure could indicate that a longer Markov chain is needed. | If the total number of samples needed exceeds the length of the Markov chain, this indicates rejection.
Autocorrelation | Measures dependency among Markov chain samples. | High correlations between long lags indicate poor mixing.
Effective sample size | Relates to autocorrelation; measures mixing of the Markov chain. | Large discrepancy between the effective sample size and the simulation sample size indicates poor mixing.


Gelman and Rubin Diagnostics

Gelman and Rubin diagnostics (Gelman and Rubin 1992; Brooks and Gelman 1997) are based on analyzing multiple simulated MCMC chains by comparing the variances within each chain and the variance between chains. Large deviation between these two variances indicates nonconvergence.

Define $\{\theta^t\}$, $t = 1, \ldots, n$, to be the collection of a single Markov chain output. The parameter $\theta^t$ is the $t$th sample of the Markov chain. For notational simplicity, $\theta$ is assumed to be single dimensional in this section.

Suppose you have $M$ parallel MCMC chains that were initialized from various parts of the target distribution. Each chain is of length $n$ (after discarding the burn-in). For each $\theta$, the simulations are labeled as $\theta_m^t$, where $t = 1, \ldots, n$ and $m = 1, \ldots, M$. The between-chain variance $B$ and the within-chain variance $W$ are calculated as

$$
\begin{aligned}
B &= \frac{n}{M-1}\sum_{m=1}^{M}\left(\bar{\theta}_{m}^{\,\cdot} - \bar{\theta}_{\cdot}^{\,\cdot}\right)^{2}, &&\text{where } \bar{\theta}_{m}^{\,\cdot} = \frac{1}{n}\sum_{t=1}^{n}\theta_{m}^{t}, \quad \bar{\theta}_{\cdot}^{\,\cdot} = \frac{1}{M}\sum_{m=1}^{M}\bar{\theta}_{m}^{\,\cdot} \\
W &= \frac{1}{M}\sum_{m=1}^{M}s_{m}^{2}, &&\text{where } s_{m}^{2} = \frac{1}{n-1}\sum_{t=1}^{n}\left(\theta_{m}^{t} - \bar{\theta}_{m}^{\,\cdot}\right)^{2}
\end{aligned}
$$

The posterior marginal variance, $\mathrm{var}(\theta \mid \mathbf{y})$, is a weighted average of $W$ and $B$. The estimate of the variance is

$$
\hat{V} = \frac{n-1}{n}\,W + \frac{M+1}{nM}\,B
$$

If all $M$ chains have reached the target distribution, this posterior variance estimate should be very close to the within-chain variance $W$. Therefore, you would expect to see the ratio $\hat{V}/W$ be close to 1. The square root of this ratio is referred to as the potential scale reduction factor (PSRF). A large PSRF indicates that the between-chain variance is substantially greater than the within-chain variance, so that longer simulation is needed. If the PSRF is close to 1, you can conclude that each of the $M$ chains has stabilized, and they are likely to have reached the target distribution.
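
To make the computation concrete, the following Python sketch computes $B$, $W$, $\hat{V}$, and the basic PSRF $\sqrt{\hat{V}/W}$ for $M$ parallel chains of a single parameter. It is a minimal illustration of the formulas above, not the procedures' implementation, and it omits the refined correction that is described next.

```python
import numpy as np

def psrf(chains):
    """Basic potential scale reduction factor sqrt(V_hat / W).

    chains: array of shape (M, n), M parallel chains of length n
    (burn-in already discarded) for a single parameter.
    """
    M, n = chains.shape
    chain_means = chains.mean(axis=1)                          # theta-bar_m.
    grand_mean = chain_means.mean()                            # theta-bar_..
    B = n / (M - 1) * ((chain_means - grand_mean) ** 2).sum()  # between-chain
    W = chains.var(axis=1, ddof=1).mean()                      # within-chain
    V_hat = (n - 1) / n * W + (M + 1) / (n * M) * B            # pooled estimate
    return np.sqrt(V_hat / W)
```

Values near 1 suggest that the chains agree; values substantially greater than 1 suggest that a longer simulation is needed.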

A refined version of PSRF is calculated, as suggested by Brooks and Gelman (1997), as

$$
\hat{R}_c = \sqrt{\frac{\hat{d}+3}{\hat{d}+1}\cdot\frac{\hat{V}}{W}} = \sqrt{\frac{\hat{d}+3}{\hat{d}+1}\left(\frac{n-1}{n} + \frac{M+1}{nM}\,\frac{B}{W}\right)}
$$

where

$$
\hat{d} = \frac{2\hat{V}^2}{\widehat{\mathrm{Var}}\bigl(\hat{V}\bigr)}
$$

and

$$
\begin{aligned}
\widehat{\mathrm{Var}}\bigl(\hat{V}\bigr) ={}& \left(\frac{n-1}{n}\right)^{2}\frac{1}{M}\,\widehat{\mathrm{Var}}\left(s_m^2\right) + \left(\frac{M+1}{nM}\right)^{2}\frac{2}{M-1}\,B^2 \\
&+ 2\,\frac{(M+1)(n-1)}{n^2 M}\,\frac{n}{M}\left(\widehat{\mathrm{cov}}\left(s_m^2, \left(\bar{\theta}_{m}^{\,\cdot}\right)^2\right) - 2\bar{\theta}_{\cdot}^{\,\cdot}\,\widehat{\mathrm{cov}}\left(s_m^2, \bar{\theta}_{m}^{\,\cdot}\right)\right)
\end{aligned}
$$

All the Bayesian procedures also produce an upper $100(1-\alpha/2)\%$ confidence limit of $\hat{R}_c$. Gelman and Rubin (1992) showed that the ratio $B/W$ in $\hat{R}_c$ has an $F$ distribution with degrees of freedom $M-1$ and $2W^2M/\widehat{\mathrm{Var}}\left(s_m^2\right)$. Because you are concerned only if the scale is large, not small, only the upper $100(1-\alpha/2)\%$ confidence limit is reported. This is written as

$$
\sqrt{\left(\frac{n-1}{n} + \frac{M+1}{nM}\cdot F_{1-\alpha/2}\!\left(M-1,\ \frac{2W^2}{\widehat{\mathrm{Var}}\left(s_m^2\right)/M}\right)\right)\cdot\frac{\hat{d}+3}{\hat{d}+1}}
$$

In the Bayesian procedures, you can specify the number of chains that you want to run. Typically three chains are sufficient. The first chain is used for posterior inference, such as mean and standard deviation; the other $M-1$ chains are used for computing the diagnostics and are discarded afterward. This test can be computationally costly, because it prolongs the simulation $M$-fold.

It is best to choose different initial values for all M chains. The initial values should be as dispersed from each other as possible so that the Markov chains can fully explore different parts of the distribution before they converge to the target. Similar initial values can be risky because all of the chains can get stuck in a local maximum; that is something this convergence test cannot detect. If you do not supply initial values for all the different chains, the procedures generate them for you.

Geweke Diagnostics

The Geweke test (Geweke 1992) compares values in the early part of the Markov chain to those in the latter part of the chain in order to detect failure of convergence. The statistic is constructed as follows. Two subsequences of the Markov chain $\{\theta^t\}$ are taken out, with $\{\theta_1^t : t = 1, \ldots, n_1\}$ and $\{\theta_2^t : t = n_a, \ldots, n\}$, where $1 < n_1 < n_a < n$. Let $n_2 = n - n_a + 1$, and define

$$
\bar{\theta}_1 = \frac{1}{n_1}\sum_{t=1}^{n_1}\theta^t \qquad\text{and}\qquad \bar{\theta}_2 = \frac{1}{n_2}\sum_{t=n_a}^{n}\theta^t
$$

Let $\hat{s}_1(0)$ and $\hat{s}_2(0)$ denote consistent spectral density estimates at zero frequency (see the subsection Spectral Density Estimate at Zero Frequency for estimation details) for the two MCMC chains, respectively. If the ratios $n_1/n$ and $n_2/n$ are fixed, $(n_1 + n_2)/n < 1$, and the chain is stationary, then the following statistic converges to a standard normal distribution as $n \to \infty$:

$$
Z_n = \frac{\bar{\theta}_1 - \bar{\theta}_2}{\sqrt{\dfrac{\hat{s}_1(0)}{n_1} + \dfrac{\hat{s}_2(0)}{n_2}}}
$$

This is a two-sided test, and large absolute z-scores indicate rejection.
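
The following Python sketch illustrates the construction of the z-score. The fractions 0.1 and 0.5 are illustrative defaults, and the crude truncated-autocovariance estimate of $\hat{s}(0)$ is a stand-in for the periodogram-smoothing method described in the next subsection.

```python
import numpy as np

def spectral0(x, max_lag=None):
    """Crude spectral density estimate at zero frequency via a truncated
    autocovariance sum; a stand-in for the periodogram-based estimate."""
    n = len(x)
    if max_lag is None:
        max_lag = int(np.sqrt(n))              # heuristic truncation point
    xc = x - x.mean()
    acov = [xc @ xc / n] + [xc[h:] @ xc[:-h] / n for h in range(1, max_lag)]
    return acov[0] + 2.0 * sum(acov[1:])

def geweke_z(theta, frac1=0.1, frac2=0.5):
    """Geweke z-score comparing the first frac1 and last frac2 of the chain."""
    theta = np.asarray(theta, dtype=float)
    n = len(theta)
    early = theta[: int(frac1 * n)]
    late = theta[int((1.0 - frac2) * n):]
    return (early.mean() - late.mean()) / np.sqrt(
        spectral0(early) / len(early) + spectral0(late) / len(late))
```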

Spectral Density Estimate at Zero Frequency

For one sequence of the Markov chain $\{\theta_t\}$, the relationship between the $h$-lag covariance sequence of a time series and the spectral density, $f$, is

$$
s_h = \frac{1}{2\pi}\int_{-\pi}^{\pi}\exp(i\omega h)\, f(\omega)\, d\omega
$$

where $i$ is the imaginary unit ($i\omega h$ is a complex argument). Inverting this Fourier integral,

$$
f(\omega) = \sum_{h=-\infty}^{\infty} s_h \exp(-i\omega h) = s_0\left(1 + 2\sum_{h=1}^{\infty}\rho_h\cos(\omega h)\right)
$$

It follows that

$$
f(0) = \sigma^2\left(1 + 2\sum_{h=1}^{\infty}\rho_h\right)
$$

which gives an autocorrelation-adjusted estimate of the variance. In this equation, $\sigma^2$ is the naive variance estimate of the sequence $\{\theta_t\}$ and $\rho_h$ is the lag-$h$ autocorrelation. Due to obvious computational difficulties, such as calculation of the autocorrelation at infinity, you cannot effectively estimate $f(0)$ by using the preceding formula. The usual route is to first obtain the periodogram $p(\omega)$ of the sequence, and then estimate $f(0)$ by smoothing the estimated periodogram. The periodogram is defined to be

$$
p(\omega) = \frac{1}{n}\left[\left(\sum_{t=1}^{n}\theta_t\sin(\omega t)\right)^{2} + \left(\sum_{t=1}^{n}\theta_t\cos(\omega t)\right)^{2}\right]
$$

The procedures estimate $\hat{f}(0)$ from $p$ as follows (Heidelberger and Welch 1981). In $p(\omega)$, let $\omega = \omega_k = 2\pi k/n$ and $k = 1, \ldots, [n/2]$.[3] A smooth spectral density in the domain of $(0, \pi]$ is obtained by fitting a gamma model with the log link function, using $p(\omega_k)$ as response and $x_1(\omega_k) = \sqrt{3}\left(4\omega_k/(2\pi) - 1\right)$ as the only regressor. The predicted value $\hat{f}(0)$ is given by

$$
\hat{f}(0) = \exp\left(\hat{\beta}_0 - \sqrt{3}\,\hat{\beta}_1\right)
$$

where $\hat{\beta}_0$ and $\hat{\beta}_1$ are the estimates of the intercept and slope parameters, respectively.
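
As an illustration, the following sketch computes the periodogram and fits the gamma model with a log link. The use of the statsmodels package here is an assumption of this example, not part of the procedures' implementation.

```python
import numpy as np
import statsmodels.api as sm

def f0_hat(theta):
    """Estimate f(0) by smoothing the periodogram with a gamma GLM
    (log link), as described above."""
    theta = np.asarray(theta, dtype=float)
    n = len(theta)
    k = np.arange(1, n // 2 + 1)
    omega = 2.0 * np.pi * k / n                    # Fourier frequencies
    t = np.arange(1, n + 1)
    # periodogram p(omega_k)
    p = ((np.sin(np.outer(omega, t)) @ theta) ** 2 +
         (np.cos(np.outer(omega, t)) @ theta) ** 2) / n
    # single regressor x1(omega_k) = sqrt(3) * (4 * omega_k / (2 * pi) - 1)
    x1 = np.sqrt(3.0) * (4.0 * omega / (2.0 * np.pi) - 1.0)
    X = sm.add_constant(x1)
    fit = sm.GLM(p, X,
                 family=sm.families.Gamma(link=sm.families.links.Log())).fit()
    b0, b1 = fit.params
    return np.exp(b0 - np.sqrt(3.0) * b1)          # predicted value at omega = 0
```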

Heidelberger and Welch Diagnostics

The Heidelberger and Welch test (Heidelberger and Welch 1981, 1983) consists of two parts: a stationarity test and a half-width test. The stationarity test assesses the stationarity of a Markov chain by testing the hypothesis that the chain comes from a covariance stationary process. The half-width test checks whether the Markov chain sample size is adequate to estimate the mean values accurately.

Given $\{\theta^t\}$, set $S_0 = 0$, $S_n = \sum_{t=1}^{n}\theta^t$, and $\bar{\theta} = (1/n)\sum_{t=1}^{n}\theta^t$. You can construct the following sequence with $s$ coordinates on values from $\frac{1}{n}, \frac{2}{n}, \ldots, 1$:

$$
B_n(s) = \frac{S_{[ns]} - [ns]\bar{\theta}}{\left(n\,\hat{p}(0)\right)^{1/2}}
$$

where $[\,]$ is the rounding operator, and $\hat{p}(0)$ is an estimate of the spectral density at zero frequency that uses the second half of the sequence (see the section Spectral Density Estimate at Zero Frequency for estimation details). For large $n$, $B_n$ converges in distribution to a Brownian bridge (Billingsley 1986). So you can construct a test statistic by using $B_n$. The statistic used in these procedures is the Cramér–von Mises statistic[4]; that is, $\int_0^1 B_n(s)^2\, ds = \mathrm{CVM}(B_n)$. As $n \to \infty$, the statistic converges in distribution to a standard Cramér–von Mises distribution. The integral $\int_0^1 B_n(s)^2\, ds$ is numerically approximated using Simpson's rule.

Let $y_i = B_n(s)^2$, where $s = 0, \frac{1}{n}, \ldots, \frac{n-1}{n}, 1$, and $i = ns = 0, 1, \ldots, n$. If $n$ is even, let $m = n/2$; otherwise, let $m = (n-1)/2$. The Simpson's approximation to the integral is

$$
\int_0^1 B_n(s)^2\, ds \approx \frac{1}{3n}\left[y_0 + 4\left(y_1 + y_3 + \cdots + y_{2m-1}\right) + 2\left(y_2 + y_4 + \cdots + y_{2m-2}\right) + y_{2m}\right]
$$

Note that Simpson's rule requires an even number of intervals. When $n$ is odd, $y_n$ is set to 0 and the value does not contribute to the approximation.
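
The following sketch assembles $B_n(s)$ and applies Simpson's rule as described. The spectral density estimate $\hat{p}(0)$ is assumed to be supplied, for example by the estimator sketched in the preceding subsection applied to the second half of the chain.

```python
import numpy as np

def cvm_statistic(theta, p0_hat):
    """Cramér-von Mises statistic CVM(B_n) via Simpson's rule.

    p0_hat: spectral density estimate at zero frequency based on the
    second half of the chain (see the preceding subsection).
    """
    theta = np.asarray(theta, dtype=float)
    n = len(theta)
    S = np.concatenate(([0.0], np.cumsum(theta)))   # S_0, S_1, ..., S_n
    i = np.arange(n + 1)                            # i = n * s
    Bn = (S - i * theta.mean()) / np.sqrt(n * p0_hat)
    y = Bn ** 2                                     # y_i = B_n(s)^2
    m = n // 2                                      # y_n is unused when n is odd
    return (y[0] + 4.0 * y[1:2 * m:2].sum()         # odd-index terms
                 + 2.0 * y[2:2 * m:2].sum()         # even interior terms
                 + y[2 * m]) / (3.0 * n)
```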

This test can be performed repeatedly on the same chain, and it helps you identify a time $t$ when the chain has reached stationarity. The whole chain, $\{\theta^t\}$, is first used to construct the Cramér–von Mises statistic. If it passes the test, you can conclude that the entire chain is stationary. If it fails the test, you drop the initial 10% of the chain and redo the test by using the remaining 90%. This process is repeated until either a time $t$ is selected or it reaches a point where there are not enough data remaining to construct a confidence interval (the cutoff proportion is set to be 50%).

The part of the chain that is deemed stationary is put through a half-width test, which reports whether the sample size is adequate to meet certain accuracy requirements for the mean estimates. Running the simulation for less than this length of time would not meet the requirement, while running it longer would not provide any additional information that is needed. The statistic calculated here is the relative half-width (RHW) of the confidence interval. The RHW for a confidence interval of level $1-\alpha$ is

$$
\mathrm{RHW} = \frac{z_{(1-\alpha/2)}\left(\hat{s}_n/n\right)^{1/2}}{\hat{\theta}}
$$

where $z_{(1-\alpha/2)}$ is the $z$-score of the $100(1-\alpha/2)$th percentile (for example, $z_{(1-\alpha/2)} = 1.96$ if $\alpha = 0.05$), $\hat{s}_n$ is the variance of the chain estimated using the spectral density method (see the explanation in the section Spectral Density Estimate at Zero Frequency), $n$ is the length, and $\hat{\theta}$ is the estimated mean. The RHW quantifies the accuracy of the $1-\alpha$ level confidence interval of the mean estimate by measuring the ratio between the sample standard error of the mean and the mean itself. In other words, you can stop the Markov chain if the variability of the mean stabilizes with respect to the mean. An implicit assumption is that large means are often accompanied by large variances. If this assumption is not met, this test can produce false rejections (such as a small mean around 0 with a large standard deviation) or false acceptances (such as a very large mean with a relatively small variance). As with any other convergence diagnostic, you might want to exercise caution in interpreting the results.

The stationarity test is one-sided; rejection occurs when the p-value is small (smaller than $\alpha$), consistent with the interpretation in Table 1. To perform the half-width test, you need to select an $\alpha$ level (the default of which is 0.05) and a predetermined tolerance value $\epsilon$ (the default of which is 0.1). If the calculated RHW is greater than $\epsilon$, you conclude that there are not enough data to accurately estimate the mean with $1-\alpha$ confidence under a tolerance of $\epsilon$.
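
A minimal sketch of the half-width check, assuming the spectral-density-based variance estimate $\hat{s}_n$ has already been computed:

```python
import numpy as np
from scipy.stats import norm

def halfwidth_test(theta, s_n_hat, alpha=0.05, eps=0.1):
    """Relative half-width (RHW) test for the mean estimate.

    s_n_hat: spectral-density-based variance estimate of the chain
    (see the subsection Spectral Density Estimate at Zero Frequency).
    Returns the RHW and whether the sample size is deemed adequate.
    """
    theta = np.asarray(theta, dtype=float)
    n = len(theta)
    z = norm.ppf(1.0 - alpha / 2.0)        # 1.96 when alpha = 0.05
    rhw = z * np.sqrt(s_n_hat / n) / theta.mean()
    return rhw, abs(rhw) <= eps            # the test fails when |RHW| > eps
```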

Raftery and Lewis Diagnostics

If your interest lies in posterior percentiles, you want a diagnostic test that evaluates the accuracy of the estimated percentiles. The Raftery-Lewis test (Raftery and Lewis 1992, 1995) is designed for this purpose. Notation and deductions here closely resemble those in Raftery and Lewis (1995).

Suppose you are interested in a quantity $\theta_q$ such that $P(\theta \le \theta_q \mid \mathbf{y}) = q$, where $q$ can be an arbitrary cumulative probability, such as 0.025. This $\theta_q$ can be estimated empirically as the $100q$th percentile of the sorted $\{\theta^t\}$. Let $\hat{\theta}_q$ denote the estimate, which corresponds to an estimated probability $P(\theta \le \hat{\theta}_q) = \hat{P}_q$. Because the simulated posterior distribution converges to the true distribution as the simulation sample size grows, $\hat{\theta}_q$ can achieve any degree of accuracy if the simulator is run for a very long time. However, running too long a simulation can be wasteful. Alternatively, you can use coverage probability to measure accuracy and stop the chain when a certain accuracy is reached.

A stopping criterion is reached when the estimated probability is within $\pm r$ of the true cumulative probability $q$, with probability $s$, such as $P\left(\hat{P}_q \in (q-r,\, q+r)\right) = s$. For example, suppose you want the coverage probability $s$ to be 0.95 and the amount of tolerance $r$ to be 0.005. This corresponds to requiring that the estimate of the cumulative distribution function of the 2.5th percentile be estimated to within $\pm 0.5$ percentage points with probability 0.95.

The Raftery-Lewis diagnostic finds the number of iterations, $M$, that need to be discarded (burn-in) and the number of iterations needed, $N$, to achieve a desired precision. Given a predefined cumulative probability $q$, these procedures first find $\hat{\theta}_q$, and then they construct a binary 0–1 process $\{Z_t\}$ by setting $Z_t = 1$ if $\theta^t \le \hat{\theta}_q$ and 0 otherwise for all $t$. The sequence $\{Z_t\}$ is itself not a Markov chain, but you can construct a subsequence of $\{Z_t\}$ that is approximately Markovian if it is sufficiently $k$-thinned. When $k$ becomes reasonably large, $\{Z_t^{(k)}\}$ starts to behave like a Markov chain.

Next, the procedures find this thinning parameter $k$. The number $k$ is estimated by comparing the Bayesian information criterion (BIC) between two Markov models: a first-order and a second-order Markov model. A $j$th-order Markov model is one in which the current value of $\{Z_t^{(k)}\}$ depends on the previous $j$ values. For example, in a second-order Markov model,

$$
\begin{aligned}
&p\left(Z_t^{(k)} = z_t \mid Z_{t-1}^{(k)} = z_{t-1},\, Z_{t-2}^{(k)} = z_{t-2},\, \ldots,\, Z_0^{(k)} = z_0\right) \\
&\qquad = p\left(Z_t^{(k)} = z_t \mid Z_{t-1}^{(k)} = z_{t-1},\, Z_{t-2}^{(k)} = z_{t-2}\right)
\end{aligned}
$$

where $z_i \in \{0, 1\}$, $i = 0, \ldots, t$. Given $\{Z_t^{(k)}\}$, you can construct two transition count matrices for a second-order Markov model:

$$
z_t = 0:\quad
\begin{array}{c|cc}
 & z_{t-1}=0 & z_{t-1}=1 \\ \hline
z_{t-2}=0 & w_{000} & w_{010} \\
z_{t-2}=1 & w_{100} & w_{110}
\end{array}
\qquad
z_t = 1:\quad
\begin{array}{c|cc}
 & z_{t-1}=0 & z_{t-1}=1 \\ \hline
z_{t-2}=0 & w_{001} & w_{011} \\
z_{t-2}=1 & w_{101} & w_{111}
\end{array}
$$

For each k, the procedures calculate the BIC that compares the two Markov models. The BIC is based on a likelihood ratio test statistic that is defined as

$$
G_k^2 = 2\sum_{i=0}^{1}\sum_{j=0}^{1}\sum_{l=0}^{1} w_{ijl}\,\log\frac{w_{ijl}}{\hat{w}_{ijl}}
$$

where $\hat{w}_{ijl}$ is the expected cell count of $w_{ijl}$ under the null model, the first-order Markov model, where the assumption $\left(Z_t^{(k)} \perp Z_{t-2}^{(k)}\right) \mid Z_{t-1}^{(k)}$ holds. The formula for the expected cell count is

$$
\hat{w}_{ijl} = \frac{\left(\sum_{i} w_{ijl}\right)\left(\sum_{l} w_{ijl}\right)}{\sum_{i}\sum_{l} w_{ijl}}
$$

The BIC is $G_k^2 - 2\log(n_k - 2)$, where $n_k$ is the $k$-thinned sample size (every $k$th sample starting with the first), with the last two data points discarded due to the construction of the second-order Markov model. The thinning parameter $k$ is the smallest $k$ for which the BIC is negative. When $k$ is found, you can estimate a transition probability matrix between state 0 and state 1 for $\{Z_t^{(k)}\}$:

$$
Q = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}
$$

Because $\{Z_t^{(k)}\}$ is a Markov chain, its equilibrium distribution exists and is estimated by

$$
\pi = (\pi_0, \pi_1) = \frac{(\beta, \alpha)}{\alpha + \beta}
$$

where $\pi_0 = P(\theta \le \theta_q \mid \mathbf{y})$ and $\pi_1 = 1 - \pi_0$. The goal is to find an iteration number $m$ such that, after $m$ steps, the estimated transition probability $P\left(Z_m^{(k)} = i \mid Z_0^{(k)} = j\right)$ is within $\epsilon$ of the equilibrium $\pi_i$ for $i, j = 0, 1$. Let $e_0 = (1, 0)$ and $e_1 = 1 - e_0$. The estimated transition probability after step $m$ is

$$
P\left(Z_m^{(k)} = i \mid Z_0^{(k)} = j\right) = e_j\left[\begin{pmatrix} \pi_0 & \pi_1 \\ \pi_0 & \pi_1 \end{pmatrix} + \frac{(1-\alpha-\beta)^m}{\alpha+\beta}\begin{pmatrix} \alpha & -\alpha \\ -\beta & \beta \end{pmatrix}\right] e_j'
$$

which holds when

$$
m = \frac{\log\left(\dfrac{(\alpha+\beta)\,\epsilon}{\max(\alpha,\beta)}\right)}{\log\left(1-\alpha-\beta\right)}
$$

assuming $1 - \alpha - \beta > 0$.

Therefore, by time $m$, $\{Z_t^{(k)}\}$ is sufficiently close to its equilibrium distribution, and you know that a total of $M = mk$ samples should be discarded as the burn-in.

Next, the procedures estimate $N$, the number of simulations needed to achieve the desired accuracy on percentile estimation. The estimate of $P(\theta \le \theta_q \mid \mathbf{y})$ is $\bar{Z}_n^{(k)} = \frac{1}{n}\sum_{t=1}^{n} Z_t^{(k)}$. For large $n$, $\bar{Z}_n^{(k)}$ is normally distributed with mean $q$, the true cumulative probability, and variance

$$
\frac{1}{n}\,\frac{(2-\alpha-\beta)\,\alpha\beta}{(\alpha+\beta)^3}
$$

$P\left(q - r \le \bar{Z}_n^{(k)} \le q + r\right) = s$ is satisfied if

$$
n = \frac{(2-\alpha-\beta)\,\alpha\beta}{(\alpha+\beta)^3}\left\{\frac{\Phi^{-1}\left(\frac{s+1}{2}\right)}{r}\right\}^{2}
$$

Therefore, $N = nk$.

By using similar reasoning, the procedures first calculate the minimal number of iterations needed to achieve the desired accuracy, assuming the samples are independent:

$$
N_{\min} = \left\{\Phi^{-1}\left(\frac{s+1}{2}\right)\right\}^{2}\frac{q(1-q)}{r^{2}}
$$

If $\{\theta^t\}$ does not have the required sample size, the Raftery-Lewis test is not carried out. If you still want to carry out the test, increase the number of Markov chain iterations.

The ratio $N/N_{\min}$ is sometimes referred to as the dependence factor. It measures deviation from posterior sample independence: the closer it is to 1, the less correlated are the samples. There are a few things to keep in mind when you use this test. This diagnostic tool is specifically designed for the percentile of interest and does not provide information about convergence of the chain as a whole (Brooks and Roberts 1999). In addition, the test can be very sensitive to small changes. Both $N$ and $N_{\min}$ are inversely proportional to $r^2$, so you can expect to see large variations in these numbers with small changes to input variables, such as the desired coverage probability or the cumulative probability of interest. Last, the time until convergence for a parameter can differ substantially for different cumulative probabilities.
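
Once $\alpha$ and $\beta$ have been estimated from the $k$-thinned binary chain, the remaining bookkeeping is straightforward. The following sketch computes $M$, $N$, $N_{\min}$, and the dependence factor; the equilibrium tolerance `eps` has no stated default in this section, so the value used here is purely illustrative, and the other defaults mirror the example values quoted above.

```python
import numpy as np
from scipy.stats import norm

def raftery_lewis(alpha, beta, k, q=0.025, r=0.005, s=0.95, eps=0.001):
    """Burn-in M, required sample size N, N_min, and the dependence factor,
    from the estimated transition probabilities (alpha, beta) of the
    k-thinned binary chain Z_t^(k).  Assumes 1 - alpha - beta > 0."""
    phi = norm.ppf((s + 1.0) / 2.0)
    # thinned steps m until the transition probabilities are within eps of pi
    m = (np.log((alpha + beta) * eps / max(alpha, beta))
         / np.log(1.0 - alpha - beta))
    M = int(np.ceil(m)) * k                      # burn-in on the original scale
    # thinned sample size n for the desired percentile accuracy
    n = (2.0 - alpha - beta) * alpha * beta / (alpha + beta) ** 3 * (phi / r) ** 2
    N = int(np.ceil(n)) * k
    N_min = int(np.ceil(phi ** 2 * q * (1.0 - q) / r ** 2))
    return M, N, N_min, N / N_min                # dependence factor N / N_min
```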

Autocorrelations

The sample autocorrelation of lag $h$ for a parameter $\theta$ is defined in terms of the sample autocovariance function:

$$
\hat{\rho}_h(\theta) = \frac{\hat{\gamma}_h(\theta)}{\hat{\gamma}_0(\theta)}, \qquad |h| < n
$$

The sample autocovariance function of lag $h$ of $\theta$ is defined by

$$
\hat{\gamma}_h(\theta) = \frac{1}{n-h}\sum_{t=1}^{n-h}\left(\theta^{t+h} - \bar{\theta}\right)\left(\theta^{t} - \bar{\theta}\right), \qquad 0 \le h < n
$$

Effective Sample Size

You can use autocorrelation and trace plots to examine the mixing of a Markov chain. A closely related measure of mixing is the effective sample size (ESS) (Kass et al. 1998).

ESS is defined as follows:

$$
\mathrm{ESS} = \frac{n}{\tau} = \frac{n}{1 + 2\sum_{k=1}^{\infty}\rho_k(\theta)}
$$

where $n$ is the total sample size and $\rho_k(\theta)$ is the autocorrelation of lag $k$ for $\theta$. The quantity $\tau$ is referred to as the autocorrelation time. To estimate $\tau$, the Bayesian procedures first find a cutoff point $k$ after which the autocorrelations are very close to zero, and then sum all the $\rho_k$ up to that point. The cutoff point $k$ is such that $|\rho_k| < \min\{0.01,\ 2s_k\}$, where $s_k$ is the estimated standard deviation:

$$
s_k = \sqrt{\frac{1}{n}\left(1 + 2\sum_{j=1}^{k-1}\hat{\rho}_j^{2}(\theta)\right)}
$$

ESS and $\tau$ are inversely proportional to each other; low ESS or high $\tau$ indicates poor mixing of the Markov chain.
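
The following sketch computes ESS with the truncation rule described above, using the lag-$h$ autocovariance definition from the previous subsection.

```python
import numpy as np

def ess(theta):
    """Effective sample size n / tau, truncating the autocorrelation sum at
    the first lag k with |rho_k| < min(0.01, 2 * s_k), as described above."""
    theta = np.asarray(theta, dtype=float)
    n = len(theta)
    xc = theta - theta.mean()
    gamma0 = xc @ xc / n                           # lag-0 autocovariance
    rho_sum = 0.0
    rho_sq_sum = 0.0                               # running sum of rho_j^2
    for k in range(1, n):
        rho_k = (xc[k:] @ xc[:-k] / (n - k)) / gamma0
        s_k = np.sqrt((1.0 + 2.0 * rho_sq_sum) / n)
        if abs(rho_k) < min(0.01, 2.0 * s_k):      # cutoff point reached
            break
        rho_sum += rho_k
        rho_sq_sum += rho_k ** 2
    return n / (1.0 + 2.0 * rho_sum)               # ESS = n / tau
```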



[3] This is equivalent to the fast Fourier transform of the original time series $\theta_t$.

[4] The von Mises distribution was first introduced by Von Mises (1918). The density function is $p(\theta \mid \mu, \kappa) \sim M(\mu, \kappa) = \left[2\pi I_0(\kappa)\right]^{-1}\exp\left(\kappa\cos(\theta - \mu)\right)$, $0 \le \theta \le 2\pi$, where the function $I_0(\kappa)$ is the modified Bessel function of the first kind and order zero, defined by $I_0(\kappa) = (2\pi)^{-1}\int_0^{2\pi}\exp\left(\kappa\cos(\theta - \mu)\right)d\theta$.

Last updated: March 08, 2022