The MCMC Procedure

Example 80.20 Using a Transformation to Improve Mixing

(View the complete code for this example.)

Proper transformations of parameters can often improve the mixing in PROC MCMC. You already saw this in Nonlinear Poisson Regression Models, which sampled using the log scale of parameters that priors that are strictly positive, such as the gamma priors. This example shows another useful transformation: the logit transformation on parameters that take a uniform prior on [0, 1].

The data set is taken from Sharples (1990). It is used in Chaloner and Brant (1988) and Chaloner (1994) to identify outliers in the data set in a two-level hierarchical model. Congdon (2003) also uses this data set to demonstrates the same technique. This example uses the data set to illustrate how mixing can be improved using transformation and does not address the question of outlier detection as in those papers. The following statements create the data set:

data inputdata;
   input nobs grp y @@;
   ind = _n_;
   datalines;
1 1 24.80 2 1 26.90 3 1 26.65
4 1 30.93 5 1 33.77 6 1 63.31
1 2 23.96 2 2 28.92 3 2 28.19
4 2 26.16 5 2 21.34 6 2 29.46
1 3 18.30 2 3 23.67 3 3 14.47
4 3 24.45 5 3 24.89 6 3 28.95
1 4 51.42 2 4 27.97 3 4 24.76
4 4 26.67 5 4 17.58 6 4 24.29
1 5 34.12 2 5 46.87 3 5 58.59
4 5 38.11 5 5 47.59 6 5 44.67
;

There are five groups (grp, j equals 1 comma ellipsis comma 5) with six observations (nobs, i equals 1 comma ellipsis comma 6) in each. The two-level hierarchical model is specified as follows:

StartLayout 1st Row 1st Column y Subscript i j 2nd Column tilde 3rd Column normal left-parenthesis theta Subscript j Baseline comma prec equals tau Subscript w Baseline right-parenthesis 2nd Row 1st Column theta Subscript j 2nd Column tilde 3rd Column normal left-parenthesis mu comma prec equals tau Subscript b Baseline right-parenthesis 3rd Row 1st Column mu 2nd Column tilde 3rd Column normal left-parenthesis 0 comma prec equals 1 e minus 6 right-parenthesis 4th Row 1st Column tau 2nd Column tilde 3rd Column gamma left-parenthesis 0.001 comma iscale equals 0.001 right-parenthesis 5th Row 1st Column p 2nd Column tilde 3rd Column uniform left-parenthesis 0 comma 1 right-parenthesis EndLayout

with the precision parameters related to each other in the following way:

StartLayout 1st Row 1st Column tau Subscript b 2nd Column equals 3rd Column tau slash p 2nd Row 1st Column tau Subscript w 2nd Column equals 3rd Column tau Subscript b Baseline minus tau EndLayout

The total number of parameters in this model is eight: theta 1 comma ellipsis comma theta 5 comma mu comma tau, and p.

The following statements fit the model:

ods graphics on;
proc mcmc data=inputdata nmc=50000 thin=10 ntu=2000
          outpost=m1 seed=17797 plot=trace;
   parms p;
   parms tau;
   parms mu;

   prior p ~ uniform(0,1);
   prior tau ~ gamma(shape=0.001,iscale=0.001);
   prior mu ~ normal(0,prec=0.00000001);
   beginnodata;
   taub = tau/p;
   tauw = taub-tau;
   endnodata;

   random theta ~ normal(mu, prec=taub) subject=grp monitor=(theta);
   model y ~ normal(theta,prec=tauw);
run;

The ODS SELECT statement displays the effective sample size table and the trace plots. The ODS GRAPHICS ON statement enables ODS Graphics. The PROC MCMC statement specifies the usual options for the procedure run and produces trace plots (PLOTS=TRACE). The three PARMS statements put three model parameters, p, tau, and mu, in three different blocks. The PRIOR statements specify the prior distributions, and the programming statements enclosed with the BEGINNODATA and ENDNODATA statements calculate the transformation to taub and tauw. The RANDOM statement specifies the random effect, its prior distribution, and the subject variable. The resulting trace plots are shown in Output 80.20.1, and the effective sample size table is shown in Output 80.20.2.

Output 80.20.1: Trace Plots

Trace Plots
External File:images/mcmcex20badg1.png
External File:images/mcmcex20badg2.png


Output 80.20.2: Bad Effective Sample Sizes

The MCMC Procedure

Effective Sample Sizes
Parameter ESS Autocorrelation
Time
Efficiency
p 76.3 65.5445 0.0153
tau 100.1 49.9391 0.0200
mu 4901.2 1.0202 0.9802
theta_1 4330.1 1.1547 0.8660
theta_2 4575.3 1.0928 0.9151
theta_3 1273.1 3.9274 0.2546
theta_4 5000.0 1.0000 1.0000
theta_5 621.8 8.0412 0.1244


The trace plots show that most parameters have relatively good mixing. Two exceptions appear to be p and tau. The trace plot of p shows a slow periodic movement. The tau parameter does not have good mixing either. When the values are close to zero, the chain stays there for periods of time. An inspection of the effective sample sizes table reveals the same conclusion: p and tau have much smaller ESSs than the rest of the parameters.

A scatter plot of the posterior samples of p and tau reveals why mixing is bad in these two dimensions. The following statements generate the scatter plot in Output 80.20.3:

title 'Scatter Plot of Parameters on Original Scales';

proc sgplot data=m1;
   yaxis label = 'p';
   xaxis label = 'tau' values=(0 to 0.4 by 0.1);
   scatter x = tau y = p;
run;

Output 80.20.3: Scatter Plot of tau versus p

Scatter Plot of τ versus


The two parameters clearly have a nonlinear relationship. It is not surprising that the Metropolis algorithm does not work well here. The algorithm is designed for cases where the parameters are linearly related with each other.

To improve on mixing, you can sample on the log of tau, instead of sampling on tau. The formulation is:

StartLayout 1st Row 1st Column tau 2nd Column tilde 3rd Column gamma left-parenthesis shape equals 0.001 comma iscale equals 0.001 right-parenthesis 2nd Row 1st Column log left-parenthesis tau right-parenthesis 2nd Column tilde 3rd Column egamma left-parenthesis shape equals 0.001 comma iscale equals 0.001 right-parenthesis EndLayout

See the section Standard Distributions for the definitions of the gamma and egamma distributions. In addition, you can sample on the logit of p. Note that

p tilde uniform left-parenthesis 0 comma 1 right-parenthesis

is equivalent to

sans-serif l sans-serif g sans-serif p equals logit left-parenthesis p right-parenthesis tilde logistic left-parenthesis 0 comma 1 right-parenthesis

The following statements fit the same model by using transformed parameters:

proc mcmc data=inputdata nmc=50000 thin=10 outpost=m2 seed=17
          monitor=(p tau mu) plot=trace;
   ods select ess tracepanel;
   parms ltau lgp mu ;

   prior ltau ~ egamma(shape=0.001,iscale=0.001);
   prior lgp ~ logistic(0,1);
   prior mu ~ normal(0,prec=0.00000001);

   beginnodata;
   tau = exp(ltau);
   p = logistic(lgp);
   taub = tau/p;
   tauw = taub-tau;
   endnodata;

   random theta ~ normal(mu, prec=taub) subject=grp monitor=(theta);
   model y ~ normal(theta,prec=tauw);
run;

The variable lgp is the logit transformation of p, and ltau is the log transformation of tau. The prior for ltau is egamma, and the prior for lgp is logistic. The TAU and P assignment statements transform the parameters back to their original scales. The rest of the programs remain unchanged. Trace plots (Output 80.20.4) and effective sample size (Output 80.20.5) both show significant improvements in the mixing for both p and tau.

Output 80.20.4: Trace Plots after Transformation

Trace Plots after Transformation
External File:images/mcmcex20trang1.png
External File:images/mcmcex20trang2.png


Output 80.20.5: Effective Sample Sizes after Transformation

The MCMC Procedure

Effective Sample Sizes
Parameter ESS Autocorrelation
Time
Efficiency
p 3078.7 1.6241 0.6157
tau 2798.9 1.7864 0.5598
mu 5000.0 1.0000 1.0000
theta_1 5127.1 0.9752 1.0254
theta_2 4798.1 1.0421 0.9596
theta_3 4741.2 1.0546 0.9482
theta_4 5196.7 0.9621 1.0393
theta_5 4282.7 1.1675 0.8565


The following statements generate Output 80.20.6 and Output 80.20.7:

title 'Scatter Plot of Parameters on Transformed Scales';

proc sgplot data=m2;
   yaxis label = 'logit(p)';
   xaxis label = 'log(tau)';
   scatter x = ltau y = lgp;
run;

title 'Scatter Plot of Parameters on Original Scales';

proc sgplot data=m2;
   yaxis label = 'p';
   xaxis label = 'tau' values=(0 to 5.0 by 1);
   scatter x = tau y = p;
run;
ods graphics off;

Output 80.20.6: Scatter Plot of log left-parenthesis tau right-parenthesis versus logit left-parenthesis p right-parenthesis, After Transformation

Scatter Plot of (τ) versus logit(p), After Transformation


Output 80.20.7: Scatter Plot of tau versus p, After Transformation

Scatter Plot of τ versus , After Transformation


The scatter plot of log left-parenthesis tau right-parenthesis versus logit left-parenthesis p right-parenthesis shows a linear relationship between the two transformed parameters, and this explains the improvement in mixing. In addition, the transformations also help the Markov chain better explore in the original parameter space. Output 80.20.7 shows a scatter plot of tau versus p. The plot is similar to Output 80.20.3. However, note that tau has a far longer tail in Output 80.20.7, extending all the way to 5 as opposed to 0.15 in Output 80.20.3. This means that the second Markov chain can explore this dimension of the parameter more efficiently, and as a result, you are able to draw more precise inference with an equal number of simulations.

Last updated: March 08, 2022