The BGLIMM Procedure

Hamiltonian Monte Carlo Sampler

The Hamiltonian Monte Carlo (HMC) algorithm, also known as the hybrid Monte Carlo algorithm, is a version of the Metropolis algorithm that uses gradient information and auxiliary momentum variables to draw samples from the posterior distribution (Neal 2011). The algorithm uses Hamiltonian dynamics to enable distant proposals in the Metropolis algorithm, making it efficient in many scenarios. The HMC algorithm is applicable only to continuous parameters.

HMC translates the target density function into a potential energy function and adds an auxiliary momentum variable $\mathbf{r}$ for each model parameter $\boldsymbol{\theta}$. The resulting joint density has the form

$$p(\boldsymbol{\theta}, \mathbf{r}) \propto p(\boldsymbol{\theta}) \exp\left(-\tfrac{1}{2}\,\mathbf{r}'\mathbf{r}\right)$$

where $p(\boldsymbol{\theta})$ is the posterior density of the parameters $\boldsymbol{\theta}$ (up to a normalizing constant). HMC draws from the joint space of $(\boldsymbol{\theta}, \mathbf{r})$, discards $\mathbf{r}$, and retains $\boldsymbol{\theta}$ as samples from $p(\boldsymbol{\theta})$. The algorithm uses the idea of Hamiltonian dynamics in preserving the total energy of a physical system, in which $\boldsymbol{\theta}$ is part of the potential energy function and $\mathbf{r}$ is part of the kinetic energy (velocity). As the velocity changes, the potential energy changes accordingly, leading to movements in the parameter space.
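The correspondence between the joint density and the energy function is direct: the log of the joint density is the log posterior plus the standard normal log kernel of the momentum. The following Python sketch makes this concrete; the `log_post` function is an illustrative stand-in for the log posterior and is not part of the procedure:

```python
import numpy as np

def joint_log_density(theta, r, log_post):
    """log p(theta, r) up to an additive constant: the log posterior
    of theta plus the standard normal log kernel of the momentum r."""
    return log_post(theta) - 0.5 * np.dot(r, r)
```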

At each iteration, the HMC algorithm first generates the momentum variables $\mathbf{r}$, usually from standard normal distributions, independently of $\boldsymbol{\theta}$. The algorithm then performs a Metropolis update that takes many steps along a trajectory while maintaining the total energy of the system. One of the most common ways to move along this trajectory is the leapfrog method, which involves $L$ steps of step size $\epsilon$:

$$
\begin{aligned}
\mathbf{r}^{\,t+\epsilon/2} &= \mathbf{r}^{\,t} + (\epsilon/2)\,\nabla_{\boldsymbol{\theta}} \log p(\boldsymbol{\theta}^{\,t}) \\
\boldsymbol{\theta}^{\,t+\epsilon} &= \boldsymbol{\theta}^{\,t} + \epsilon\,\mathbf{r}^{\,t+\epsilon/2} \\
\mathbf{r}^{\,t+\epsilon} &= \mathbf{r}^{\,t+\epsilon/2} + (\epsilon/2)\,\nabla_{\boldsymbol{\theta}} \log p(\boldsymbol{\theta}^{\,t+\epsilon})
\end{aligned}
$$

where $\nabla_{\boldsymbol{\theta}} \log p(\boldsymbol{\theta})$ is the gradient of the log posterior with respect to $\boldsymbol{\theta}$. After $L$ steps, the proposed state $(\boldsymbol{\theta}^{*}, \mathbf{r}^{*})$ is accepted as the next state of the Markov chain with probability $\min\left\{1,\; p(\boldsymbol{\theta}^{*}, \mathbf{r}^{*}) / p(\boldsymbol{\theta}, \mathbf{r})\right\}$.
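As an illustration of the full update, the following Python sketch implements one HMC iteration: draw the momentum, run $L$ leapfrog steps, and apply the Metropolis test. The `log_post` and `grad_log_post` functions are hypothetical user-supplied stand-ins; PROC BGLIMM computes the posterior and its gradients internally.

```python
import numpy as np

def hmc_step(theta, log_post, grad_log_post, eps, L, rng):
    """One HMC update for a float NumPy array theta: draw momentum,
    run L leapfrog steps of size eps, then accept or reject the
    endpoint with a Metropolis test on the joint density."""
    r0 = rng.standard_normal(theta.shape)    # r ~ N(0, I), independent of theta
    theta_new, r = theta.copy(), r0.copy()
    for _ in range(L):
        r += 0.5 * eps * grad_log_post(theta_new)   # half step for momentum
        theta_new += eps * r                        # full step for parameters
        r += 0.5 * eps * grad_log_post(theta_new)   # half step for momentum
    # Metropolis ratio p(theta*, r*) / p(theta, r) on the log scale
    log_accept = (log_post(theta_new) - 0.5 * np.dot(r, r)) \
               - (log_post(theta) - 0.5 * np.dot(r0, r0))
    if np.log(rng.uniform()) < log_accept:
        return theta_new    # accept the proposal
    return theta            # reject: keep the current state
```

For example, with `log_post = lambda x: -0.5 * x @ x` and `grad_log_post = lambda x: -x`, repeated calls to `hmc_step` draw from a standard multivariate normal.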

Although HMC can lead to rapid convergence, it relies heavily on two requirements: the gradient of the logarithm of the posterior density and carefully selected tuning parameters, namely the step size $\epsilon$ and the number of steps $L$. Step sizes that are too large or too small can lead to acceptance rates that are too low or too high, both of which affect the convergence of the Markov chain. A large $L$ leads to a long trajectory (of length $\epsilon \cdot L$), which can move the parameters back toward their original positions; a small $L$ limits the movement of the chain.

An example of an adaptive HMC algorithm that automatically tunes $\epsilon$ and $L$ is the No-U-Turn Sampler (NUTS; Hoffman and Gelman 2014). The NUTS algorithm uses a doubling process to build a binary tree whose leaf nodes correspond to states of the parameters and momentum variables. The initial tree is a single node of height zero ($j = 0$). The doubling process expands the tree to either the left or the right in a binary fashion, and in each direction the algorithm takes $2^{j}$ leapfrog steps of size $\epsilon$; as the height of the tree ($j$) increases, the computational cost therefore grows rapidly. The tree expands until a sampling trajectory makes a U-turn and starts to revisit parameter space that it has already explored. The NUTS algorithm tunes $\epsilon$ so that the actual acceptance rate during the doubling process is close to a predetermined target acceptance probability $\delta$ (usually set to 0.6 or higher). When the tuning stage ends, the NUTS algorithm proceeds to the main sampling stage and draws posterior samples by using the fixed value of $\epsilon$. Increasing the target acceptance probability $\delta$ can often improve mixing, but it can also slow the sampling significantly. For more information about the NUTS algorithm and its efficiency, see Hoffman and Gelman (2014).
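The U-turn condition that ends the doubling process can be stated compactly: expansion stops when continuing in either direction would shrink the distance between the two endpoints of the trajectory. The following simplified Python sketch shows this check for the endpoint states $\boldsymbol{\theta}^{-}, \boldsymbol{\theta}^{+}$ and momenta $\mathbf{r}^{-}, \mathbf{r}^{+}$; the full recursive tree-building of Hoffman and Gelman (2014) is omitted here.

```python
import numpy as np

def u_turn(theta_minus, theta_plus, r_minus, r_plus):
    """Simplified NUTS stopping check: the trajectory has made a
    U-turn when the momentum at either endpoint points back across
    the segment joining the endpoints (Hoffman and Gelman 2014)."""
    d = theta_plus - theta_minus
    return np.dot(d, r_minus) < 0 or np.dot(d, r_plus) < 0
```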
