The BGLIMM Procedure

Hamiltonian Monte Carlo Sampler

The Hamiltonian Monte Carlo (HMC) algorithm, also known as the hybrid Monte Carlo algorithm, is a version of the Metropolis algorithm that uses gradient information and auxiliary momentum variables to draw samples from the posterior distribution (Neal 2011). The algorithm uses Hamiltonian dynamics to enable distant proposals in the Metropolis algorithm, making it efficient in many scenarios. The HMC algorithm is applicable only to continuous parameters.

HMC translates the target density function into a potential energy function and adds an auxiliary momentum variable $\mathbf{r}$ for each model parameter $\boldsymbol{\theta}$. The resulting joint density has the form

$$p(\boldsymbol{\theta}, \mathbf{r}) \propto p(\boldsymbol{\theta}) \exp\left(-\tfrac{1}{2}\,\mathbf{r}'\mathbf{r}\right)$$

where $p(\boldsymbol{\theta})$ is the posterior density of the parameters $\boldsymbol{\theta}$ (up to a normalizing constant). HMC draws from the joint space of $(\boldsymbol{\theta}, \mathbf{r})$, discards $\mathbf{r}$, and retains $\boldsymbol{\theta}$ as samples from $p(\boldsymbol{\theta})$. The algorithm uses the idea of Hamiltonian dynamics in preserving the total energy of a physical system, in which $\boldsymbol{\theta}$ is part of the potential energy function and $\mathbf{r}$ is part of the kinetic energy (velocity). As the velocity changes, the potential energy changes accordingly, leading to movements in the parameter space.
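The correspondence between the joint density and the energy function is direct: the log of the joint density is the log posterior plus the standard normal log kernel of the momentum. The following Python sketch makes this concrete; the `log_post` function is an illustrative stand-in for the log posterior and is not part of the procedure:

```python
import numpy as np

def joint_log_density(theta, r, log_post):
    """log p(theta, r) up to an additive constant: the log posterior
    of theta plus the standard normal log kernel of the momentum r."""
    return log_post(theta) - 0.5 * np.dot(r, r)
```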

At each iteration, the HMC algorithm first generates the momentum variables $\mathbf{r}$, usually from standard normal distributions, independently of $\boldsymbol{\theta}$. The algorithm then performs a Metropolis update that takes many steps along a trajectory while maintaining the total energy of the system. One of the most common ways to move along this trajectory is the leapfrog method, which involves $L$ steps of step size $\epsilon$:

$$
\begin{aligned}
\mathbf{r}^{\,t+\epsilon/2} &= \mathbf{r}^{\,t} + (\epsilon/2)\,\nabla_{\boldsymbol{\theta}} \log p(\boldsymbol{\theta}^{\,t}) \\
\boldsymbol{\theta}^{\,t+\epsilon} &= \boldsymbol{\theta}^{\,t} + \epsilon\,\mathbf{r}^{\,t+\epsilon/2} \\
\mathbf{r}^{\,t+\epsilon} &= \mathbf{r}^{\,t+\epsilon/2} + (\epsilon/2)\,\nabla_{\boldsymbol{\theta}} \log p(\boldsymbol{\theta}^{\,t+\epsilon})
\end{aligned}
$$

where $\nabla_{\boldsymbol{\theta}} \log p(\boldsymbol{\theta})$ is the gradient of the log posterior with respect to $\boldsymbol{\theta}$. After $L$ steps, the proposed state $(\boldsymbol{\theta}^{*}, \mathbf{r}^{*})$ is accepted as the next state of the Markov chain with probability $\min\left\{1,\; p(\boldsymbol{\theta}^{*}, \mathbf{r}^{*}) / p(\boldsymbol{\theta}, \mathbf{r})\right\}$.
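As an illustration of the full update, the following Python sketch implements one HMC iteration: draw the momentum, run $L$ leapfrog steps, and apply the Metropolis test. The `log_post` and `grad_log_post` functions are hypothetical user-supplied stand-ins; PROC BGLIMM computes the posterior and its gradients internally.

```python
import numpy as np

def hmc_step(theta, log_post, grad_log_post, eps, L, rng):
    """One HMC update for a float NumPy array theta: draw momentum,
    run L leapfrog steps of size eps, then accept or reject the
    endpoint with a Metropolis test on the joint density."""
    r0 = rng.standard_normal(theta.shape)    # r ~ N(0, I), independent of theta
    theta_new, r = theta.copy(), r0.copy()
    for _ in range(L):
        r += 0.5 * eps * grad_log_post(theta_new)   # half step for momentum
        theta_new += eps * r                        # full step for parameters
        r += 0.5 * eps * grad_log_post(theta_new)   # half step for momentum
    # Metropolis ratio p(theta*, r*) / p(theta, r) on the log scale
    log_accept = (log_post(theta_new) - 0.5 * np.dot(r, r)) \
               - (log_post(theta) - 0.5 * np.dot(r0, r0))
    if np.log(rng.uniform()) < log_accept:
        return theta_new    # accept the proposal
    return theta            # reject: keep the current state
```

For example, with `log_post = lambda x: -0.5 * x @ x` and `grad_log_post = lambda x: -x`, repeated calls to `hmc_step` draw from a standard multivariate normal.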

Although HMC can lead to rapid convergence, it relies heavily on two requirements: the gradient of the logarithm of the posterior density and carefully selected tuning parameters, namely the step size $\epsilon$ and the number of steps $L$. Step sizes that are too large or too small can lead to acceptance rates that are too low or too high, both of which affect the convergence of the Markov chain. A large $L$ leads to a long trajectory (of length $\epsilon \cdot L$), which can move the parameters back toward their original positions; a small $L$ limits the movement of the chain.

An example of an adaptive HMC algorithm that automatically tunes $\epsilon$ and $L$ is the No-U-Turn Sampler (NUTS; Hoffman and Gelman 2014). The NUTS algorithm uses a doubling process to build a binary tree whose leaf nodes correspond to states of the parameters and momentum variables. The initial tree is a single node of height zero ($j = 0$). The doubling process expands the tree to either the left or the right in a binary fashion, and in each direction the algorithm takes $2^{j}$ leapfrog steps of size $\epsilon$; as the height of the tree ($j$) increases, the computational cost therefore grows rapidly. The tree expands until a sampling trajectory makes a U-turn and starts to revisit parameter space that it has already explored. The NUTS algorithm tunes $\epsilon$ so that the actual acceptance rate during the doubling process is close to a predetermined target acceptance probability $\delta$ (usually set to 0.6 or higher). When the tuning stage ends, the NUTS algorithm proceeds to the main sampling stage and draws posterior samples by using the fixed value of $\epsilon$. Increasing the target acceptance probability $\delta$ can often improve mixing, but it can also slow the sampling significantly. For more information about the NUTS algorithm and its efficiency, see Hoffman and Gelman (2014).
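The U-turn condition that ends the doubling process can be stated compactly: expansion stops when continuing in either direction would shrink the distance between the two endpoints of the trajectory. The following simplified Python sketch shows this check for the endpoint states $\boldsymbol{\theta}^{-}, \boldsymbol{\theta}^{+}$ and momenta $\mathbf{r}^{-}, \mathbf{r}^{+}$; the full recursive tree-building of Hoffman and Gelman (2014) is omitted here.

```python
import numpy as np

def u_turn(theta_minus, theta_plus, r_minus, r_plus):
    """Simplified NUTS stopping check: the trajectory has made a
    U-turn when the momentum at either endpoint points back across
    the segment joining the endpoints (Hoffman and Gelman 2014)."""
    d = theta_plus - theta_minus
    return np.dot(d, r_minus) < 0 or np.dot(d, r_plus) < 0
```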
