The Hamiltonian Monte Carlo (HMC) algorithm, also known as the hybrid Monte Carlo algorithm, is a version of the Metropolis algorithm that uses gradient information and auxiliary momentum variables to draw samples from the posterior distribution (Neal 2011). The algorithm uses Hamiltonian dynamics to enable distant proposals in the Metropolis algorithm, making it efficient in many scenarios. The HMC algorithm is applicable only to continuous parameters.
HMC translates the target density function into a potential energy function and adds an auxiliary momentum variable $r_j$ for each model parameter $\theta_j$. The resulting joint density has the form
$$p(\theta, r) \propto \pi(\theta)\,\exp\!\left(-\tfrac{1}{2}\, r^{\top} r\right),$$
where $\pi(\theta)$ is the posterior of the parameters $\theta$ (up to a normalizing constant). HMC draws from the joint space of $(\theta, r)$, discards $r$, and retains $\theta$ as samples from $\pi(\theta)$. The algorithm uses the idea of Hamiltonian dynamics, which preserves the total energy of a physical system: $\theta$ enters through the potential energy function $U(\theta) = -\log \pi(\theta)$, and $r$ enters through the kinetic energy $K(r) = \tfrac{1}{2}\, r^{\top} r$ (the velocity). As the velocity changes, the potential energy changes accordingly, producing the movements in the parameter space.
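To make the energy decomposition concrete, the following Python sketch defines the potential, kinetic, and total energy terms for a toy posterior (a standard bivariate normal standing in for $\pi(\theta)$; the function names are illustrative, not from the source):

```python
import numpy as np

def log_posterior(theta):
    # Toy stand-in for log pi(theta): a standard bivariate normal,
    # known only up to the normalizing constant.
    return -0.5 * np.dot(theta, theta)

def potential_energy(theta):
    # U(theta) = -log pi(theta)
    return -log_posterior(theta)

def kinetic_energy(r):
    # K(r) = r'r / 2, the negative log density of standard normal momenta
    return 0.5 * np.dot(r, r)

def total_energy(theta, r):
    # H(theta, r) = U(theta) + K(r); exp(-H) is proportional to the
    # joint density p(theta, r).
    return potential_energy(theta) + kinetic_energy(r)
```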
At each iteration, the HMC algorithm first generates the momentum variables $r$, usually from standard normal distributions, independently of $\theta$. Then the algorithm follows with a Metropolis update that includes many steps along a trajectory while maintaining the total energy of the system. One of the most common approaches to moving along this trajectory is the leapfrog method, which takes $L$ steps with a step size $\epsilon$,
$$\begin{aligned}
r^{(t+\epsilon/2)} &= r^{(t)} + \frac{\epsilon}{2}\,\nabla_{\theta}\log \pi\bigl(\theta^{(t)}\bigr),\\
\theta^{(t+\epsilon)} &= \theta^{(t)} + \epsilon\, r^{(t+\epsilon/2)},\\
r^{(t+\epsilon)} &= r^{(t+\epsilon/2)} + \frac{\epsilon}{2}\,\nabla_{\theta}\log \pi\bigl(\theta^{(t+\epsilon)}\bigr),
\end{aligned}$$
where $\nabla_{\theta}\log \pi(\theta)$ is the gradient of the log posterior with respect to $\theta$. After $L$ steps, the proposed state $(\theta^{*}, r^{*})$ is accepted as the next state of the Markov chain with probability
$$\min\left\{1,\ \frac{\pi(\theta^{*})\exp\bigl(-\tfrac{1}{2}\, r^{*\top} r^{*}\bigr)}{\pi(\theta)\exp\bigl(-\tfrac{1}{2}\, r^{\top} r\bigr)}\right\}.$$
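A minimal HMC transition implementing the leapfrog updates and the acceptance step above might look as follows. This is a sketch under the toy-normal assumption from the previous snippet, not the source's implementation; `hmc_step` and `grad_log_posterior` are illustrative names:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_posterior(theta):
    # Toy log pi(theta): a standard bivariate normal, up to a constant.
    return -0.5 * np.dot(theta, theta)

def grad_log_posterior(theta):
    # Gradient of the log posterior for the toy example.
    return -theta

def hmc_step(theta, eps, L):
    """One HMC transition: fresh momenta, L leapfrog steps, accept/reject."""
    # Draw momenta from standard normals, independently of theta.
    r = rng.standard_normal(theta.shape)
    theta_new, r_new = theta.copy(), r.copy()
    # Leapfrog: half-step on momentum, then alternating full steps on
    # position and momentum, then a final half-step on momentum.
    r_new = r_new + 0.5 * eps * grad_log_posterior(theta_new)
    for step in range(L):
        theta_new = theta_new + eps * r_new
        if step < L - 1:
            r_new = r_new + eps * grad_log_posterior(theta_new)
    r_new = r_new + 0.5 * eps * grad_log_posterior(theta_new)
    # Metropolis correction: accept with probability min(1, ratio of
    # joint densities), computed on the log scale.
    log_ratio = (log_posterior(theta_new) - 0.5 * np.dot(r_new, r_new)
                 - log_posterior(theta) + 0.5 * np.dot(r, r))
    if np.log(rng.uniform()) < log_ratio:
        return theta_new, True
    return theta, False
```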
Although HMC can lead to rapid convergence, it relies heavily on two requirements: the gradient of the logarithm of the posterior density must be computable, and the tuning parameters, the step size $\epsilon$ and the number of steps $L$, must be carefully selected. Step sizes that are too large or too small lead to acceptance rates that are too low or too high, both of which affect the convergence of the Markov chain. A large $L$ produces a long trajectory ($\epsilon L$), which can move the parameters back toward their original positions; a small $L$ limits the movement of the chain.
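Continuing the sketch above, a short experiment illustrates the step-size tradeoff; the particular values are illustrative only:

```python
# Acceptance rate as a function of step size for the toy sampler:
# very small eps accepts nearly everything but moves slowly, while an
# overly large eps makes the leapfrog integrator inaccurate and drives
# the acceptance rate toward zero.
for eps in (0.05, 0.5, 2.5):
    theta, accepts = np.zeros(2), 0
    for _ in range(2000):
        theta, accepted = hmc_step(theta, eps=eps, L=20)
        accepts += accepted
    print(f"eps={eps}: acceptance rate = {accepts / 2000:.2f}")
```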
An example of an adaptive HMC algorithm that automatically tunes $\epsilon$ and $L$ is the No-U-Turn Sampler (NUTS; Hoffman and Gelman 2014). The NUTS algorithm uses a doubling process to build a binary tree whose leaf nodes correspond to states of the parameters and momentum variables. The initial tree has a single node and height zero ($j = 0$). The doubling process expands the tree to either the left or the right in a binary fashion, and in each direction the algorithm takes $2^{j}$ leapfrog steps of size $\epsilon$; as the height $j$ of the tree increases, the computational cost therefore grows exponentially. The tree expands until the sampling trajectory makes a U-turn and starts to revisit parameter space that has already been explored. The NUTS algorithm tunes $\epsilon$ so that the actual acceptance rate during the doubling process is close to a predetermined target acceptance probability $\delta$ (usually set to 0.6 or higher). When the tuning stage ends, the NUTS algorithm proceeds to the main sampling stage and draws posterior samples with a fixed value of $\epsilon$. Increasing the target acceptance probability $\delta$ can often improve mixing, but it can also slow down sampling significantly. For more information about the NUTS algorithm and its efficiency, see Hoffman and Gelman (2014).