Introduction to Bayesian Analysis Procedures

Introduction

The most frequently used statistical methods are known as frequentist (or classical) methods. These methods assume that unknown parameters are fixed constants, and they define probability by using limiting relative frequencies. It follows from these assumptions that probabilities are objective and that you cannot make probabilistic statements about parameters because they are fixed. Bayesian methods offer an alternative approach; they treat parameters as random variables and define probability as "degrees of belief" (that is, the probability of an event is the degree to which you believe the event is true). It follows from these postulates that probabilities are subjective and that you can make probability statements about parameters. The term "Bayesian" comes from the prevalent usage of Bayes’ theorem, which was named after the Reverend Thomas Bayes, an eighteenth-century Presbyterian minister. Bayes was interested in solving the question of inverse probability: after observing a collection of events, what is the probability of one event?

Suppose you are interested in estimating $\theta$ from data $\mathbf{y} = \{y_1, \ldots, y_n\}$ by using a statistical model described by a density $p(\mathbf{y} \mid \theta)$. Bayesian philosophy states that $\theta$ cannot be determined exactly, and uncertainty about the parameter is expressed through probability statements and distributions. You can say that $\theta$ follows a normal distribution with mean 0 and variance 1, if it is believed that this distribution best describes the uncertainty associated with the parameter. The following steps describe the essential elements of Bayesian inference:

  1. A probability distribution for $\theta$ is formulated as $\pi(\theta)$, which is known as the prior distribution, or just the prior. The prior distribution expresses your beliefs (for example, on the mean, the spread, the skewness, and so forth) about the parameter before you examine the data.

  2. Given the observed data $\mathbf{y}$, you choose a statistical model $p(\mathbf{y} \mid \theta)$ to describe the distribution of $\mathbf{y}$ given $\theta$.

  3. You update your beliefs about $\theta$ by combining information from the prior distribution and the data through the calculation of the posterior distribution, $p(\theta \mid \mathbf{y})$.

The third step is carried out by using Bayes’ theorem, which enables you to combine the prior distribution and the model in the following way:

$$p(\theta \mid \mathbf{y}) = \frac{p(\theta, \mathbf{y})}{p(\mathbf{y})} = \frac{p(\mathbf{y} \mid \theta)\,\pi(\theta)}{p(\mathbf{y})} = \frac{p(\mathbf{y} \mid \theta)\,\pi(\theta)}{\int p(\mathbf{y} \mid \theta)\,\pi(\theta)\,d\theta}$$

The quantity

$$p(\mathbf{y}) = \int p(\mathbf{y} \mid \theta)\,\pi(\theta)\,d\theta$$

is the normalizing constant of the posterior distribution. This quantity $p(\mathbf{y})$ is also the marginal distribution of $\mathbf{y}$, and it is sometimes called the marginal distribution of the data. The likelihood function of $\theta$ is any function proportional to $p(\mathbf{y} \mid \theta)$; that is, $L(\theta) \propto p(\mathbf{y} \mid \theta)$. Another way of writing Bayes’ theorem is as follows:

$$p(\theta \mid \mathbf{y}) = \frac{L(\theta)\,\pi(\theta)}{\int L(\theta)\,\pi(\theta)\,d\theta}$$

The marginal distribution $p(\mathbf{y})$ is an integral. As long as the integral is finite, the particular value of the integral does not provide any additional information about the posterior distribution. Hence, $p(\theta \mid \mathbf{y})$ can be written up to an arbitrary constant, presented here in proportional form:

$$p(\theta \mid \mathbf{y}) \propto L(\theta)\,\pi(\theta)$$
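To make the proportional form concrete, consider a small conjugate example (illustrative only, not tied to any particular procedure): suppose the data record $k$ successes in $n$ independent trials with success probability $\theta$, and the prior is $\pi(\theta) = \mathrm{Beta}(a, b)$. Then

$$p(\theta \mid \mathbf{y}) \;\propto\; L(\theta)\,\pi(\theta) \;\propto\; \theta^{k}(1-\theta)^{\,n-k}\,\theta^{a-1}(1-\theta)^{\,b-1} \;=\; \theta^{a+k-1}(1-\theta)^{\,b+n-k-1}$$

which is recognizable as the kernel of a $\mathrm{Beta}(a+k,\; b+n-k)$ distribution, so the normalizing integral never has to be evaluated explicitly.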

Simply put, Bayes’ theorem tells you how to update existing knowledge with new information. You begin with a prior belief $\pi(\theta)$, and after learning information from data $\mathbf{y}$, you change or update your belief about $\theta$ and obtain $p(\theta \mid \mathbf{y})$. These are the essential elements of the Bayesian approach to data analysis.
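When no closed form is available, the same update can be carried out numerically. The following sketch (assuming NumPy and SciPy, with made-up data and the $N(0, 1)$ prior mentioned earlier) evaluates $L(\theta)\,\pi(\theta)$ on a grid and normalizes it to approximate the posterior; it illustrates the idea only and is not the algorithm that any particular procedure uses.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical data: observations assumed to come from N(theta, sigma^2)
# with sigma known; the prior on theta is N(0, 1), as described in the text.
y = np.array([0.8, 1.3, 0.2, 1.1, 0.6])
sigma = 1.0

# Evaluate the prior and the log likelihood on a grid of theta values.
theta = np.linspace(-4.0, 4.0, 2001)
prior = norm.pdf(theta, loc=0.0, scale=1.0)                      # pi(theta)
log_lik = np.array([norm.logpdf(y, loc=t, scale=sigma).sum() for t in theta])
unnorm = np.exp(log_lik) * prior                                 # L(theta) * pi(theta)

# Approximate the normalizing constant p(y) = integral of L(theta) pi(theta)
# with the trapezoidal rule, then normalize to obtain the posterior density.
p_y = np.trapz(unnorm, theta)
posterior = unnorm / p_y

# A simple posterior summary: the posterior mean of theta.
post_mean = np.trapz(theta * posterior, theta)
print(f"p(y) approx {p_y:.4g}, posterior mean approx {post_mean:.3f}")
```

Grid evaluation of this kind is feasible only for very low-dimensional $\theta$; that limitation is one reason simulation methods such as MCMC are used in practice.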

In theory, Bayesian methods offer simple alternatives to statistical inference—all inferences follow from the posterior distribution $p(\theta \mid \mathbf{y})$. In practice, however, you can obtain the posterior distribution with straightforward analytical solutions only in the most rudimentary problems. Most Bayesian analyses require sophisticated computations, including the use of simulation methods such as Markov chain Monte Carlo (MCMC) methods. You generate samples from the posterior distribution and use these samples to estimate the quantities of interest. PROC MCMC uses a self-tuning Metropolis algorithm (see the section Metropolis and Metropolis-Hastings Algorithms). The GENMOD, LIFEREG, and PHREG procedures use the Gibbs sampler (see the section Gibbs Sampler). The BCHOICE and FMM procedures use a combination of the Gibbs sampler and the latent variable sampler. The BGLIMM procedure uses the Gibbs sampler, the Hamiltonian Monte Carlo sampler, and the Gamerman algorithm for generalized linear mixed-effects models. An important aspect of any analysis is assessing the convergence of the Markov chains. Inferences that are based on nonconverged Markov chains can be both inaccurate and misleading.
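As a rough illustration of how such samplers work, the following is a minimal random-walk Metropolis sketch for the same hypothetical normal model used above. It is a generic textbook version, not the self-tuning Metropolis algorithm that PROC MCMC implements, and the data, step size, and burn-in length are arbitrary choices for the example.

```python
import numpy as np

def metropolis(log_post, theta0, n_samples=10000, step=0.5, seed=1):
    """Random-walk Metropolis: draw samples whose stationary distribution
    is proportional to exp(log_post(theta))."""
    rng = np.random.default_rng(seed)
    theta = theta0
    current = log_post(theta)
    draws = np.empty(n_samples)
    for i in range(n_samples):
        proposal = theta + rng.normal(0.0, step)      # symmetric proposal
        cand = log_post(proposal)
        # Accept with probability min(1, posterior ratio); work on the log scale.
        if np.log(rng.uniform()) < cand - current:
            theta, current = proposal, cand
        draws[i] = theta
    return draws

# Hypothetical normal model: N(0, 1) prior on theta and N(theta, 1) data model,
# so the unnormalized log posterior is log L(theta) + log pi(theta) up to a constant.
y = np.array([0.8, 1.3, 0.2, 1.1, 0.6])
log_post = lambda t: -0.5 * t**2 - 0.5 * np.sum((y - t) ** 2)

draws = metropolis(log_post, theta0=0.0)
burned = draws[2000:]                                  # discard burn-in draws
print(f"posterior mean approx {burned.mean():.3f}, 95% interval approx "
      f"({np.quantile(burned, 0.025):.3f}, {np.quantile(burned, 0.975):.3f})")
```

Because the proposal is symmetric, the acceptance probability reduces to a ratio of unnormalized posterior densities, which is why the proportional form $p(\theta \mid \mathbf{y}) \propto L(\theta)\,\pi(\theta)$ is all that such a sampler needs.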

Both Bayesian and classical methods have their advantages and disadvantages. From a practical point of view, your choice of method depends on what you want to accomplish with your data analysis. If you have prior information (either expert opinion or historical knowledge) that you want to incorporate into the analysis, then you should consider Bayesian methods. In addition, if you want to communicate your findings in terms of probability notions that can be more easily understood by nonstatisticians, Bayesian methods might be appropriate. The Bayesian paradigm can often provide a framework for answering specific scientific questions that a single point estimate cannot sufficiently address. Alternatively, if you are interested only in estimating parameters based on the likelihood, then numerical optimization methods, such as the Newton-Raphson method, can give you very precise estimates and there is no need to use a Bayesian analysis. For further discussions of the relative advantages and disadvantages of Bayesian analysis, see the section Bayesian Analysis: Advantages and Disadvantages.
