The MI Procedure

MCMC Method for Arbitrary Missing Multivariate Normal Data

The Markov chain Monte Carlo (MCMC) method originated in physics as a tool for exploring equilibrium distributions of interacting molecules. In statistical applications, it is used to generate pseudorandom draws from multidimensional and otherwise intractable probability distributions via Markov chains. A Markov chain is a sequence of random variables in which the distribution of each element depends only on the value of the previous element.

In MCMC simulation, you constructs a Markov chain long enough for the distribution of the elements to stabilize to a stationary distribution, which is the distribution of interest. Repeatedly simulating steps of the chain simulates draws from the distribution of interest. See Schafer (1997) for a detailed discussion of this method.

In Bayesian inference, information about unknown parameters is expressed in the form of a posterior probability distribution. This posterior distribution is computed using Bayes’ theorem,

p left-parenthesis bold-italic theta vertical-bar y right-parenthesis equals StartFraction p left-parenthesis y vertical-bar bold-italic theta right-parenthesis p left-parenthesis bold-italic theta right-parenthesis Over integral p left-parenthesis y vertical-bar bold-italic theta right-parenthesis p left-parenthesis bold-italic theta right-parenthesis d bold-italic theta EndFraction

MCMC has been applied as a method for exploring posterior distributions in Bayesian inference. That is, through MCMC, you can simulate the entire joint posterior distribution of the unknown quantities and obtain simulation-based estimates of posterior parameters that are of interest.

In many incomplete-data problems, the observed-data posterior p left-parenthesis bold-italic theta vertical-bar upper Y Subscript o b s Baseline right-parenthesis is intractable and cannot easily be simulated. However, when upper Y Subscript o b s is augmented by an estimated or simulated value of the missing data upper Y Subscript m i s, the complete-data posterior p left-parenthesis bold-italic theta vertical-bar upper Y Subscript o b s Baseline comma upper Y Subscript m i s Baseline right-parenthesis is much easier to simulate. Assuming that the data are from a multivariate normal distribution, data augmentation can be applied to Bayesian inference with missing data by repeating the following steps:

1. The imputation I-step Given an estimated mean vector and covariance matrix, the I-step simulates the missing values for each observation independently. That is, if you denote the variables with missing values for observation i by upper Y Subscript i left-parenthesis m i s right-parenthesis and the variables with observed values by upper Y Subscript i left-parenthesis o b s right-parenthesis, then the I-step draws values for upper Y Subscript i left-parenthesis m i s right-parenthesis from a conditional distribution for upper Y Subscript i left-parenthesis m i s right-parenthesis given upper Y Subscript i left-parenthesis o b s right-parenthesis.

2. The posterior P-step Given a complete sample, the P-step simulates the posterior population mean vector and covariance matrix. These new estimates are then used in the next I-step. Without prior information about the parameters, a noninformative prior is used. You can also use other informative priors. For example, a prior information about the covariance matrix can help to stabilize the inference about the mean vector for a near singular covariance matrix.

That is, with a current parameter estimate bold-italic theta Superscript left-parenthesis t right-parenthesis at the tth iteration, the I-step draws upper Y Subscript m i s Superscript left-parenthesis t plus 1 right-parenthesis from p left-parenthesis upper Y Subscript m i s Baseline vertical-bar upper Y Subscript o b s Baseline comma bold-italic theta Superscript left-parenthesis t right-parenthesis Baseline right-parenthesis and the P-step draws bold-italic theta Superscript left-parenthesis t plus 1 right-parenthesis from p left-parenthesis bold-italic theta vertical-bar upper Y Subscript o b s Baseline comma upper Y Subscript m i s Superscript left-parenthesis t plus 1 right-parenthesis Baseline right-parenthesis. The two steps are iterated long enough for the results to reliably simulate an approximately independent draw of the missing values for a multiply imputed data set (Schafer 1997).

This creates a Markov chain left-parenthesis upper Y Subscript m i s Superscript left-parenthesis 1 right-parenthesis Baseline comma bold-italic theta Superscript left-parenthesis 1 right-parenthesis Baseline right-parenthesis , left-parenthesis upper Y Subscript m i s Superscript left-parenthesis 2 right-parenthesis Baseline comma bold-italic theta Superscript left-parenthesis 2 right-parenthesis Baseline right-parenthesis , …, which converges in distribution to p left-parenthesis upper Y Subscript m i s Baseline comma bold-italic theta vertical-bar upper Y Subscript o b s Baseline right-parenthesis. Assuming the iterates converge to a stationary distribution, the goal is to simulate an approximately independent draw of the missing values from this distribution.

To validate the imputation results, you should repeat the process with different random number generators and starting values based on different initial parameter estimates.

The next three sections provide details for the imputation step, Bayesian estimation of the mean vector and covariance matrix, and the posterior step.

Imputation Step

In each iteration, starting with a given mean vector bold-italic mu and covariance matrix bold upper Sigma, the imputation step draws values for the missing data from the conditional distribution upper Y Subscript m i s given upper Y Subscript o b s.

Suppose bold-italic mu equals left-bracket bold-italic mu prime 1 comma bold-italic mu prime 2 right-bracket prime is the partitioned mean vector of two sets of variables, upper Y Subscript o b s and upper Y Subscript m i s, where bold-italic mu 1 is the mean vector for variables upper Y Subscript o b s and bold-italic mu 2 is the mean vector for variables upper Y Subscript m i s.

Also suppose

StartLayout 1st Row 1st Column bold upper Sigma 2nd Column equals 3rd Column Start 2 By 2 Matrix 1st Row 1st Column bold upper Sigma 11 2nd Column bold upper Sigma 12 2nd Row 1st Column bold upper Sigma prime 12 2nd Column bold upper Sigma 22 EndMatrix EndLayout

is the partitioned covariance matrix for these variables, where bold upper Sigma 11 is the covariance matrix for variables upper Y Subscript o b s, bold upper Sigma 22 is the covariance matrix for variables upper Y Subscript m i s, and bold upper Sigma 12 is the covariance matrix between variables upper Y Subscript o b s and variables upper Y Subscript m i s.

By using the sweep operator (Goodnight 1979) on the pivots of the bold upper Sigma 11 submatrix, the matrix becomes

StartLayout 1st Row  Start 2 By 2 Matrix 1st Row 1st Column bold upper Sigma 11 Superscript negative 1 2nd Column bold upper Sigma 11 Superscript negative 1 Baseline bold upper Sigma 12 2nd Row 1st Column minus bold upper Sigma prime 12 bold upper Sigma 11 Superscript negative 1 2nd Column bold upper Sigma 22.1 EndMatrix EndLayout

where bold upper Sigma 22.1 equals bold upper Sigma 22 minus bold upper Sigma prime 12 bold upper Sigma 11 Superscript negative 1 Baseline bold upper Sigma 12 can be used to compute the conditional covariance matrix of bold upper Y Subscript m i s after controlling for bold upper Y Subscript o b s.

For an observation with the preceding missing pattern, the conditional distribution of upper Y Subscript m i s given upper Y Subscript o b s Baseline equals bold y 1 is a multivariate normal distribution with the mean vector

bold-italic mu 2.1 equals bold-italic mu 2 plus bold upper Sigma prime 12 bold upper Sigma 11 Superscript negative 1 Baseline left-parenthesis bold y 1 minus bold-italic mu 1 right-parenthesis

and the conditional covariance matrix

bold upper Sigma 22.1 equals bold upper Sigma 22 minus bold upper Sigma prime 12 bold upper Sigma 11 Superscript negative 1 Baseline bold upper Sigma 12

Bayesian Estimation of the Mean Vector and Covariance Matrix

Suppose that bold upper Y equals left-parenthesis bold y prime 1 comma bold y prime 2 comma ellipsis comma bold y prime Subscript n right-parenthesis prime is an left-parenthesis n times p right-parenthesis matrix made up of n left-parenthesis p times 1 right-parenthesis independent vectors bold y Subscript i, each of which has a multivariate normal distribution with mean zero and covariance matrix bold upper Lamda. Then the SSCP matrix

bold upper A equals bold upper Y prime bold upper Y equals sigma-summation Underscript i Endscripts bold y Subscript i Baseline bold y prime Subscript i

has a Wishart distribution upper W left-parenthesis n comma bold upper Lamda right-parenthesis.

When each observation bold y Subscript i is distributed with a multivariate normal distribution with an unknown mean bold-italic mu, then the CSSCP matrix

bold upper A equals sigma-summation Underscript i Endscripts left-parenthesis bold y Subscript i Baseline minus bold y overbar right-parenthesis left-parenthesis bold y Subscript i Baseline minus bold y overbar right-parenthesis prime

has a Wishart distribution upper W left-parenthesis n minus 1 comma bold upper Lamda right-parenthesis.

If bold upper A has a Wishart distribution upper W left-parenthesis n comma bold upper Lamda right-parenthesis, then bold upper B equals bold upper A Superscript negative 1 has an inverted Wishart distribution upper W Superscript negative 1 Baseline left-parenthesis n comma bold upper Psi right-parenthesis, where n is the degrees of freedom and bold upper Psi equals bold upper Lamda Superscript negative 1 is the precision matrix (Anderson 1984).

Note that, instead of using the parameter bold upper Psi equals bold upper Lamda Superscript negative 1 for the inverted Wishart distribution, Schafer (1997) uses the parameter bold upper Lamda.

Suppose that each observation in the data matrix bold upper Y has a multivariate normal distribution with mean bold-italic mu and covariance matrix bold upper Sigma. Then with a prior inverted Wishart distribution for bold upper Sigma and a prior normal distribution for bold-italic mu

StartLayout 1st Row 1st Column bold upper Sigma tilde 2nd Column Blank 3rd Column upper W Superscript negative 1 Baseline left-parenthesis m comma bold upper Psi right-parenthesis 2nd Row 1st Column bold-italic mu vertical-bar bold upper Sigma tilde 2nd Column Blank 3rd Column upper N left-parenthesis bold-italic mu 0 comma StartFraction 1 Over tau EndFraction bold upper Sigma right-parenthesis EndLayout

where tau greater-than 0 is a fixed number.

The posterior distribution (Anderson 1984, p. 270; Schafer 1997, p. 152) is

StartLayout 1st Row 1st Column bold upper Sigma vertical-bar bold upper Y tilde 2nd Column Blank 3rd Column upper W Superscript negative 1 Baseline left-parenthesis n plus m comma left-parenthesis n minus 1 right-parenthesis bold upper S plus bold upper Psi plus StartFraction n tau Over n plus tau EndFraction left-parenthesis bold y overbar minus bold-italic mu 0 right-parenthesis left-parenthesis bold y overbar minus bold-italic mu 0 right-parenthesis Superscript prime Baseline right-parenthesis 2nd Row 1st Column bold-italic mu vertical-bar left-parenthesis bold upper Sigma comma bold upper Y right-parenthesis tilde 2nd Column Blank 3rd Column upper N left-parenthesis StartFraction 1 Over n plus tau EndFraction left-parenthesis n bold y overbar plus tau bold-italic mu 0 right-parenthesis comma StartFraction 1 Over n plus tau EndFraction bold upper Sigma right-parenthesis EndLayout

where left-parenthesis n minus 1 right-parenthesis bold upper S is the CSSCP matrix.

Posterior Step

In each iteration, the posterior step simulates the posterior population mean vector bold-italic mu and covariance matrix bold upper Sigma from prior information for bold-italic mu and bold upper Sigma, and the complete sample estimates.

You can specify the prior parameter information by using one of the following methods:

  • PRIOR=JEFFREYS, which uses a noninformative prior

  • PRIOR=INPUT=, which provides a prior information for bold upper Sigma in the data set. Optionally, it also provides a prior information for bold-italic mu in the data set.

  • PRIOR=RIDGE=, which uses a ridge prior

The next four subsections provide details of the posterior step for different prior distributions.

1. A Noninformative Prior

Without prior information about the mean and covariance estimates, you can use a noninformative prior by specifying the PRIOR=JEFFREYS option. The posterior distributions (Schafer 1997, p. 154) are

StartLayout 1st Row 1st Column bold upper Sigma Superscript left-parenthesis t plus 1 right-parenthesis Baseline vertical-bar bold upper Y tilde 2nd Column Blank 3rd Column upper W Superscript negative 1 Baseline left-parenthesis n minus 1 comma left-parenthesis n minus 1 right-parenthesis bold upper S right-parenthesis 2nd Row 1st Column bold-italic mu Superscript left-parenthesis t plus 1 right-parenthesis Baseline vertical-bar left-parenthesis bold upper Sigma Superscript left-parenthesis t plus 1 right-parenthesis Baseline comma bold upper Y right-parenthesis tilde 2nd Column Blank 3rd Column upper N left-parenthesis bold y overbar comma StartFraction 1 Over n EndFraction bold upper Sigma Superscript left-parenthesis t plus 1 right-parenthesis Baseline right-parenthesis EndLayout
2. An Informative Prior for bold-italic mu and bold upper Sigma

When prior information is available for the parameters bold-italic mu and bold upper Sigma, you can provide it with a SAS data set that you specify with the PRIOR=INPUT= option:

StartLayout 1st Row 1st Column bold upper Sigma tilde 2nd Column Blank 3rd Column upper W Superscript negative 1 Baseline left-parenthesis d Superscript asterisk Baseline comma d Superscript asterisk Baseline bold upper S Superscript asterisk Baseline right-parenthesis 2nd Row 1st Column bold-italic mu vertical-bar bold upper Sigma tilde 2nd Column Blank 3rd Column upper N left-parenthesis bold-italic mu 0 comma StartFraction 1 Over n 0 EndFraction bold upper Sigma right-parenthesis EndLayout

To obtain the prior distribution for bold upper Sigma, PROC MI reads the matrix bold upper S Superscript asterisk from observations in the data set with _TYPE_=‘COV’, and it reads n Superscript asterisk Baseline equals d Superscript asterisk Baseline plus 1 from observations with _TYPE_=‘N’.

To obtain the prior distribution for bold-italic mu, PROC MI reads the mean vector bold-italic mu 0 from observations with _TYPE_=‘MEAN’, and it reads n 0 from observations with _TYPE_=‘N_MEAN’. When there are no observations with _TYPE_=‘N_MEAN’, PROC MI reads n 0 from observations with _TYPE_=‘N’.

The resulting posterior distribution, as described in the section Bayesian Estimation of the Mean Vector and Covariance Matrix, is given by

StartLayout 1st Row 1st Column bold upper Sigma Superscript left-parenthesis t plus 1 right-parenthesis Baseline vertical-bar bold upper Y tilde 2nd Column Blank 3rd Column upper W Superscript negative 1 Baseline left-parenthesis n plus d Superscript asterisk Baseline comma left-parenthesis n minus 1 right-parenthesis bold upper S plus d Superscript asterisk Baseline bold upper S Superscript asterisk Baseline plus bold upper S Subscript m Baseline right-parenthesis 2nd Row 1st Column bold-italic mu Superscript left-parenthesis t plus 1 right-parenthesis Baseline vertical-bar left-parenthesis bold upper Sigma Superscript left-parenthesis t plus 1 right-parenthesis Baseline comma bold upper Y right-parenthesis tilde 2nd Column Blank 3rd Column upper N left-parenthesis StartFraction 1 Over n plus n 0 EndFraction left-parenthesis n bold y overbar plus n 0 bold-italic mu 0 right-parenthesis comma StartFraction 1 Over n plus n 0 EndFraction bold upper Sigma Superscript left-parenthesis t plus 1 right-parenthesis Baseline right-parenthesis EndLayout

where

bold upper S Subscript m Baseline equals StartFraction n n 0 Over n plus n 0 EndFraction left-parenthesis bold y overbar minus bold-italic mu 0 right-parenthesis left-parenthesis bold y overbar minus bold-italic mu 0 right-parenthesis prime
3. An Informative Prior for bold upper Sigma

When the sample covariance matrix bold upper S is singular or near singular, prior information about bold upper Sigma can also be used without prior information about bold-italic mu to stabilize the inference about bold-italic mu. You can provide it with a SAS data set that you specify with the PRIOR=INPUT= option.

To obtain the prior distribution for bold upper Sigma, PROC MI reads the matrix bold upper S Superscript asterisk from observations in the data set with _TYPE_=‘COV’, and it reads n Superscript asterisk from observations with _TYPE_=‘N’.

The resulting posterior distribution for left-parenthesis bold-italic mu comma bold upper Sigma right-parenthesis (Schafer 1997, p. 156) is

StartLayout 1st Row 1st Column bold upper Sigma Superscript left-parenthesis t plus 1 right-parenthesis Baseline vertical-bar bold upper Y tilde 2nd Column Blank 3rd Column upper W Superscript negative 1 Baseline left-parenthesis n plus d Superscript asterisk Baseline comma left-parenthesis n minus 1 right-parenthesis bold upper S plus d Superscript asterisk Baseline bold upper S Superscript asterisk Baseline right-parenthesis 2nd Row 1st Column bold-italic mu Superscript left-parenthesis t plus 1 right-parenthesis Baseline vertical-bar left-parenthesis bold upper Sigma Superscript left-parenthesis t plus 1 right-parenthesis Baseline comma bold upper Y right-parenthesis tilde 2nd Column Blank 3rd Column upper N left-parenthesis bold y overbar comma StartFraction 1 Over n EndFraction bold upper Sigma Superscript left-parenthesis t plus 1 right-parenthesis Baseline right-parenthesis EndLayout

Note that if the PRIOR=INPUT= data set also contains observations with _TYPE_=‘MEAN’, then a complete informative prior for both bold-italic mu and bold upper Sigma will be used.

4. A Ridge Prior

A special case of the preceding adjustment is a ridge prior with bold upper S Superscript asterisk = Diag left-parenthesis bold upper S right-parenthesis (Schafer 1997, p. 156). That is, bold upper S Superscript asterisk is a diagonal matrix with diagonal elements equal to the corresponding elements in bold upper S.

You can request a ridge prior by using the PRIOR=RIDGE= option. You can explicitly specify the number d Superscript asterisk Baseline greater-than-or-equal-to 1 in the PRIOR=RIDGE=d Superscript asterisk option. Or you can implicitly specify the number by specifying the proportion p in the PRIOR=RIDGE=p option to request d Superscript asterisk Baseline equals left-parenthesis n minus 1 right-parenthesis p.

The posterior is then given by

StartLayout 1st Row 1st Column bold upper Sigma Superscript left-parenthesis t plus 1 right-parenthesis Baseline vertical-bar bold upper Y tilde 2nd Column Blank 3rd Column upper W Superscript negative 1 Baseline left-parenthesis n plus d Superscript asterisk Baseline comma left-parenthesis n minus 1 right-parenthesis bold upper S plus d Superscript asterisk Baseline Diag left-parenthesis bold upper S right-parenthesis right-parenthesis 2nd Row 1st Column bold-italic mu Superscript left-parenthesis t plus 1 right-parenthesis Baseline vertical-bar left-parenthesis bold upper Sigma Superscript left-parenthesis t plus 1 right-parenthesis Baseline comma bold upper Y right-parenthesis tilde 2nd Column Blank 3rd Column upper N left-parenthesis bold y overbar comma StartFraction 1 Over n EndFraction bold upper Sigma Superscript left-parenthesis t plus 1 right-parenthesis Baseline right-parenthesis EndLayout
Last updated: December 09, 2022