The MIANALYZE Procedure

Multivariate Inferences

Multivariate inference based on Wald tests can be done with $m$ imputed data sets. The approach is a generalization of the approach taken in the univariate case (Rubin 1987, p. 137; Schafer 1997, p. 113). Suppose that $\hat{\mathbf{Q}}_i$ and $\hat{\mathbf{W}}_i$ are the point and covariance matrix estimates for a $p$-dimensional parameter $\mathbf{Q}$ (such as a multivariate mean) from the $i$th imputed data set, $i = 1, 2, \ldots, m$. Then the combined point estimate for $\mathbf{Q}$ from the multiple imputation is the average of the $m$ complete-data estimates:

$$\overline{\mathbf{Q}} = \frac{1}{m} \sum_{i=1}^{m} \hat{\mathbf{Q}}_i$$

Suppose that $\overline{\mathbf{W}}$ is the within-imputation covariance matrix, which is the average of the $m$ complete-data estimates:

$$\overline{\mathbf{W}} = \frac{1}{m} \sum_{i=1}^{m} \hat{\mathbf{W}}_i$$

And suppose that $\mathbf{B}$ is the between-imputation covariance matrix:

$$\mathbf{B} = \frac{1}{m-1} \sum_{i=1}^{m} \left( \hat{\mathbf{Q}}_i - \overline{\mathbf{Q}} \right) \left( \hat{\mathbf{Q}}_i - \overline{\mathbf{Q}} \right)'$$

Then the covariance matrix associated with $\overline{\mathbf{Q}}$ is the total covariance matrix

$$\mathbf{T}_0 = \overline{\mathbf{W}} + \left( 1 + \frac{1}{m} \right) \mathbf{B}$$
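
The combining rules above can be sketched in a few lines of plain Python. This is an illustration only, not the code that PROC MIANALYZE runs: the function name `combine_estimates` and the list-of-lists matrix layout are assumptions made for this example.

```python
def combine_estimates(q_hats, w_hats):
    """Pool m complete-data estimates into Q-bar, W-bar, B, and T0.

    q_hats: list of m point estimates, each a list of p floats.
    w_hats: list of m covariance estimates, each a p x p list of lists.
    (Hypothetical helper for illustration; not part of SAS.)
    """
    m = len(q_hats)
    if m < 2:
        raise ValueError("B is undefined for fewer than two imputations")
    p = len(q_hats[0])

    # Combined point estimate: average of the m complete-data estimates.
    q_bar = [sum(q[j] for q in q_hats) / m for j in range(p)]

    # Within-imputation covariance: average of the m covariance estimates.
    w_bar = [[sum(w[j][k] for w in w_hats) / m for k in range(p)]
             for j in range(p)]

    # Between-imputation covariance: sum of outer products of the
    # deviations from q_bar, divided by m - 1.
    b = [[sum((q[j] - q_bar[j]) * (q[k] - q_bar[k]) for q in q_hats) / (m - 1)
          for k in range(p)]
         for j in range(p)]

    # Total covariance: T0 = W-bar + (1 + 1/m) B.
    t0 = [[w_bar[j][k] + (1 + 1 / m) * b[j][k] for k in range(p)]
          for j in range(p)]
    return q_bar, w_bar, b, t0


# Made-up inputs: m = 3 imputations of a p = 2 parameter.
q_bar, w_bar, b, t0 = combine_estimates(
    [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]],
    [[[1.0, 0.0], [0.0, 1.0]]] * 3,
)
```

With these made-up inputs the three deviations from the combined estimate are collinear, so the resulting `b` has rank 1. This is the kind of rank deficiency that arises when the number of imputations is small relative to the dimension of the parameter.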

The natural multivariate extension of the t statistic used in the univariate case is the F statistic

$$F_0 = \left( \mathbf{Q} - \overline{\mathbf{Q}} \right)' \mathbf{T}_0^{-1} \left( \mathbf{Q} - \overline{\mathbf{Q}} \right) / p$$

with degrees of freedom $p$ and

$$v = (m - 1) \left( 1 + 1/r \right)^2$$

where

$$r = \left( 1 + \frac{1}{m} \right) \mathrm{trace}\left( \mathbf{B} \overline{\mathbf{W}}^{-1} \right) / p$$

is an average relative increase in variance due to nonresponse (Rubin 1987, p. 137; Schafer 1997, p. 114).
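
As a rough sketch, the average relative increase in variance and the degrees of freedom defined above could be computed as follows. The helper names (`solve`, `relative_increase`, `degrees_of_freedom_v`) are hypothetical, and the small Gauss-Jordan solver only keeps the example free of third-party libraries; it makes no claim about how PROC MIANALYZE does the linear algebra.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is n x n and b is n x k, both lists of lists. Illustrative helper only.
    """
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Choose the largest remaining pivot in this column.
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [x - factor * y
                            for x, y in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def relative_increase(b, w_bar, m):
    """r = (1 + 1/m) * trace(B W-bar^{-1}) / p."""
    p = len(b)
    # trace(B W^{-1}) equals trace(W^{-1} B), so solving W X = B is enough.
    x = solve(w_bar, b)
    trace = sum(x[i][i] for i in range(p))
    return (1 + 1 / m) * trace / p


def degrees_of_freedom_v(r, m):
    """v = (m - 1) * (1 + 1/r)^2; undefined when r is 0 (that is, B = 0)."""
    if r <= 0:
        raise ValueError("r must be positive")
    return (m - 1) * (1 + 1 / r) ** 2


# Made-up inputs: p = 2, m = 3.
r = relative_increase([[1.0, 2.0], [2.0, 4.0]], [[2.0, 0.0], [0.0, 4.0]], 3)
v = degrees_of_freedom_v(r, 3)
```

Only the diagonal of the solved system is needed for the trace, so the between-imputation matrix never has to be inverted. It is the within-imputation matrix that must be nonsingular.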

However, the reference distribution of the statistic $F_0$ is not easily derived. Especially for small $m$, the between-imputation covariance matrix $\mathbf{B}$ is unstable and does not have full rank for $m \le p$ (Schafer 1997, p. 113).

One solution is to make an additional assumption that the population between-imputation and within-imputation covariance matrices are proportional to each other (Schafer 1997, p. 113). This assumption implies that the fractions of missing information for all components of $\mathbf{Q}$ are equal. Under this assumption, a more stable estimate of the total covariance matrix is

$$\mathbf{T} = (1 + r) \overline{\mathbf{W}}$$

With the total covariance matrix $\mathbf{T}$, the F statistic (Rubin 1987, p. 137)

$$F = \left( \mathbf{Q} - \overline{\mathbf{Q}} \right)' \mathbf{T}^{-1} \left( \mathbf{Q} - \overline{\mathbf{Q}} \right) / p$$

has an F distribution with degrees of freedom $p$ and $v_1$, where

$$v_1 = \frac{1}{2} (p + 1)(m - 1) \left( 1 + \frac{1}{r} \right)^2$$

For $t = p(m - 1) \le 4$, PROC MIANALYZE uses the degrees of freedom $v_1$ in the analysis. For $t = p(m - 1) > 4$, PROC MIANALYZE uses $v_2$, a better approximation of the degrees of freedom given by Li, Raghunathan, and Rubin (1991):

$$v_2 = 4 + (t - 4) \left[ 1 + \frac{1}{r} \left( 1 - \frac{2}{t} \right) \right]^2$$
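
The choice between the two denominator degrees of freedom can be written as a small function. This is a sketch of the rule as stated above, under the assumption that $r > 0$; the name `denominator_df` is made up for this example and is not a SAS function.

```python
def denominator_df(p, m, r):
    """Denominator degrees of freedom for the pooled F statistic.

    Uses v1 when t = p(m - 1) <= 4 and v2 when t > 4, following the
    rule described in the text. Hypothetical helper for illustration.
    """
    if m < 2:
        raise ValueError("need at least two imputations")
    if r <= 0:
        raise ValueError("r must be positive")
    t = p * (m - 1)
    if t <= 4:
        # v1 = (1/2)(p + 1)(m - 1)(1 + 1/r)^2
        return 0.5 * (p + 1) * (m - 1) * (1 + 1 / r) ** 2
    # v2 = 4 + (t - 4)[1 + (1/r)(1 - 2/t)]^2
    return 4 + (t - 4) * (1 + (1 / r) * (1 - 2 / t)) ** 2


# Made-up inputs: p = 2 and r = 1, with m = 3 (t = 4) and m = 5 (t = 8).
df_small_t = denominator_df(2, 3, 1.0)
df_large_t = denominator_df(2, 5, 1.0)
```

The first call has $t = 4$ and therefore takes the $v_1$ branch. The second has $t = 8$ and takes the $v_2$ branch.
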
Last updated: December 09, 2022