The MIANALYZE Procedure

Combining Inferences from Imputed Data Sets

With m imputations, m different sets of point and variance estimates for a parameter Q can be computed. Suppose that $\hat{Q}_i$ and $\hat{W}_i$ are the point and variance estimates, respectively, from the ith imputed data set, i = 1, 2, …, m. Then the combined point estimate for Q from multiple imputation is the average of the m complete-data estimates:

$$\bar{Q} = \frac{1}{m} \sum_{i=1}^{m} \hat{Q}_i$$

Suppose that $\bar{W}$ is the within-imputation variance, which is the average of the m complete-data variance estimates:

$$\bar{W} = \frac{1}{m} \sum_{i=1}^{m} \hat{W}_i$$

And suppose that B is the between-imputation variance:

$$B = \frac{1}{m-1} \sum_{i=1}^{m} \left( \hat{Q}_i - \bar{Q} \right)^2$$

Then the variance estimate associated with $\bar{Q}$ is the total variance (Rubin 1987):

$$T = \bar{W} + \left( 1 + \frac{1}{m} \right) B$$
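The combining formulas above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the procedure's implementation; the function name `combine_estimates` and the input values are hypothetical.

```python
from statistics import mean, variance


def combine_estimates(point_estimates, variance_estimates):
    """Combine m complete-data estimates with the formulas above.

    Hypothetical helper for illustration; it is not part of any SAS interface.
    """
    m = len(point_estimates)
    if m < 2 or len(variance_estimates) != m:
        raise ValueError("need m >= 2 point estimates and m matching variance estimates")
    q_bar = mean(point_estimates)      # combined point estimate
    w_bar = mean(variance_estimates)   # within-imputation variance
    b = variance(point_estimates)      # between-imputation variance (divisor m - 1)
    t = w_bar + (1 + 1 / m) * b        # total variance
    return q_bar, w_bar, b, t


# Made-up estimates from m = 5 imputed data sets (illustrative values only).
q_hat = [10.0, 11.0, 9.0, 10.5, 9.5]
w_hat = [4.0, 4.2, 3.8, 4.1, 3.9]
q_bar, w_bar, b, t = combine_estimates(q_hat, w_hat)
print(q_bar, w_bar, b, t)  # approximately 10.0, 4.0, 0.625, 4.75
```

Note that `statistics.variance` already uses the m − 1 divisor, which matches the definition of B.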

The statistic $(Q - \bar{Q})\, T^{-1/2}$ is approximately distributed as t with $v_m$ degrees of freedom (Rubin 1987), where

$$v_m = (m - 1) \left[ 1 + \frac{\bar{W}}{(1 + m^{-1})\, B} \right]^2$$

The degrees of freedom $v_m$ depend on m and the ratio

$$r = \frac{(1 + m^{-1})\, B}{\bar{W}}$$

The ratio r is called the relative increase in variance due to nonresponse (Rubin 1987). When there is no missing information about Q, the values of r and B are both zero. With a large value of m or a small value of r, the degrees of freedom $v_m$ will be large and the distribution of $(Q - \bar{Q})\, T^{-1/2}$ will be approximately normal.
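As a rough sketch of these two quantities, the following Python function computes r and the degrees of freedom from the within-imputation variance, the between-imputation variance, and m. The function name is hypothetical. Returning infinite degrees of freedom when B is zero is a choice made for this sketch (it corresponds to the normal limit described above), not a documented behavior of the procedure.

```python
import math


def relative_increase_and_df(w_bar, b, m):
    """Return (r, v_m) from the within variance, between variance, and m.

    Hypothetical helper for illustration only.
    """
    if m < 2 or w_bar <= 0 or b < 0:
        raise ValueError("need m >= 2, w_bar > 0, and b >= 0")
    if b == 0:
        # Assumption of this sketch: with no between-imputation variance,
        # r is zero and the t reference distribution reduces to the normal.
        return 0.0, math.inf
    r = (1 + 1 / m) * b / w_bar
    # The bracketed ratio in the definition of v_m equals 1 / r.
    v_m = (m - 1) * (1 + 1 / r) ** 2
    return r, v_m


r, v_m = relative_increase_and_df(w_bar=4.0, b=0.625, m=5)
print(r, v_m)  # approximately 0.1875 and 160.44
```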

Another useful statistic is the fraction of missing information about Q:

$$\hat{\lambda} = \frac{r + 2 / (v_m + 3)}{r + 1}$$

Both statistics r and $\lambda$ are helpful diagnostics for assessing how the missing data contribute to the uncertainty about Q.
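The fraction of missing information is a one-line computation once r and the degrees of freedom are known. The sketch below uses a hypothetical function name and takes both quantities as plain numbers.

```python
import math


def fraction_missing_information(r, v_m):
    """Fraction of missing information from r and the degrees of freedom v_m.

    Hypothetical helper for illustration only.
    """
    if r < 0 or v_m <= 0:
        raise ValueError("need r >= 0 and v_m > 0")
    return (r + 2 / (v_m + 3)) / (r + 1)


# With r = 0.10 and v_m = 484, the result is a little under 0.10.
print(fraction_missing_information(0.10, 484))

# With infinite degrees of freedom the correction term vanishes, leaving r / (r + 1).
print(fraction_missing_information(0.0, math.inf))  # 0.0
```

For large degrees of freedom the value is close to r / (r + 1), so the two diagnostics carry nearly the same information in that case.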

When the complete-data degrees of freedom $v_0$ are small, and there is only a modest proportion of missing data, the computed degrees of freedom, $v_m$, can be much larger than $v_0$, which is inappropriate. For example, with m = 5 and r = 10%, the computed degrees of freedom $v_m = 484$, which is inappropriate for data sets with complete-data degrees of freedom less than 484.
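The value 484 in this example can be checked by writing the degrees of freedom in terms of r, since the bracketed ratio in the definition of $v_m$ is 1/r:

$$v_m = (m - 1) \left( 1 + \frac{1}{r} \right)^2 = (5 - 1) \left( 1 + \frac{1}{0.10} \right)^2 = 4 \times 11^2 = 484$$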

Barnard and Rubin (1999) recommend the use of adjusted degrees of freedom

$$v_m^{*} = \left[ \frac{1}{v_m} + \frac{1}{\hat{v}_{\mathrm{obs}}} \right]^{-1}$$

where $\hat{v}_{\mathrm{obs}} = (1 - \gamma)\, v_0 (v_0 + 1) / (v_0 + 3)$ and $\gamma = (1 + m^{-1})\, B / T$.

If you specify the complete-data degrees of freedom $v_0$ with the EDF= option, the MIANALYZE procedure uses the adjusted degrees of freedom, $v_m^{*}$, for inference. Otherwise, the degrees of freedom $v_m$ are used.
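To see how strongly the adjustment can shrink the degrees of freedom, the following Python sketch evaluates the adjusted formula directly. It illustrates the arithmetic only and is not the procedure's implementation; the function name and the input values are hypothetical, and the handling of B = 0 is an assumption of the sketch.

```python
def adjusted_degrees_of_freedom(w_bar, b, m, v_0):
    """Adjusted degrees of freedom from the formula above.

    Hypothetical helper for illustration; v_0 is the complete-data
    degrees of freedom.
    """
    if m < 2 or w_bar <= 0 or b < 0 or v_0 <= 0:
        raise ValueError("need m >= 2, w_bar > 0, b >= 0, and v_0 > 0")
    t = w_bar + (1 + 1 / m) * b                   # total variance
    gamma = (1 + 1 / m) * b / t
    v_obs = (1 - gamma) * v_0 * (v_0 + 1) / (v_0 + 3)
    if b == 0:
        # Assumption of this sketch: v_m is infinite, so 1 / v_m is zero.
        return v_obs
    r = (1 + 1 / m) * b / w_bar
    v_m = (m - 1) * (1 + 1 / r) ** 2              # unadjusted degrees of freedom
    return 1 / (1 / v_m + 1 / v_obs)


# Made-up inputs: the unadjusted v_m is about 160, but with v_0 = 10 the
# adjusted value stays below the complete-data degrees of freedom.
print(adjusted_degrees_of_freedom(w_bar=4.0, b=0.625, m=5, v_0=10))  # about 6.82
```

Because the adjusted value is a harmonic-type combination of the two terms, it can never exceed either one, which is what keeps it below the complete-data degrees of freedom.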

Last updated: December 09, 2022