The MI Procedure

Monotone and FCS Predictive Mean Matching Methods

The predictive mean matching method is another imputation method available for continuous variables. It is similar to the regression method, except that for each missing value it imputes a value drawn randomly from a set of observed values whose predicted values are closest to the predicted value for the missing value under the simulated regression model (Heitjan and Little 1991; Schenker and Taylor 1996).

Following the description of the model in the section Monotone and FCS Regression Methods, the following steps are used to generate imputed values:

  1. New parameters $\boldsymbol{\beta}_* = (\beta_{*0}, \beta_{*1}, \ldots, \beta_{*(k)})$ and $\sigma_{*j}^2$ are drawn from the posterior predictive distribution of the parameters. That is, they are simulated from $(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k)$, $\hat{\sigma}_j^2$, and $\mathbf{V}_j$. The variance is drawn as

     $\sigma_{*j}^2 = \hat{\sigma}_j^2 \, (n_j - k - 1) / g$

     where $g$ is a $\chi^2_{n_j - k - 1}$ random variate and $n_j$ is the number of nonmissing observations for $Y_j$. The regression coefficients are drawn as

     $\boldsymbol{\beta}_* = \hat{\boldsymbol{\beta}} + \sigma_{*j} \mathbf{V}_{hj}' \mathbf{Z}$

     where $\mathbf{V}_{hj}$ is the upper triangular matrix in the Cholesky decomposition $\mathbf{V}_j = \mathbf{V}_{hj}' \mathbf{V}_{hj}$, and $\mathbf{Z}$ is a vector of $k+1$ independent random normal variates.

  2. For each missing value, a predicted value

     $y_{i*} = \beta_{*0} + \beta_{*1} x_1 + \beta_{*2} x_2 + \cdots + \beta_{*(k)} x_k$

     is computed with the covariate values $x_1, x_2, \ldots, x_k$.

  3. A set of $k_0$ observations whose corresponding predicted values are closest to $y_{i*}$ is generated. You can specify $k_0$ with the K= option.

  4. The missing value is then replaced by a value drawn randomly from these $k_0$ observed values.
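The four steps above can be sketched in NumPy as follows. This is an illustrative implementation of predictive mean matching under the assumptions stated in the comments, not PROC MI's actual code; the function name `pmm_impute` and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def pmm_impute(X_obs, y_obs, X_mis, k0=5, rng=rng):
    """One predictive mean matching draw (illustrative sketch, not PROC MI).

    X_obs : (n_j, k) covariates for the n_j nonmissing observations of Y_j
    y_obs : (n_j,)   observed values of Y_j
    X_mis : (m, k)   covariates for observations with Y_j missing
    k0    : number of closest predicted means to match (the K= option)
    """
    n_j, k = X_obs.shape
    A = np.column_stack([np.ones(n_j), X_obs])   # design matrix with intercept

    # OLS fit: beta_hat, sigma_hat^2, and V_j = (A'A)^{-1}
    beta_hat, *_ = np.linalg.lstsq(A, y_obs, rcond=None)
    resid = y_obs - A @ beta_hat
    sigma2_hat = resid @ resid / (n_j - k - 1)
    V_j = np.linalg.inv(A.T @ A)

    # Step 1: draw sigma_*^2 and beta_* from the posterior
    g = rng.chisquare(n_j - k - 1)
    sigma2_star = sigma2_hat * (n_j - k - 1) / g
    V_hj = np.linalg.cholesky(V_j).T             # upper triangular factor
    beta_star = beta_hat + np.sqrt(sigma2_star) * V_hj.T @ rng.standard_normal(k + 1)

    # Step 2: predicted means for observed and missing cases
    yhat_obs = A @ beta_star
    yhat_mis = np.column_stack([np.ones(len(X_mis)), X_mis]) @ beta_star

    # Steps 3-4: for each missing value, draw from the k0 closest observed values
    imputed = np.empty(len(X_mis))
    for i, y_star in enumerate(yhat_mis):
        closest = np.argsort(np.abs(yhat_obs - y_star))[:k0]
        imputed[i] = y_obs[rng.choice(closest)]
    return imputed

# Example with synthetic data
X_obs = rng.standard_normal((50, 2))
y_obs = 1.0 + X_obs @ np.array([2.0, -1.0]) + 0.5 * rng.standard_normal(50)
X_mis = rng.standard_normal((4, 2))
print(pmm_impute(X_obs, y_obs, X_mis, k0=5))
```

Because the imputed value is always one of the observed values of $Y_j$, the draw in the last step never produces a value outside the observed range.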

The predictive mean matching method requires the number of closest observations to be specified. A smaller $k_0$ tends to increase the correlation among the multiple imputations for the missing observation and results in a higher variability of point estimators in repeated sampling. On the other hand, a larger $k_0$ tends to lessen the effect from the imputation model and results in biased estimators (Schenker and Taylor 1996, p. 430).
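The effect of the matching-set size can be seen in a small standalone sketch. The observed values and predicted means below are hypothetical, fixed numbers chosen only to illustrate the trade-off: with a single closest match the same donor value is imputed every time, while a larger set spreads the draws over several donors.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical observed values of Y_j and their predicted means (made-up data)
y_obs = np.array([2.1, 2.4, 2.5, 3.0, 3.8])
yhat_obs = np.array([2.0, 2.3, 2.6, 3.1, 3.9])
y_star = 2.55                      # predicted mean for one missing value

def donor_values(k0, n=1000):
    # Indices of the k0 observed predicted means closest to y_star
    closest = np.argsort(np.abs(yhat_obs - y_star))[:k0]
    # Distinct observed values imputed across n repeated draws
    return {float(y_obs[i]) for i in rng.choice(closest, size=n)}

print(donor_values(1))   # one donor: the imputation is the same every draw
print(donor_values(3))   # three donors: draws vary across repetitions
```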

The predictive mean matching method ensures that imputed values are plausible; it might be more appropriate than the regression method if the normality assumption is violated (Horton and Lipsitz 2001, p. 246).

Last updated: December 09, 2022