The MI Procedure

Monotone and FCS Regression Methods

The regression method is the default imputation method for continuous variables in the MONOTONE and FCS statements.

In the regression method, a regression model is fitted for a continuous variable with the covariates constructed from a set of effects. Based on the fitted regression model, a new regression model is simulated from the posterior predictive distribution of the parameters and is used to impute the missing values for each variable (Rubin 1987, pp. 166–167). That is, for a continuous variable $Y_j$ with missing values, a model

$$Y_j = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$

is fitted using observations with observed values for the variable $Y_j$ and its covariates $X_1, X_2, \ldots, X_k$.

The fitted model includes the regression parameter estimates $\hat{\boldsymbol{\beta}} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k)$ and the associated covariance matrix $\hat{\sigma}_j^2 \mathbf{V}_j$, where $\mathbf{V}_j$ is the usual $(\mathbf{X}'\mathbf{X})^{-1}$ matrix derived from the intercept and covariates $X_1, X_2, \ldots, X_k$.
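
As a concrete illustration only (not SAS code, and no part of PROC MI itself), the following Python/NumPy sketch shows how $\hat{\boldsymbol{\beta}}$, $\mathbf{V}_j = (\mathbf{X}'\mathbf{X})^{-1}$, and $\hat{\sigma}_j^2$ can be computed from the observations with observed $Y_j$; the toy data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(12345)

# Hypothetical toy data: n_j observed cases of Y_j with k = 2 covariates.
n_j, k = 50, 2
X = rng.normal(size=(n_j, k))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(scale=0.8, size=n_j)

# Design matrix with an intercept column, matching the model above.
D = np.column_stack([np.ones(n_j), X])        # n_j x (k + 1)

# V_j = (D'D)^{-1} and beta_hat = V_j D'y (ordinary least squares).
V_j = np.linalg.inv(D.T @ D)
beta_hat = V_j @ D.T @ y

# sigma_hat_j^2 = residual sum of squares / (n_j - k - 1).
resid = y - D @ beta_hat
sigma2_hat = resid @ resid / (n_j - k - 1)
```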

The following steps are used to generate imputed values for each imputation (both steps are illustrated in the code sketch that follows the list):

  1. New parameters $\boldsymbol{\beta}_* = (\beta_{*0}, \beta_{*1}, \ldots, \beta_{*(k)})$ and $\sigma_{*j}^2$ are drawn from the posterior predictive distribution of the parameters. That is, they are simulated from $(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k)$, $\hat{\sigma}_j^2$, and $\mathbf{V}_j$. The variance is drawn as

    $$\sigma_{*j}^2 = \hat{\sigma}_j^2 \, (n_j - k - 1) / g$$

    where $g$ is a $\chi^2_{n_j-k-1}$ random variate and $n_j$ is the number of nonmissing observations for $Y_j$. The regression coefficients are drawn as

    $$\boldsymbol{\beta}_* = \hat{\boldsymbol{\beta}} + \sigma_{*j} \mathbf{V}_{hj}' \mathbf{Z}$$

    where $\mathbf{V}_{hj}$ is the upper triangular matrix in the Cholesky decomposition, $\mathbf{V}_j = \mathbf{V}_{hj}' \mathbf{V}_{hj}$, and $\mathbf{Z}$ is a vector of $k+1$ independent random normal variates.

  2. The missing values are then replaced by

    $$\beta_{*0} + \beta_{*1} x_1 + \beta_{*2} x_2 + \cdots + \beta_{*(k)} x_k + z_i \sigma_{*j}$$

    where $x_1, x_2, \ldots, x_k$ are the values of the covariates and $z_i$ is a simulated normal deviate.
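
The following Python/NumPy sketch illustrates both steps; it is not SAS code, and the numeric values standing in for $\hat{\boldsymbol{\beta}}$, $\mathbf{V}_j$, $\hat{\sigma}_j^2$, and $n_j$ are hypothetical placeholders (in practice they come from the fitted regression described above).

```python
import numpy as np

rng = np.random.default_rng(2023)

# Placeholder quantities standing in for the fitted regression of Y_j:
# beta_hat, V_j, sigma2_hat, and the number of nonmissing observations n_j.
k, n_j = 2, 50
beta_hat = np.array([1.0, 0.5, -0.3])
V_j = np.diag([0.02, 0.01, 0.01])
sigma2_hat = 0.64

# Step 1: draw sigma_{*j}^2 and beta_* from the posterior predictive distribution.
g = rng.chisquare(df=n_j - k - 1)
sigma2_star = sigma2_hat * (n_j - k - 1) / g
sigma_star = np.sqrt(sigma2_star)

# Upper triangular Cholesky factor V_hj with V_j = V_hj' V_hj.
# numpy's cholesky returns the lower factor L with L L' = V_j, so V_hj = L'.
V_hj = np.linalg.cholesky(V_j).T
Z = rng.standard_normal(k + 1)
beta_star = beta_hat + sigma_star * (V_hj.T @ Z)

# Step 2: impute a missing Y_j from its covariate values x_1, ..., x_k.
x_miss = rng.normal(size=k)        # hypothetical covariate values for a missing case
z_i = rng.standard_normal()
y_imputed = beta_star[0] + beta_star[1:] @ x_miss + z_i * sigma_star
```

Because the covariance of $\sigma_{*j} \mathbf{V}_{hj}' \mathbf{Z}$ is $\sigma_{*j}^2 \mathbf{V}_{hj}' \mathbf{V}_{hj} = \sigma_{*j}^2 \mathbf{V}_j$, the drawn coefficients have the required posterior covariance.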
