The MI Procedure

FCS Methods for Data Sets with Arbitrary Missing Patterns

For a data set with an arbitrary missing data pattern, you can use FCS methods to impute missing values for all variables, assuming the existence of a joint distribution for these variables (Brand 1999; Van Buuren 2007). The FCS method involves two phases in each imputation: the preliminary filled-in phase followed by the imputation phase.

At the filled-in phase, the missing values for all variables are filled in sequentially over the variables taken one at a time. The missing values for each variable are filled in using the specified method, or the default method for the variable if a method is not specified, with preceding variables serving as the covariates. These filled-in values provide starting values for these missing values at the imputation phase.

At the imputation phase, the missing values for each variable are imputed using the specified method and covariates at each iteration. The default method for the variable is used if a method is not specified, and the remaining variables are used as covariates if the set of covariates is not specified. At each iteration, the missing values are imputed sequentially over the variables taken one at a time. After the number of iterations specified with the NBITER= option, the imputed values for each variable from the final iteration are used for the imputation.

The MI procedure orders the variables as they are ordered in the VAR statement. For example, if the order of the $p$ variables in the VAR statement is $Y_1, Y_2, \ldots, Y_p$, then $Y_1, Y_2, \ldots, Y_p$ (in that order) are used in the filled-in and imputation phases.

The filled-in phase replaces missing values with filled-in values for each variable. That is, with $p$ variables $Y_1, Y_2, \ldots, Y_p$ (in that order), the missing values are filled in by using the sequence

$$
\begin{aligned}
\boldsymbol{\theta}_1^{(0)} &\sim P\left(\boldsymbol{\theta}_1 \mid Y_{1(\mathrm{obs})}\right) \\
Y_{1(*)}^{(0)} &\sim P\left(Y_1 \mid \boldsymbol{\theta}_1^{(0)}\right) \\
Y_1^{(0)} &= \left(Y_{1(\mathrm{obs})},\, Y_{1(*)}^{(0)}\right) \\
&\;\;\vdots \\
&\;\;\vdots \\
\boldsymbol{\theta}_p^{(0)} &\sim P\left(\boldsymbol{\theta}_p \mid Y_1^{(0)}, \ldots, Y_{p-1}^{(0)},\, Y_{p(\mathrm{obs})}\right) \\
Y_{p(*)}^{(0)} &\sim P\left(Y_p \mid \boldsymbol{\theta}_p^{(0)}\right) \\
Y_p^{(0)} &= \left(Y_{p(\mathrm{obs})},\, Y_{p(*)}^{(0)}\right)
\end{aligned}
$$

where $Y_{j(\mathrm{obs})}$ is the set of observed $Y_j$ values, $Y_{j(*)}^{(0)}$ is the set of filled-in $Y_j$ values, $Y_j^{(0)}$ is the set of both observed and filled-in $Y_j$ values, and $\boldsymbol{\theta}_j^{(0)}$ is the set of simulated parameters for the conditional distribution of $Y_j$ given variables $Y_1, Y_2, \ldots, Y_{j-1}$.

For each variable $Y_j$ with missing values, the corresponding imputation method is used to fit the model with covariates $Y_1, Y_2, \ldots, Y_{j-1}$. The observations in which $Y_j$ is observed, which might include observations with filled-in values for $Y_1, Y_2, \ldots, Y_{j-1}$, are used in the model fitting. From this fitted model, a new model (that is, a new set of simulated parameters $\boldsymbol{\theta}_j^{(0)}$) is drawn and then used to impute missing values for $Y_j$.
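The filled-in phase can be sketched in Python as follows. This is a minimal illustration rather than the procedure's implementation: it assumes that every variable is continuous and is filled in by a normal linear regression whose parameters are drawn from their posterior under a noninformative prior. The function names `impute_column` and `filled_in_phase` are hypothetical, and missing values are assumed to be coded as NaN in a NumPy array whose columns are in VAR-statement order.

```python
import numpy as np

def impute_column(X, y, miss, rng):
    """Fit y ~ X on rows where y is observed, draw new parameters,
    and return simulated values for the rows where y is missing.
    Assumption: normal linear regression with a noninformative prior."""
    A = np.column_stack([np.ones(len(y)), X])      # intercept + covariates
    A_obs, y_obs = A[~miss], y[~miss]
    dof = A_obs.shape[0] - A_obs.shape[1]          # residual degrees of freedom
    xtx_inv = np.linalg.inv(A_obs.T @ A_obs)
    beta_hat = xtx_inv @ A_obs.T @ y_obs           # least-squares estimate
    resid = y_obs - A_obs @ beta_hat
    # Draw sigma^2 as (residual sum of squares) / chi-square(dof).
    sigma = np.sqrt(resid @ resid / rng.chisquare(dof))
    # Draw beta from N(beta_hat, sigma^2 (X'X)^{-1}) via a Cholesky factor.
    chol = np.linalg.cholesky(xtx_inv)
    beta = beta_hat + sigma * chol @ rng.standard_normal(len(beta_hat))
    # Simulate the missing values from the drawn model.
    return A[miss] @ beta + sigma * rng.standard_normal(miss.sum())

def filled_in_phase(data, rng):
    """Fill in each column in order, using only preceding columns as covariates."""
    filled = data.copy()
    for j in range(filled.shape[1]):
        miss = np.isnan(filled[:, j])
        if miss.any():
            # Columns 0..j-1 are already complete at this point.
            filled[miss, j] = impute_column(filled[:, :j], filled[:, j], miss, rng)
    return filled
```

Calling `filled_in_phase(data, np.random.default_rng(1))` returns a completed copy of `data` in which the observed entries are unchanged. For the first variable there are no preceding covariates, so the sketch fits an intercept-only model.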

The imputation phase replaces these filled-in values $Y_{j(*)}^{(0)}$ with imputed values for each variable sequentially at each iteration. That is, with $p$ variables $Y_1, Y_2, \ldots, Y_p$ (in that order), the missing values are imputed with the following sequence at iteration $t+1$:

$$
\begin{aligned}
\boldsymbol{\theta}_1^{(t+1)} &\sim P\left(\boldsymbol{\theta}_1 \mid Y_{1(\mathrm{obs})},\, Y_2^{(t)}, \ldots, Y_p^{(t)}\right) \\
Y_{1(*)}^{(t+1)} &\sim P\left(Y_1 \mid \boldsymbol{\theta}_1^{(t+1)}\right) \\
Y_1^{(t+1)} &= \left(Y_{1(\mathrm{obs})},\, Y_{1(*)}^{(t+1)}\right) \\
&\;\;\vdots \\
&\;\;\vdots \\
\boldsymbol{\theta}_p^{(t+1)} &\sim P\left(\boldsymbol{\theta}_p \mid Y_1^{(t+1)}, \ldots, Y_{p-1}^{(t+1)},\, Y_{p(\mathrm{obs})}\right) \\
Y_{p(*)}^{(t+1)} &\sim P\left(Y_p \mid \boldsymbol{\theta}_p^{(t+1)}\right) \\
Y_p^{(t+1)} &= \left(Y_{p(\mathrm{obs})},\, Y_{p(*)}^{(t+1)}\right)
\end{aligned}
$$

where $Y_{j(\mathrm{obs})}$ is the set of observed $Y_j$ values, $Y_{j(*)}^{(t+1)}$ is the set of imputed $Y_j$ values at iteration $t+1$, $Y_{j(*)}^{(t)}$ is the set of filled-in $Y_j$ values ($t = 0$) or the set of imputed $Y_j$ values at iteration $t$ ($t > 0$), $Y_j^{(t+1)}$ is the set of both observed and imputed $Y_j$ values at iteration $t+1$, and $\boldsymbol{\theta}_j^{(t+1)}$ is the set of simulated parameters for the conditional distribution of $Y_j$ given covariates constructed from $Y_1, \ldots, Y_{j-1}, Y_{j+1}, \ldots, Y_p$.

At each iteration, a specified model is fitted for each variable with missing values by using the observations in which that variable is observed, which might include observations with imputed values for other variables. From this fitted model, a new model (that is, a new set of simulated parameters) is drawn and then used to impute the missing values for that variable.
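The two phases together can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the procedure's implementation: all variables are treated as continuous and imputed with a normal linear regression whose parameters are drawn under a noninformative prior, all remaining variables serve as covariates, and missing values are coded as NaN. The names `impute_column` and `fcs_impute` are hypothetical, and the argument `n_iter` plays the role that the text assigns to the NBITER= option.

```python
import numpy as np

def impute_column(X, y, miss, rng):
    """Fit y ~ X on rows where y is observed, draw new parameters,
    and return simulated values for the rows where y is missing.
    Assumption: normal linear regression with a noninformative prior."""
    A = np.column_stack([np.ones(len(y)), X])      # intercept + covariates
    A_obs, y_obs = A[~miss], y[~miss]
    dof = A_obs.shape[0] - A_obs.shape[1]          # residual degrees of freedom
    xtx_inv = np.linalg.inv(A_obs.T @ A_obs)
    beta_hat = xtx_inv @ A_obs.T @ y_obs           # least-squares estimate
    resid = y_obs - A_obs @ beta_hat
    sigma = np.sqrt(resid @ resid / rng.chisquare(dof))   # drawn sigma
    chol = np.linalg.cholesky(xtx_inv)
    beta = beta_hat + sigma * chol @ rng.standard_normal(len(beta_hat))
    return A[miss] @ beta + sigma * rng.standard_normal(miss.sum())

def fcs_impute(data, n_iter, rng):
    """Return one completed copy of data after n_iter FCS iterations."""
    miss = np.isnan(data)          # fixed record of which entries were missing
    cur = data.copy()
    p = data.shape[1]

    # Filled-in phase: preceding variables only, to obtain starting values.
    for j in range(p):
        if miss[:, j].any():
            cur[miss[:, j], j] = impute_column(cur[:, :j], cur[:, j], miss[:, j], rng)

    # Imputation phase: at each iteration, visit the variables in order and
    # re-impute each one given the current values of all other variables.
    for _ in range(n_iter):
        for j in range(p):
            if miss[:, j].any():
                others = np.delete(cur, j, axis=1)
                cur[miss[:, j], j] = impute_column(others, cur[:, j], miss[:, j], rng)
    return cur
```

When variable $j$ is updated, the columns before it already hold iteration $t+1$ values and the columns after it still hold iteration $t$ values, which is the ordering shown in the sequence above. To produce several imputed data sets, one would call `fcs_impute` once per imputation, each call running its own filled-in phase and iterations.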

The steps are iterated long enough for the results to reliably simulate an approximately independent draw of the missing values for an imputed data set.

The imputation methods used in the filled-in and imputation phases are similar to the corresponding monotone methods for monotone missing data. You can use a regression method or a predictive mean matching method to impute missing values for a continuous variable, a logistic regression method for a classification variable with a binary or ordinal response, and a discriminant function method for a classification variable with a binary or nominal response. See the sections Monotone and FCS Regression Methods, Monotone and FCS Predictive Mean Matching Methods, Monotone and FCS Discriminant Function Methods, and Monotone and FCS Logistic Regression Methods for these methods.

The FCS method requires fewer iterations than the MCMC method (Van Buuren 2007). Often, as few as 5 or 10 iterations are enough to produce satisfactory results (Van Buuren 2007; Brand 1999).

Last updated: December 09, 2022