The MI Procedure

Monotone Propensity Score Method

The propensity score method is another imputation method available for continuous variables when the data set has a monotone missing pattern.

A propensity score is generally defined as the conditional probability of assignment to a particular treatment given a vector of observed covariates (Rosenbaum and Rubin 1983). In the propensity score method, for a variable with missing values, a propensity score is generated for each observation to estimate the probability that the observation is missing. The observations are then grouped based on these propensity scores, and an approximate Bayesian bootstrap imputation (Rubin 1987, p. 124) is applied to each group (Lavori, Dawson, and Shera 1995).

The propensity score method uses the following steps to impute values for a variable $Y_j$ with missing values:

  1. Creates an indicator variable $R_j$ with the value 0 for observations with missing $Y_j$ and 1 otherwise.

  2. Fits a logistic regression model

    $$\mathrm{logit}(p_j) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_k X_k$$

    where $X_1, X_2, \ldots, X_k$ are covariates for $Y_j$, $p_j = \mathrm{Pr}(R_j = 0 \mid X_1, X_2, \ldots, X_k)$, and $\mathrm{logit}(p) = \log(p / (1 - p))$.

  3. Creates a propensity score for each observation to estimate the probability that it is missing.

  4. Divides the observations into a fixed number of groups (typically assumed to be five) based on these propensity scores.

  5. Applies an approximate Bayesian bootstrap imputation to each group. In group k, suppose that $Y_{obs}$ denotes the $n_1$ observations with nonmissing $Y_j$ values and $Y_{mis}$ denotes the $n_0$ observations with missing $Y_j$. The approximate Bayesian bootstrap imputation first draws $n_1$ observations randomly with replacement from $Y_{obs}$ to create a new data set $Y_{obs}^{*}$. This is a nonparametric analog of drawing parameters from the posterior predictive distribution of the parameters. The process then draws the $n_0$ values for $Y_{mis}$ randomly with replacement from $Y_{obs}^{*}$.
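The two-stage draw in step 5 can be illustrated with a short Python sketch. This is not the code that PROC MI runs. It is a minimal illustration that uses only the standard library, and the function name abb_impute, its parameters, and the sample values are hypothetical.

```python
import random

def abb_impute(y_obs, n_mis, rng):
    """Approximate Bayesian bootstrap for one propensity score group.

    y_obs : the n1 observed (nonmissing) values in the group
    n_mis : n0, the number of missing values to impute in the group
    rng   : a random.Random instance (hypothetical parameter, used
            here only to make the draws reproducible)
    """
    if not y_obs:
        raise ValueError("group has no observed values to draw from")
    # Stage 1: draw n1 values with replacement from the observed values
    # to form the resampled data set.
    y_star = rng.choices(y_obs, k=len(y_obs))
    # Stage 2: draw the n0 imputed values with replacement from the
    # resampled data set, not from the original observed values.
    return rng.choices(y_star, k=n_mis)

rng = random.Random(2022)
y_obs = [4.1, 5.3, 6.0, 7.2, 8.8]   # illustrative data only
imputed = abb_impute(y_obs, n_mis=3, rng=rng)
```

Every imputed value is one of the observed values in the same group. The first resampling stage is what adds between-imputation variability beyond a simple hot-deck draw.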

Steps 1 through 5 are repeated sequentially for each variable with missing values.
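The stages that precede the bootstrap (steps 1 through 4) could be sketched as follows for a single variable. This is an illustration under stated assumptions, not the procedure's implementation: the logistic model is fit by plain gradient ascent rather than the procedure's own fitting routine, the groups are formed as near-equal-size rank bins (the text above does not specify how the cut points are chosen), and all function names and data are hypothetical.

```python
import math

def propensity(beta, row):
    """Pr(value is missing | covariates) under the fitted logistic model."""
    z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, is_missing, n_iter=2000, lr=0.1):
    """Fit logit(p) = b0 + b1*x1 + ... + bk*xk by gradient ascent.

    X          : list of covariate rows
    is_missing : 1 where the variable is missing, 0 otherwise, so that
                 p models the event R_j = 0
    n_iter, lr : hypothetical tuning parameters for this sketch
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, m in zip(X, is_missing):
            err = m - propensity(beta, row)
            grad[0] += err
            for i, x in enumerate(row):
                grad[i + 1] += err * x
        # Step along the average log-likelihood gradient.
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def assign_groups(scores, n_groups=5):
    """Rank observations by score and cut into near-equal-size groups."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(scores)
    return groups

# Illustrative data only: one covariate x, variable y with missing values.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [2.1, None, 3.0, 3.4, None, 4.2, None, None, 5.9, None]

r = [0 if v is None else 1 for v in y]        # step 1: indicator R_j
X = [[xi] for xi in x]
beta = fit_logistic(X, [1 - ri for ri in r])  # step 2: model Pr(R_j = 0 | X)
scores = [propensity(beta, row) for row in X] # step 3: propensity scores
groups = assign_groups(scores)                # step 4: five groups
```

Within each resulting group, the observed values of y would then feed the approximate Bayesian bootstrap of step 5. A group that contains no observed values cannot be imputed this way, which is one reason the number of groups has to be chosen with the amount of data in mind.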

The propensity score method was originally designed for a randomized experiment with repeated measures on the response variables. The goal was to impute the missing values on the response variables. The method uses only the covariate information that is associated with whether the imputed variable values are missing; it does not use correlations among variables. It is effective for inferences about the distributions of individual imputed variables, such as a univariate analysis, but it is not appropriate for analyses that involve relationships among variables, such as a regression analysis (Schafer 1999, p. 11). It can also produce badly biased estimates of regression coefficients when data on predictor variables are missing (Allison 2000).

Last updated: December 09, 2022