The MI Procedure

Monotone and FCS Logistic Regression Methods

The logistic regression method is another imputation method available for classification variables. In this method, a logistic regression model is fitted for a classification variable, which can be an ordinal or a nominal response variable, with a set of covariates constructed from the effects.

In the MI procedure, ordered values are assigned to response levels in ascending sorted order. If the response variable Y takes values in $\{1, \ldots, K\}$, then for ordinal response models, the cumulative model has the form

$$\mathrm{logit}(\Pr(Y \le j \mid \mathbf{x})) = \log\left( \frac{\Pr(Y \le j \mid \mathbf{x})}{1 - \Pr(Y \le j \mid \mathbf{x})} \right) = \alpha_j + \boldsymbol{\beta}'\mathbf{x}, \qquad j = 1, \ldots, K-1$$

where $\alpha_1, \ldots, \alpha_{K-1}$ are $K-1$ intercept parameters, and $\boldsymbol{\beta}$ is the vector of slope parameters.

For nominal response logistic models, where the K possible responses have no natural ordering, the generalized logit model has the form

$$\log\left( \frac{\Pr(Y = j \mid \mathbf{x})}{\Pr(Y = K \mid \mathbf{x})} \right) = \alpha_j + \boldsymbol{\beta}_j'\mathbf{x}, \qquad j = 1, \ldots, K-1$$

where the $\alpha_1, \ldots, \alpha_{K-1}$ are $K-1$ intercept parameters, and the $\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_{K-1}$ are $K-1$ vectors of slope parameters.

Binary Response Logistic Regression

For a binary classification variable, based on the fitted regression model, a new logistic regression model is simulated from the posterior predictive distribution of the parameters and is used to impute the missing values for each variable (Rubin 1987, pp. 167–170).

For a binary variable Y with responses 1 and 2, a logistic regression model is fitted using observations with observed values for the imputed variable Y:

$$\mathrm{logit}(p_1) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$

where $X_1, X_2, \ldots, X_p$ are covariates for Y, $p_1 = \Pr(Y = 1 \mid X_1, X_2, \ldots, X_p)$, and $\mathrm{logit}(p_1) = \log(p_1 / (1 - p_1))$.

The fitted model includes the regression parameter estimates $\hat{\boldsymbol{\beta}} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p)$ and the associated covariance matrix $\mathbf{V}$.

The following steps are used to generate imputed values for a binary variable Y with responses 1 and 2 (a sketch of these steps follows the list):

  1. New parameters $\boldsymbol{\beta}_* = (\beta_{*0}, \beta_{*1}, \ldots, \beta_{*p})$ are drawn from the posterior predictive distribution of the parameters:

     $$\boldsymbol{\beta}_* = \hat{\boldsymbol{\beta}} + \mathbf{V}_h'\mathbf{Z}$$

     where $\mathbf{V}_h$ is the upper triangular matrix in the Cholesky decomposition, $\mathbf{V} = \mathbf{V}_h'\mathbf{V}_h$, and $\mathbf{Z}$ is a vector of $p+1$ independent random normal variates.

  2. For an observation with missing $Y_j$ and covariates $x_1, x_2, \ldots, x_p$, compute the predicted probability that $Y = 1$:

     $$p_1 = \frac{\exp(\mu_1)}{1 + \exp(\mu_1)}$$

     where $\mu_1 = \beta_{*0} + \beta_{*1} x_1 + \beta_{*2} x_2 + \cdots + \beta_{*p} x_p$.

  3. Draw a random uniform variate, $u$, between 0 and 1. If the value of $u$ is less than $p_1$, impute $Y = 1$; otherwise, impute $Y = 2$.
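These three steps can be sketched in a short SAS/IML program. The values of beta_hat (for a model with p = 2 covariates) and the covariance matrix V below are hypothetical and stand in for the estimates that PROC MI computes from the observed data; this is a minimal illustration of the steps, not the procedure's implementation.

   proc iml;
      call randseed(4545);

      /* Hypothetical fitted estimates for p = 2 covariates */
      beta_hat = {-0.5, 1.2, -0.8};       /* (beta0, beta1, beta2) */
      V = {0.09 0.01 0.00,
           0.01 0.04 0.01,
           0.00 0.01 0.16};               /* covariance matrix of the estimates */

      /* Step 1: draw beta_star = beta_hat + Vh` * Z, where V = Vh` * Vh */
      Vh = root(V);                       /* upper triangular Cholesky factor */
      Z = j(3, 1, .);
      call randgen(Z, "Normal");          /* p + 1 = 3 independent N(0,1) variates */
      beta_star = beta_hat + Vh` * Z;

      /* Step 2: predicted probability that Y = 1 for covariates (x1, x2) */
      x = {1 0.4 -1.3};                   /* leading 1 multiplies the intercept */
      mu1 = x * beta_star;
      p1 = exp(mu1) / (1 + exp(mu1));

      /* Step 3: impute Y = 1 if u < p1, and Y = 2 otherwise */
      u = j(1, 1, .);
      call randgen(u, "Uniform");
      Y = choose(u < p1, 1, 2);
      print beta_star p1 u Y;
   quit;

The same draw-and-compare pattern underlies the ordinal and nominal response methods that follow.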

The binary logistic regression imputation method can be extended to ordinal classification variables that have more than two response levels and to nominal classification variables. The LINK=LOGIT and LINK=GLOGIT options can be used to specify the cumulative logit model and the generalized logit model, respectively. The ORDER= and DESCENDING options can be used to specify the sort order for the levels of the imputed variables.
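For example, the following statements request the generalized logit model for imputing a nominal classification variable with the FCS method. The data set Fish1, the variables Species, Length, and Width, and the seed value are hypothetical names and values used only for illustration:

   proc mi data=Fish1 seed=1305417 nimpute=5 out=OutFish;
      class Species;
      fcs logistic(Species = Length Width / link=glogit);
      var Length Width Species;
   run;

Specifying LINK=LOGIT instead would request the cumulative logit model for an ordinal variable, and the ORDER= and DESCENDING options would be specified after the slash in the same way.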

Ordinal Response Logistic Regression

For an ordinal classification variable, based on the fitted regression model, a new logistic regression model is simulated from the posterior predictive distribution of the parameters and is used to impute the missing values for each variable.

For a variable Y with ordinal responses 1, 2, …, K, a logistic regression model is fitted using observations with observed values for the imputed variable Y:

$$\mathrm{logit}(p_j) = \alpha_j + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$

where $X_1, X_2, \ldots, X_p$ are covariates for Y and $p_j = \Pr(Y \le j \mid X_1, X_2, \ldots, X_p)$.

The fitted model includes the regression parameter estimates $\hat{\boldsymbol{\alpha}} = (\hat{\alpha}_1, \ldots, \hat{\alpha}_{K-1})$ and $\hat{\boldsymbol{\beta}} = (\hat{\beta}_1, \ldots, \hat{\beta}_p)$, and their associated covariance matrix $\mathbf{V}$.

The following steps are used to generate imputed values for an ordinal classification variable Y with responses 1, 2, …, K (a sketch of these steps follows the list):

  1. New parameters $\boldsymbol{\gamma}_*$ are drawn from the posterior predictive distribution of the parameters:

     $$\boldsymbol{\gamma}_* = \hat{\boldsymbol{\gamma}} + \mathbf{V}_h'\mathbf{Z}$$

     where $\hat{\boldsymbol{\gamma}} = (\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\beta}})$, $\mathbf{V}_h$ is the upper triangular matrix in the Cholesky decomposition, $\mathbf{V} = \mathbf{V}_h'\mathbf{V}_h$, and $\mathbf{Z}$ is a vector of $p + K - 1$ independent random normal variates.

  2. For an observation with missing Y and covariates $x_1, x_2, \ldots, x_p$, compute the predicted cumulative probability for $Y \le j$:

     $$p_j = \Pr(Y \le j) = \frac{e^{\alpha_j + \mathbf{x}'\boldsymbol{\beta}}}{e^{\alpha_j + \mathbf{x}'\boldsymbol{\beta}} + 1}$$

  3. Draw a random uniform variate, $u$, between 0 and 1, and then impute

     $$Y = \begin{cases} 1 & \text{if } u < p_1 \\ k & \text{if } p_{k-1} \le u < p_k \\ K & \text{if } p_{K-1} \le u \end{cases}$$
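The following SAS/IML fragment sketches these steps for K = 3 response levels and p = 2 covariates. The values assigned to gamma_hat and V are hypothetical, and the diagonal covariance matrix is chosen only to keep the illustration short:

   proc iml;
      call randseed(1976);

      /* Hypothetical estimates gamma_hat = (alpha1, alpha2, beta1, beta2) */
      gamma_hat = {-1.1, 0.7, 0.9, -0.4};
      V = diag({0.05, 0.05, 0.03, 0.03});   /* hypothetical covariance matrix */

      /* Step 1: gamma_star = gamma_hat + Vh` * Z, Z of length p + K - 1 = 4 */
      Vh = root(V);
      Z = j(4, 1, .);
      call randgen(Z, "Normal");
      g = gamma_hat + Vh` * Z;

      /* Step 2: cumulative probabilities Pr(Y <= j), j = 1, 2 */
      x = {0.2, -0.6};
      eta = g[1:2] + g[3:4]` * x;           /* alpha_j + x` * beta */
      p = exp(eta) / (1 + exp(eta));

      /* Step 3: impute Y by comparing a uniform draw with p1 and p2 */
      u = j(1, 1, .);
      call randgen(u, "Uniform");
      Y = 1 + sum(u >= p);                  /* Y = 1, 2, or 3 */
      print g p u Y;
   quit;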

Nominal Response Logistic Regression

For a nominal classification variable, based on the fitted regression model, a new logistic regression model is simulated from the posterior predictive distribution of the parameters and is used to impute the missing values for each variable.

For a variable Y with nominal responses 1, 2, …, K, a logistic regression model is fitted using observations with observed values for the imputed variable Y:

$$\log\left( \frac{p_j}{p_K} \right) = \alpha_j + \beta_{j1} X_1 + \beta_{j2} X_2 + \cdots + \beta_{jp} X_p$$

where $X_1, X_2, \ldots, X_p$ are covariates for Y and $p_j = \Pr(Y = j \mid X_1, X_2, \ldots, X_p)$.

The fitted model includes the regression parameter estimates $\hat{\boldsymbol{\alpha}} = (\hat{\alpha}_1, \ldots, \hat{\alpha}_{K-1})$ and $\hat{\boldsymbol{\beta}} = (\hat{\boldsymbol{\beta}}_1, \ldots, \hat{\boldsymbol{\beta}}_{K-1})$, and their associated covariance matrix $\mathbf{V}$, where $\hat{\boldsymbol{\beta}}_j = (\hat{\beta}_{j1}, \hat{\beta}_{j2}, \ldots, \hat{\beta}_{jp})$.

The following steps are used to generate imputed values for a nominal classification variable Y with responses 1, 2, …, K (a sketch of these steps follows the list):

  1. New parameters $\boldsymbol{\gamma}_*$ are drawn from the posterior predictive distribution of the parameters:

     $$\boldsymbol{\gamma}_* = \hat{\boldsymbol{\gamma}} + \mathbf{V}_h'\mathbf{Z}$$

     where $\hat{\boldsymbol{\gamma}} = (\hat{\boldsymbol{\alpha}}, \hat{\boldsymbol{\beta}})$, $\mathbf{V}_h$ is the upper triangular matrix in the Cholesky decomposition, $\mathbf{V} = \mathbf{V}_h'\mathbf{V}_h$, and $\mathbf{Z}$ is a vector of $(p+1)(K-1)$ independent random normal variates.

  2. For an observation with missing Y and covariates $x_1, x_2, \ldots, x_p$, compute the predicted probability for $Y = j$, $j = 1, 2, \ldots, K-1$:

     $$\Pr(Y = j) = \frac{e^{\alpha_j + \mathbf{x}'\boldsymbol{\beta}_j}}{\sum_{k=1}^{K-1} e^{\alpha_k + \mathbf{x}'\boldsymbol{\beta}_k} + 1}$$

     and

     $$\Pr(Y = K) = \frac{1}{\sum_{k=1}^{K-1} e^{\alpha_k + \mathbf{x}'\boldsymbol{\beta}_k} + 1}$$

  3. Compute the cumulative probability for $Y \le j$:

     $$P_j = \sum_{k=1}^{j} \Pr(Y = k)$$

  4. Draw a random uniform variate, $u$, between 0 and 1, and then impute

     $$Y = \begin{cases} 1 & \text{if } u < P_1 \\ k & \text{if } P_{k-1} \le u < P_k \\ K & \text{if } P_{K-1} \le u \end{cases}$$
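A corresponding SAS/IML sketch for K = 3 levels and p = 2 covariates follows. Again, gamma_hat and V are hypothetical values, with the parameters ordered as (alpha1, alpha2, beta11, beta12, beta21, beta22):

   proc iml;
      call randseed(2033);

      /* Hypothetical estimates (alpha1, alpha2, beta11, beta12, beta21, beta22) */
      gamma_hat = {0.3, -0.2, 1.1, -0.5, 0.4, 0.8};
      V = diag(j(6, 1, 0.04));              /* hypothetical covariance matrix */

      /* Step 1: gamma_star = gamma_hat + Vh` * Z, Z of length (p+1)(K-1) = 6 */
      Vh = root(V);
      Z = j(6, 1, .);
      call randgen(Z, "Normal");
      g = gamma_hat + Vh` * Z;

      /* Step 2: generalized logit probabilities Pr(Y = j), j = 1, ..., K */
      x = {0.5, -1.0};
      B = shape(g[3:6], 2, 2);              /* row j holds (beta_j1, beta_j2) */
      eta = g[1:2] + B * x;                 /* alpha_j + x` * beta_j, j = 1, 2 */
      pr = (exp(eta) // 1) / (sum(exp(eta)) + 1);

      /* Step 3: cumulative probabilities P_j */
      P = cusum(pr);

      /* Step 4: impute Y by comparing a uniform draw with P1 and P2 */
      u = j(1, 1, .);
      call randgen(u, "Uniform");
      Y = 1 + sum(u >= P[1:2]);
      print g pr u Y;
   quit;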

Logistic Regression with Augmented Data

In a logistic regression model, you might not be able to find the maximum likelihood estimates of the parameters if there is no overlap of the sample points from response groups—that is, if the data points have either a complete separation pattern or a quasi-complete separation pattern.

Complete separation of data points occurs when a linear combination of predictors correctly allocates all observations to their response groups. Quasi-complete separation occurs when a linear combination of predictors correctly allocates all observations to their response groups except for a subset of observations where the values of linear combinations of predictors are identical. For more information about complete separation patterns and quasi-complete separation patterns, see the section Existence of Maximum Likelihood Estimates in Chapter 79, The LOGISTIC Procedure.

To address the separation issue in multiple imputation, White, Daniel, and Royston (2010) add observations to each response group and then use the augmented data to fit a weighted logistic regression. In each response group, 2p observations are added, where p is the number of predictors. More specifically, corresponding to each predictor, two observations are added: the first with the predictor mean minus the predictor standard deviation, and the second with the predictor mean plus the predictor standard deviation. In both observations, the values of other predictors are fixed at their corresponding means. Each additional observation contributes the same weight, and the total added weight is p+1. Each available observation in the data set (before augmentation) has a weight of 1. With this approach, there is an overlap of sample points, and maximum likelihood estimates can be obtained.
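The construction of the added covariate rows can be illustrated with a small SAS/IML fragment. The predictor matrix X below is hypothetical, the even split of the total added weight over the added rows in each response group is an assumption made for the illustration, and the response values attached to the added rows (one copy of the rows per response group) are omitted:

   proc iml;
      /* Hypothetical predictor matrix with p = 2 continuous predictors */
      X = {1.0 2.1,
           0.4 1.7,
          -0.3 2.9,
           1.2 0.8};
      p = ncol(X);
      m = mean(X);                          /* column means               */
      s = std(X);                           /* column standard deviations */

      /* For each predictor, add two rows: mean -/+ one standard deviation
         in that column, with the other predictors held at their means */
      Xaug = j(2*p, p, 0);
      do k = 1 to p;
         lo = m;   lo[k] = m[k] - s[k];
         hi = m;   hi[k] = m[k] + s[k];
         Xaug[2*k-1, ] = lo;
         Xaug[2*k, ]   = hi;
      end;

      /* Original rows keep weight 1; the added rows share the total added
         weight p + 1 = 3, assumed here to be split evenly over the 2p rows
         added to each of the K = 2 response groups of a binary variable */
      K = 2;
      w_add = (p + 1) / (2*p*K);
      print Xaug w_add;
   quit;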

In the MONOTONE and FCS statements, the LIKELIHOOD=AUGMENT suboption in the LOGISTIC option requests maximum likelihood estimates based on augmented data. When LIKELIHOOD=AUGMENT, you can use the WEIGHT=w option to specify the total added weight w explicitly, or you can use the WEIGHT=NPARM option to specify the number of parameters as the total added weight. More specifically, for logistic regression models that consist only of p continuous effects, the added weight is $p+1$ for a simple binary logistic model, $p+k-1$ for an ordinal response model, and $(p+1)(k-1)$ for a nominal response model, where k is the number of response levels.

If the ratio between the number of parameters and the number of available observations (before augmentation) is large, the effect of the added observations on the maximum likelihood estimates can be significant. You can use the MULT=m suboption in the WEIGHT=NPARM option to reduce the total added weight, where the multiplier m satisfies $0 < m \le 1$. The resulting total added weight is then m times the number of parameters. Alternatively, you can use the WEIGHT=w option to specify a smaller total added weight w explicitly.
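For example, the following statements request augmented-data maximum likelihood estimation for a monotone missing pattern, with the total added weight reduced to half the number of parameters. The data set Mono1, the variables c1, x1, and x2, and the seed are placeholder names and values, and the nesting of MULT= inside WEIGHT=NPARM is inferred from the description above; see the MONOTONE statement syntax for the authoritative form:

   proc mi data=Mono1 seed=899603 nimpute=5 out=OutMono;
      class c1;
      monotone logistic(c1 = x1 x2 / likelihood=augment weight=nparm(mult=0.5));
      var x1 x2 c1;
   run;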
