The MI Procedure

Adjusting Imputed Values in Pattern-Mixture Models

It is straightforward to specify pattern-mixture models under the MNAR assumption. When you impute continuous variables by using the regression and predictive mean matching methods, you can adjust the imputed values directly (Carpenter and Kenward 2013, pp. 237–239; Van Buuren 2012, pp. 88–89). When you impute classification variables by using the logistic regression method, you can adjust the imputed classification levels by modifying the log odds ratios for the classification levels (Carpenter and Kenward 2013, pp. 240–241; Van Buuren 2012, pp. 88–89). By modifying the log odds ratios, you modify the predicted probabilities for the classification levels.

For each imputed variable, you can use the ADJUST option to do the following:

  • specify a subset of observations for which imputed values are adjusted. Otherwise, all imputed values are adjusted.

  • adjust imputed continuous variable values by using the SHIFT=, SCALE=, and SIGMA= options. These options add a constant, multiply by a constant factor, and add a simulated value to the imputed values, respectively.

  • adjust imputed classification variable levels by adjusting predicted probabilities for the classification levels by using the SHIFT= and SIGMA= options. These options add a constant and add a simulated constant value, respectively, to the log odds ratios for the classification levels.

In addition, you can provide the shift and scale parameters for each imputation by using a PARMS= data set.

When you use the MNAR statement together with a MONOTONE statement, the variables are imputed sequentially. For each imputed variable, the values can be adjusted using the ADJUST option, and these adjusted values are used to impute values for subsequent variables.

When you use the MNAR statement together with an FCS statement, there are two phases in each imputation: the preliminary filled-in phase, followed by the imputation phase. For each imputed variable, the values can be adjusted using the ADJUST option in the imputation phase in each of the imputations. These adjusted values are used to impute values for other variables in the imputation phase.
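
As a minimal sketch of how the MNAR statement sits alongside an FCS statement, the following statements adjust two imputed continuous variables. The data set name Trial2, the output data set OutFcs, the variables y0, y1, and y2, the seed, and all adjustment values are hypothetical and chosen only for illustration.

```sas
/* Hypothetical input data set Trial2 with continuous variables   */
/* y0, y1, and y2, where y1 and y2 have missing values.            */
proc mi data=Trial2 seed=52387 nimpute=10 out=OutFcs;
   /* Regression method for the FCS imputation phase */
   fcs nbiter=20 reg;
   /* Adjusted y1 values are used when y2 is imputed, and vice versa */
   mnar adjust( y1 / shift=-0.4)
        adjust( y2 / scale=0.9 shift=-0.5);
   var y0 y1 y2;
run;
```

Replacing the FCS statement with a MONOTONE statement (for example, `monotone reg;`) gives the sequential form, assuming the data have a monotone missing pattern.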

For illustrations of adjusting imputed continuous values, adjusting log odds ratios for imputed classification levels, and adjusting imputed continuous values by using parameters that are stored in an input data set, see Example 82.16, Example 82.17, and Example 82.18, respectively.

Specifying the Imputed Values to Be Adjusted

By default, all available imputed values are adjusted. You can specify a subset of imputed values to be adjusted by using the ADJUSTOBS= suboption in the ADJUST option.

You can specify a classification variable to identify the subset of imputed values to be adjusted by using the ADJUSTOBS=(obs-variable='level1' <'level2' …>) suboption. This subset consists of the imputed values in the set of observations for which obs-variable equals one of the specified levels.
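
As a minimal sketch of the ADJUSTOBS= suboption, the following statements shift imputed values of y1 only in observations that belong to one treatment group. The data set name Trial1, the output data set OutAdj, the variables Trt, y0, and y1, the seed, and the shift value are hypothetical, and the sketch assumes that the data have a monotone missing pattern.

```sas
/* Hypothetical input data set Trial1 with a classification        */
/* variable Trt and continuous variables y0 and y1 (y1 is missing  */
/* for some observations).                                          */
proc mi data=Trial1 seed=14823 nimpute=10 out=OutAdj;
   class Trt;
   monotone reg;
   /* Shift only the imputed y1 values in observations with Trt='1' */
   mnar adjust( y1 / shift=-0.4 adjustobs=(Trt='1'));
   var Trt y0 y1;
run;
```

Omitting the ADJUSTOBS= suboption in this sketch would apply the shift to all imputed values of y1.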

Adjusting Imputed Continuous Variables

For an imputed continuous variable, the SCALE=c option specifies the scale parameter, c > 0, for imputed values; the SHIFT=delta option specifies the shift parameter, delta, for imputed values; and the SIGMA=sigma option specifies the sigma parameter, sigma > 0, for imputed values.

When the sigma parameter is not specified, the adjusted value for each imputed value y is given by

$$y^{*} = c\,y + \delta$$

where c is the scale parameter and delta is the shift parameter.

When you specify a sigma parameter $\sigma$, a simulated shift parameter $\delta^{*}$ is generated in each imputation from the normal distribution that has mean $\delta$ and standard deviation $\sigma$:

$$\delta^{*} \sim N(\delta, \sigma^{2})$$

The adjusted value is then given by

$$y^{*} = c\,y + \delta^{*}$$
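
The two adjustment formulas above can be checked by hand with a short DATA step. This is only a sketch of the arithmetic, not the procedure's internal code; the scale, shift, and sigma values, the seed, and the three "imputed" values of y are hypothetical.

```sas
/* Sketch of the continuous adjustment for a single imputation */
data _null_;
   call streaminit(2022);        /* arbitrary seed                     */
   c     = 1.2;                  /* plays the role of SCALE=           */
   delta = -0.4;                 /* plays the role of SHIFT=           */
   sigma = 0.1;                  /* plays the role of SIGMA=           */
   /* One simulated shift per imputation: delta* ~ N(delta, sigma**2) */
   delta_star = rand('normal', delta, sigma);
   do y = 10, 12.5, 15;          /* hypothetical imputed values        */
      y_fixed = c*y + delta;       /* adjustment without SIGMA=        */
      y_sim   = c*y + delta_star;  /* adjustment with SIGMA=           */
      put y= y_fixed= y_sim=;
   end;
run;
```

Within one imputation the same simulated shift is applied to every adjusted value, so y_sim differs from y_fixed by a single constant across the three rows.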

Adjusting Imputed Classification Variables

For an imputed classification variable, you can specify adjustment parameters for the response level. The SHIFT=delta option specifies the shift parameter delta, the SIGMA=sigma option specifies the sigma parameter sigma > 0, and the EVENT='level' option identifies the response level.

When the sigma parameter is not specified, the shift parameter $\delta$ is used in all imputations. When you specify a sigma parameter $\sigma$, a simulated shift parameter $\delta^{*}$ is generated for each imputation from the normal distribution that has mean $\delta$ and standard deviation $\sigma$:

$$\delta^{*} \sim N(\delta, \sigma^{2})$$

The next three sections provide details for adjusting imputed binary, ordinal, and nominal response variables.

Adjusting Imputed Binary Response Variables

For an imputed binary classification variable Y, the shift parameter delta is applied to the logit function values for the corresponding response level.

For instance, if Y has binary responses 1 and 2, a simulated logit model

$$\operatorname{logit}\bigl(\mathrm{pr}(Y=1 \mid \mathbf{x})\bigr) = \alpha + \mathbf{x}'\boldsymbol{\beta}$$

is used to impute the missing response values. For a detailed description of this simulated logit model, see the section Binary Response Logistic Regression.

For an observation that has missing Y and covariates $\mathbf{x}_0$, the predicted probabilities that Y=1 and Y=2 are then given by

$$\mathrm{pr}(Y=1) = \frac{e^{\alpha + \mathbf{x}_0'\boldsymbol{\beta}}}{e^{\alpha + \mathbf{x}_0'\boldsymbol{\beta}} + 1} = \frac{e^{d_1}}{e^{d_1} + e^{d_2}}$$
$$\mathrm{pr}(Y=2) = \frac{1}{e^{\alpha + \mathbf{x}_0'\boldsymbol{\beta}} + 1} = \frac{e^{d_2}}{e^{d_1} + e^{d_2}}$$

where $d_1 = \alpha + \mathbf{x}_0'\boldsymbol{\beta}$ and $d_2 = 0$.

When you provide the shift parameters $\delta_1$ for the response Y=1 and $\delta_2$ for the response Y=2, the predicted probabilities are

$$\mathrm{pr}(Y=1) = \frac{e^{d_1^{*}}}{e^{d_1^{*}} + e^{d_2^{*}}}$$
$$\mathrm{pr}(Y=2) = \frac{e^{d_2^{*}}}{e^{d_1^{*}} + e^{d_2^{*}}}$$

where $d_1^{*} = d_1 + \delta_1$ and $d_2^{*} = d_2 + \delta_2 = \delta_2$.

For example, the following statement specifies the shift parameters $\delta_1 = 0.8$ and $\delta_2 = 1.6$:

mnar adjust( y(event='1') / shift=0.8)
     adjust( y(event='2') / shift=1.6);

The statement

mnar adjust( y(event='1') / shift=0.8 sigma=0.2);

simulates a shift parameter $\delta_1$ from

$$\delta_1 \sim N(0.8,\, 0.2^{2})$$

in each imputation. Because an adjustment is not specified for Y=2, the corresponding shift parameter is $\delta_2 = 0$.
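
To see what the binary adjustment does numerically, the following DATA step evaluates the formulas above for the fixed shifts 0.8 and 1.6. It is a sketch of the arithmetic only; the linear predictor value d1 = 0.3 is a hypothetical stand-in for the quantity alpha + x0'beta of one observation.

```sas
/* Sketch of the binary adjustment for one observation */
data _null_;
   d1 = 0.3;                     /* hypothetical alpha + x0'beta      */
   d2 = 0;                       /* reference level, fixed at 0       */
   delta1 = 0.8;                 /* shift for Y=1                     */
   delta2 = 1.6;                 /* shift for Y=2                     */
   /* Unadjusted predicted probabilities */
   p1 = exp(d1) / (exp(d1) + exp(d2));
   p2 = exp(d2) / (exp(d1) + exp(d2));
   /* Shifted values d1* and d2* */
   d1_star = d1 + delta1;
   d2_star = d2 + delta2;
   /* Adjusted predicted probabilities */
   p1_adj = exp(d1_star) / (exp(d1_star) + exp(d2_star));
   p2_adj = exp(d2_star) / (exp(d1_star) + exp(d2_star));
   put p1= p2= p1_adj= p2_adj=;
run;
```

Because only the difference between the shifted values enters the probabilities, the adjusted log odds of Y=1 versus Y=2 equal d1 + delta1 - delta2, so shifting both levels by the same amount would leave the probabilities unchanged.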

Adjusting Imputed Ordinal Response Variables

For an imputed ordinal classification variable Y, the shift parameter delta is applied to the cumulative logit function values for the corresponding response level.

For instance, if Y has ordinal responses 1, 2, …, K, a simulated cumulative logit model that has covariates $\mathbf{x}$,

$$\operatorname{logit}\bigl(\mathrm{pr}(Y \le k \mid \mathbf{x})\bigr) = \alpha_k + \mathbf{x}'\boldsymbol{\beta}$$

is used to impute the missing response values, where k = 1, 2, …, K–1. For a detailed description of this model, see the section Ordinal Response Logistic Regression.

For an observation that has missing Y and covariates $\mathbf{x}_0$, the predicted cumulative probability for $Y \le j$, j = 1, 2, …, K–1, is then given by

$$\mathrm{pr}(Y \le j) = \frac{e^{\alpha_j + \mathbf{x}_0'\boldsymbol{\beta}}}{e^{\alpha_j + \mathbf{x}_0'\boldsymbol{\beta}} + 1} = \frac{e^{d_j}}{e^{d_j} + e^{d_K}}$$

where $d_j = \alpha_j + \mathbf{x}_0'\boldsymbol{\beta}$ and $d_K = 0$.

The predicted probabilities for $Y = k$ are

$$\mathrm{pr}(Y=k) = \begin{cases} \dfrac{e^{d_1}}{e^{d_1} + e^{d_K}} & \text{if } k = 1 \\[2ex] \dfrac{e^{d_k}}{e^{d_k} + e^{d_K}} - \dfrac{e^{d_{k-1}}}{e^{d_{k-1}} + e^{d_K}} & \text{if } 1 < k < K \\[2ex] \dfrac{e^{d_K}}{e^{d_{K-1}} + e^{d_K}} & \text{if } k = K \end{cases}$$

For an ordinal logistic regression method that has two response levels, the section Adjusting Imputed Binary Response Variables explains how the predicted probabilities are adjusted using shift parameters.

For an ordinal logistic regression method that has more than two response levels, only one classification level can be adjusted. When you provide the shift parameter $\delta$ for the response level $Y = k$, the predicted probability for $Y = k$ is then given by

$$\mathrm{pr}(Y=k) = \begin{cases} \dfrac{e^{d_1^{*}}}{e^{d_1^{*}} + e^{d_K}} & \text{if } k = 1 \\[2ex] \dfrac{e^{d_k^{*}}}{e^{d_k^{*}} + e^{d_K}} - \dfrac{e^{d_{k-1}}}{e^{d_{k-1}} + e^{d_K}} & \text{if } 1 < k < K \\[2ex] \dfrac{e^{d_K^{*}}}{e^{d_{K-1}} + e^{d_K^{*}}} & \text{if } k = K \end{cases}$$

where $d_k^{*} = d_k + \delta$.

The predicted probabilities for the remaining $Y \ne k$ are then adjusted proportionally. When the shift parameter $\delta$ is less than 0, the value $d_k^{*}$ can be less than $d_{k-1}$ for $1 < k < K$. In this case, $\mathrm{pr}(Y=k)$ is set to 0.
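
The ordinal adjustment can be traced with a small DATA step for K = 3 levels and a shift on the middle level. This is a sketch under stated assumptions, not the procedure's implementation: the values d1 = -0.5 and d2 = 1.0 and the shift -0.6 are hypothetical, and the proportional adjustment of the remaining levels is interpreted here as rescaling them by a common factor so that all probabilities sum to 1.

```sas
/* Sketch of the ordinal adjustment for K=3 and a middle level k=2 */
data _null_;
   array d[3]  _temporary_ (-0.5 1.0 0);   /* d1, d2, and dK = 0     */
   array p[3]  _temporary_;                /* unadjusted pr(Y=j)     */
   array pa[3] _temporary_;                /* adjusted pr(Y=j)       */
   k = 2;                                  /* adjusted level, 1<k<K  */
   delta = -0.6;                           /* hypothetical SHIFT=    */
   /* Unadjusted probabilities from the cumulative logits */
   p[1] = exp(d[1]) / (exp(d[1]) + exp(d[3]));
   p[2] = exp(d[2]) / (exp(d[2]) + exp(d[3])) - p[1];
   p[3] = exp(d[3]) / (exp(d[2]) + exp(d[3]));
   /* Adjusted probability for level k; set to 0 if d_k* < d_(k-1) */
   dk_star = d[k] + delta;
   pa[k] = exp(dk_star) / (exp(dk_star) + exp(d[3]))
         - exp(d[k-1]) / (exp(d[k-1]) + exp(d[3]));
   pa[k] = max(pa[k], 0);
   /* ASSUMPTION: remaining levels are rescaled by a common factor */
   do j = 1 to 3;
      if j ne k then pa[j] = p[j] * (1 - pa[k]) / (1 - p[k]);
   end;
   do j = 1 to 3;
      prob = p[j];
      prob_adj = pa[j];
      put j= prob= prob_adj=;
   end;
run;
```

With these values, a shift below -1.5 would make the shifted value smaller than d1, which is the case in which the adjusted probability for the middle level is set to 0.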

Adjusting Imputed Nominal Response Variables

For an imputed nominal classification variable Y, the shift parameter delta is applied to the generalized logit model function values for the corresponding response level.

For instance, if Y has nominal responses 1, 2, …, K, a simulated generalized logit model

$$\log\left(\frac{\mathrm{pr}(Y=k \mid \mathbf{x})}{\mathrm{pr}(Y=K \mid \mathbf{x})}\right) = \alpha_k + \mathbf{x}'\boldsymbol{\beta}_k$$

is used to impute the missing response values, where k=1, 2, …, K–1. For a detailed description of this model, see the section Nominal Response Logistic Regression.

For an observation that has missing Y and covariates $\mathbf{x}_0$, the predicted probability for Y = j, j < K, is then given by

$$\mathrm{pr}(Y=j) = \frac{e^{\alpha_j + \mathbf{x}_0'\boldsymbol{\beta}_j}}{\sum_{k=1}^{K-1} e^{\alpha_k + \mathbf{x}_0'\boldsymbol{\beta}_k} + 1} = \frac{e^{d_j}}{\sum_{k=1}^{K} e^{d_k}}$$

and

$$\mathrm{pr}(Y=K) = \frac{1}{\sum_{k=1}^{K-1} e^{\alpha_k + \mathbf{x}_0'\boldsymbol{\beta}_k} + 1} = \frac{e^{d_K}}{\sum_{k=1}^{K} e^{d_k}}$$

where $d_k = \alpha_k + \mathbf{x}_0'\boldsymbol{\beta}_k$ for $k < K$ and $d_K = 0$.

When you use the shift parameters $\delta_k$ for $Y = k$, $k = 1, 2, \ldots, K$, the predicted probabilities are

$$\mathrm{pr}(Y=j) = \frac{e^{d_j^{*}}}{\sum_{k=1}^{K} e^{d_k^{*}}}$$

where $d_k^{*} = d_k + \delta_k$.
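
The nominal adjustment can be sketched in the same way for K = 3 levels. The DATA step below only evaluates the formulas above; the values d1 = 0.4 and d2 = -0.2 and the single nonzero shift of 0.5 on level 1 are hypothetical.

```sas
/* Sketch of the nominal adjustment for K=3 response levels */
data _null_;
   array d[3]     _temporary_ (0.4 -0.2 0);  /* d1, d2, and dK = 0   */
   array delta[3] _temporary_ (0.5 0 0);     /* shift level 1 only   */
   /* Denominators of the unadjusted and adjusted probabilities */
   denom = 0;
   denom_adj = 0;
   do k = 1 to 3;
      denom     = denom     + exp(d[k]);
      denom_adj = denom_adj + exp(d[k] + delta[k]);
   end;
   /* Probabilities before and after the shift */
   do j = 1 to 3;
      prob     = exp(d[j]) / denom;
      prob_adj = exp(d[j] + delta[j]) / denom_adj;
      put j= prob= prob_adj=;
   end;
run;
```

In this sketch, a positive shift on level 1 raises its probability and lowers the probabilities of the other two levels while leaving their ratio to each other unchanged. In a PROC MI step, the corresponding specification would take the form `mnar adjust( y(event='1') / shift=0.5);`, where y is a hypothetical nominal variable.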

Last updated: December 09, 2022