This example uses the EM algorithm to compute the maximum likelihood estimates for the parameters of multivariate normally distributed data with missing values. The following statements invoke the MI procedure and request the EM algorithm to compute the MLE for $(\mu, \Sigma)$ of a multivariate normal distribution from the input data set Fitness1:
```sas
proc mi data=Fitness1 seed=1518971 simple nimpute=0;
   em itprint outem=outem;
   var Oxygen RunTime RunPulse;
run;
```
Note that when you specify the NIMPUTE=0 option, the missing values are not imputed.
The "Model Information" table in Output 82.1.1 describes the method and options that the procedure would use if a positive number were specified in the NIMPUTE= option.
Output 82.1.1: Model Information
| Model Information | |
|---|---|
| Data Set | WORK.FITNESS1 |
| Method | MCMC |
| Multiple Imputation Chain | Single Chain |
| Initial Estimates for MCMC | EM Posterior Mode |
| Start | Starting Value |
| Prior | Jeffreys |
| Number of Imputations | 0 |
| Number of Burn-in Iterations | 200 |
| Number of Iterations | 100 |
| Seed for random number generator | 1518971 |
The "Missing Data Patterns" table in Output 82.1.2 lists distinct missing data patterns with corresponding frequencies and percentages. Here, a value of "X" means that the variable is observed in the corresponding group and a value of "." means that the variable is missing. The table also displays group-specific variable means.
Output 82.1.2: Missing Data Patterns
| Missing Data Patterns | ||||||||
|---|---|---|---|---|---|---|---|---|
| Group | Oxygen | RunTime | RunPulse | Freq | Percent | Group Mean (Oxygen) | Group Mean (RunTime) | Group Mean (RunPulse) |
| 1 | X | X | X | 21 | 67.74 | 46.353810 | 10.809524 | 171.666667 |
| 2 | X | X | . | 4 | 12.90 | 47.109500 | 10.137500 | . |
| 3 | X | . | . | 3 | 9.68 | 52.461667 | . | . |
| 4 | . | X | X | 1 | 3.23 | . | 11.950000 | 176.000000 |
| 5 | . | X | . | 2 | 6.45 | . | 9.885000 | . |
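The pattern frequencies above can be cross-checked outside PROC MI. The following is a minimal sketch, assuming the Fitness1 data set is available in the WORK library; the data set name `patterns` and the variable name `Pattern` are hypothetical names introduced only for this illustration.

```sas
/* Build a three-character pattern string per observation:      */
/* 'X' where the variable is observed, '.' where it is missing. */
data patterns;
   set Fitness1;
   length Pattern $3;
   Pattern = cats(ifc(missing(Oxygen),   '.', 'X'),
                  ifc(missing(RunTime),  '.', 'X'),
                  ifc(missing(RunPulse), '.', 'X'));
run;

/* Tabulate how often each pattern occurs */
proc freq data=patterns;
   tables Pattern / nocum;
run;
```

If the sketch behaves as intended, the counts for the patterns `XXX`, `XX.`, `X..`, `.XX`, and `.X.` should agree with the Freq column in Output 82.1.2, although PROC FREQ lists the patterns in a different order.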
With the SIMPLE option, the procedure displays simple descriptive univariate statistics for available cases in the "Univariate Statistics" table in Output 82.1.3 and correlations from pairwise available cases in the "Pairwise Correlations" table in Output 82.1.4.
Output 82.1.3: Univariate Statistics
| Univariate Statistics | |||||||
|---|---|---|---|---|---|---|---|
| Variable | N | Mean | Std Dev | Minimum | Maximum | Missing Count | Missing Percent |
| Oxygen | 28 | 47.11618 | 5.41305 | 37.38800 | 60.05500 | 3 | 9.68 |
| RunTime | 28 | 10.68821 | 1.37988 | 8.63000 | 14.03000 | 3 | 9.68 |
| RunPulse | 22 | 171.86364 | 10.14324 | 148.00000 | 186.00000 | 9 | 29.03 |
Output 82.1.4: Pairwise Correlations
| Pairwise Correlations | |||
|---|---|---|---|
| | Oxygen | RunTime | RunPulse |
| Oxygen | 1.000000000 | -0.849118562 | -0.343961742 |
| RunTime | -0.849118562 | 1.000000000 | 0.247258191 |
| RunPulse | -0.343961742 | 0.247258191 | 1.000000000 |
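The same available-case statistics can be obtained with Base SAS procedures, which is a convenient cross-check. This sketch assumes only that Fitness1 exists. PROC MEANS computes each statistic from the nonmissing values of each variable, and PROC CORR uses pairwise deletion unless the NOMISS option is specified.

```sas
/* Available-case univariate statistics (compare with Output 82.1.3) */
proc means data=Fitness1 n nmiss mean std min max;
   var Oxygen RunTime RunPulse;
run;

/* Pairwise available-case correlations (compare with Output 82.1.4) */
proc corr data=Fitness1;
   var Oxygen RunTime RunPulse;
run;
```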
When you use the EM statement, the MI procedure displays the initial parameter estimates for the EM algorithm in the "Initial Parameter Estimates for EM" table in Output 82.1.5.
Output 82.1.5: Initial Parameter Estimates for EM
| Initial Parameter Estimates for EM | ||||
|---|---|---|---|---|
| _TYPE_ | _NAME_ | Oxygen | RunTime | RunPulse |
| MEAN | | 47.116179 | 10.688214 | 171.863636 |
| COV | Oxygen | 29.301078 | 0 | 0 |
| COV | RunTime | 0 | 1.904067 | 0 |
| COV | RunPulse | 0 | 0 | 102.885281 |
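As a quick arithmetic check, the initial means in Output 82.1.5 equal the available-case means in Output 82.1.3, and the diagonal covariance entries are the squares of the available-case standard deviations, up to rounding of the displayed values. The off-diagonal entries start at zero.

$$
\begin{aligned}
s_{\text{Oxygen}}^2 &= 5.41305^2 \approx 29.3011 \\
s_{\text{RunTime}}^2 &= 1.37988^2 \approx 1.9041 \\
s_{\text{RunPulse}}^2 &= 10.14324^2 \approx 102.8853
\end{aligned}
$$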
When you use the ITPRINT option in the EM statement, the "EM (MLE) Iteration History" table in Output 82.1.6 displays the iteration history for the EM algorithm.
Output 82.1.6: EM (MLE) Iteration History
| EM (MLE) Iteration History | ||||
|---|---|---|---|---|
| _Iteration_ | -2 Log L | Oxygen | RunTime | RunPulse |
| 0 | 289.544782 | 47.116179 | 10.688214 | 171.863636 |
| 1 | 263.549489 | 47.116179 | 10.688214 | 171.863636 |
| 2 | 255.851312 | 47.139089 | 10.603506 | 171.538203 |
| 3 | 254.616428 | 47.122353 | 10.571685 | 171.426790 |
| 4 | 254.494971 | 47.111080 | 10.560585 | 171.398296 |
| 5 | 254.483973 | 47.106523 | 10.556768 | 171.389208 |
| 6 | 254.482920 | 47.104899 | 10.555485 | 171.385257 |
| 7 | 254.482813 | 47.104348 | 10.555062 | 171.383345 |
| 8 | 254.482801 | 47.104165 | 10.554923 | 171.382424 |
| 9 | 254.482800 | 47.104105 | 10.554878 | 171.381992 |
| 10 | 254.482800 | 47.104086 | 10.554864 | 171.381796 |
| 11 | 254.482800 | 47.104079 | 10.554859 | 171.381708 |
| 12 | 254.482800 | 47.104077 | 10.554858 | 171.381669 |
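In the history above, the -2 Log L value stops changing at the displayed precision by iteration 9. If you want to control when the algorithm stops, or to save the history for plotting, a variation along the following lines may help. This is a sketch rather than part of the original example: MAXITER=, CONVERGE=, and OUTITER= are EM statement options, the values 200 and 1E-4 are assumed here to match the documented defaults (verify against your SAS/STAT release), and `emiter` is a hypothetical data set name.

```sas
proc mi data=Fitness1 seed=1518971 nimpute=0;
   /* maxiter= caps the number of EM iterations;           */
   /* converge= sets the convergence criterion;            */
   /* outiter= writes the parameter values by iteration.   */
   em itprint maxiter=200 converge=1e-4
      outiter=emiter outem=outem;
   var Oxygen RunTime RunPulse;
run;
```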
The "EM (MLE) Parameter Estimates" table in Output 82.1.7 displays the maximum likelihood estimates for $\mu$ and $\Sigma$ of a multivariate normal distribution from the data set Fitness1.
Output 82.1.7: EM (MLE) Parameter Estimates
| EM (MLE) Parameter Estimates | ||||
|---|---|---|---|---|
| _TYPE_ | _NAME_ | Oxygen | RunTime | RunPulse |
| MEAN | | 47.104077 | 10.554858 | 171.381669 |
| COV | Oxygen | 27.797931 | -6.457975 | -18.031298 |
| COV | RunTime | -6.457975 | 2.015514 | 3.516287 |
| COV | RunPulse | -18.031298 | 3.516287 | 97.766857 |
You can also output the EM (MLE) parameter estimates to an output data set with the OUTEM= option. The following statements list the observations in the output data set Outem:
```sas
proc print data=outem;
   title 'EM Estimates';
run;
```
The output data set Outem in Output 82.1.8 is a TYPE=COV data set. The observation with _TYPE_='MEAN' contains the MLE for the parameter $\mu$, and the observations with _TYPE_='COV' contain the MLE for the parameter $\Sigma$ of a multivariate normal distribution from the data set Fitness1.
Output 82.1.8: EM Estimates
| EM Estimates | |||||
|---|---|---|---|---|---|
| Obs | _TYPE_ | _NAME_ | Oxygen | RunTime | RunPulse |
| 1 | MEAN | | 47.1041 | 10.5549 | 171.382 |
| 2 | COV | Oxygen | 27.7979 | -6.4580 | -18.031 |
| 3 | COV | RunTime | -6.4580 | 2.0155 | 3.516 |
| 4 | COV | RunPulse | -18.0313 | 3.5163 | 97.767 |
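Because Outem stores the mean vector and the covariance matrix in a single data set distinguished by _TYPE_, it can be convenient to split them before further processing. The following sketch assumes the Outem data set created above; `em_mean` and `em_cov` are hypothetical data set names chosen for illustration.

```sas
/* Route the MEAN row and the COV rows to separate data sets */
data em_mean(drop=_TYPE_ _NAME_) em_cov(drop=_TYPE_);
   set outem;
   if _TYPE_ = 'MEAN' then output em_mean;
   else if _TYPE_ = 'COV' then output em_cov;
run;

proc print data=em_cov noobs;
   title 'EM Covariance Estimates';
run;
```

With the estimates shown in Output 82.1.8, `em_mean` would hold one row of three means and `em_cov` would hold the 3-by-3 covariance matrix, with _NAME_ identifying each row.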