The MI Procedure

Descriptive Statistics

Suppose $\mathbf{Y} = (\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n)'$ is the $(n \times p)$ matrix of complete data, which might not be fully observed, $n_0$ is the number of observations fully observed, and $n_j$ is the number of observations with observed values for variable $Y_j$.

With complete cases, the sample mean vector is

$$\bar{\mathbf{y}} = \frac{1}{n_0} \sum \mathbf{y}_i$$

and the corrected sums of squares and crossproducts (CSSCP) matrix is

$$\sum (\mathbf{y}_i - \bar{\mathbf{y}}) (\mathbf{y}_i - \bar{\mathbf{y}})'$$

where each summation is over the fully observed observations.

The sample covariance matrix is

$$\mathbf{S} = \frac{1}{n_0 - 1} \sum (\mathbf{y}_i - \bar{\mathbf{y}}) (\mathbf{y}_i - \bar{\mathbf{y}})'$$

and is an unbiased estimate of the covariance matrix.
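
As an illustration only, the complete-case quantities above can be sketched in Python with NumPy. This is not the procedure's implementation: the function name `complete_case_stats`, the use of NaN to mark missing values, and the small data matrix are all assumptions made for the example.

```python
import numpy as np

def complete_case_stats(Y):
    """Mean vector, CSSCP matrix, and covariance matrix from fully observed rows.

    Assumption: missing values are coded as NaN.
    """
    Y = np.asarray(Y, dtype=float)
    complete = ~np.isnan(Y).any(axis=1)   # rows with no missing value
    Yc = Y[complete]
    n0 = Yc.shape[0]                      # number of fully observed observations
    ybar = Yc.mean(axis=0)                # sample mean vector
    dev = Yc - ybar                       # deviations from the mean
    csscp = dev.T @ dev                   # corrected sums of squares and crossproducts
    S = csscp / (n0 - 1)                  # sample covariance matrix
    return n0, ybar, csscp, S

# Hypothetical 5 x 3 data matrix with two incomplete rows.
Y = np.array([
    [1.0, 2.0, 3.0],
    [2.0, np.nan, 1.0],
    [3.0, 6.0, 2.0],
    [4.0, 5.0, np.nan],
    [5.0, 9.0, 7.0],
])
n0, ybar, csscp, S = complete_case_stats(Y)
print(n0)
print(ybar)
print(S)
```

Only the three rows without missing values contribute here, so every entry of the mean vector and of the covariance matrix is based on the same observations.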

The correlation matrix $\mathbf{R}$, which contains the Pearson product-moment correlations of the variables, is derived by scaling the corresponding covariance matrix:

$$\mathbf{R} = \mathbf{D}^{-1} \mathbf{S} \mathbf{D}^{-1}$$

where $\mathbf{D}$ is a diagonal matrix whose diagonal elements are the square roots of the diagonal elements of $\mathbf{S}$.
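
The scaling step can be checked numerically with a short NumPy sketch. The function name `correlation_from_covariance` and the 2 x 2 covariance matrix are hypothetical, and the sketch assumes every variance on the diagonal is strictly positive.

```python
import numpy as np

def correlation_from_covariance(S):
    """Scale a covariance matrix to a correlation matrix.

    Assumption: all diagonal elements of S are strictly positive.
    """
    S = np.asarray(S, dtype=float)
    d = np.sqrt(np.diag(S))      # standard deviations, the diagonal of D
    D_inv = np.diag(1.0 / d)     # inverse of the diagonal matrix D
    return D_inv @ S @ D_inv

# Hypothetical covariance matrix: variances 4 and 9, covariance 2.
S = np.array([[4.0, 2.0],
              [2.0, 9.0]])
R = correlation_from_covariance(S)
print(R)
```

With standard deviations 2 and 3, the off-diagonal correlation is 2 / (2 * 3) = 1/3 and the diagonal elements are 1.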

With available cases, the corrected sum of squares for variable $Y_j$ is

$$\sum (y_{ji} - \bar{y}_j)^2$$

where $\bar{y}_j = \frac{1}{n_j} \sum y_{ji}$ is the sample mean and each summation is over observations with observed values for variable $Y_j$.

The variance is

$$s_{jj}^2 = \frac{1}{n_j - 1} \sum (y_{ji} - \bar{y}_j)^2$$
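
A minimal NumPy sketch of the available-case mean, corrected sum of squares, and variance follows. The function name `available_case_stats`, the NaN coding of missing values, and the data matrix are assumptions for illustration; the sketch also assumes every variable has at least two observed values.

```python
import numpy as np

def available_case_stats(Y):
    """Per-variable count, mean, corrected sum of squares, and variance.

    Assumptions: missing values are coded as NaN, and each column has
    at least two observed values.
    """
    Y = np.asarray(Y, dtype=float)
    observed = ~np.isnan(Y)
    n_j = observed.sum(axis=0)                   # observed count per variable
    ybar_j = np.nanmean(Y, axis=0)               # mean over observed values only
    dev = np.where(observed, Y - ybar_j, 0.0)    # missing cells contribute zero
    css_j = (dev ** 2).sum(axis=0)               # corrected sum of squares
    var_j = css_j / (n_j - 1)                    # available-case variance
    return n_j, ybar_j, css_j, var_j

# Hypothetical 5 x 3 data matrix; columns 2 and 3 each have one missing value.
Y = np.array([
    [1.0, 2.0, 3.0],
    [2.0, np.nan, 1.0],
    [3.0, 6.0, 2.0],
    [4.0, 5.0, np.nan],
    [5.0, 9.0, 7.0],
])
n_j, ybar_j, css_j, var_j = available_case_stats(Y)
print(n_j)
print(ybar_j)
print(var_j)
```

Unlike the complete-case statistics, each variable here uses its own set of observations, so the counts can differ from one variable to the next.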

The correlations for available cases contain pairwise correlations for each pair of variables. Each correlation is computed from all observations that have nonmissing values for the corresponding pair of variables.
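
The pairwise computation might be sketched as follows. The text above does not say which means and variances enter each pairwise correlation, so this sketch assumes they are taken from the same pairwise-complete observations. The function name `pairwise_correlations`, the NaN coding of missing values, and the data matrix are likewise assumptions.

```python
import numpy as np

def pairwise_correlations(Y):
    """Pearson correlations computed pair by pair from jointly observed rows.

    Assumptions: missing values are coded as NaN; means and variances for
    each pair come from the rows where both variables are observed; every
    variable has nonzero variance, so the diagonal is set to 1.
    """
    Y = np.asarray(Y, dtype=float)
    p = Y.shape[1]
    R = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            both = ~np.isnan(Y[:, j]) & ~np.isnan(Y[:, k])   # rows observed for both
            r = np.corrcoef(Y[both, j], Y[both, k])[0, 1]
            R[j, k] = R[k, j] = r
    return R

# Hypothetical 5 x 3 data matrix with two incomplete rows.
Y = np.array([
    [1.0, 2.0, 3.0],
    [2.0, np.nan, 1.0],
    [3.0, 6.0, 2.0],
    [4.0, 5.0, np.nan],
    [5.0, 9.0, 7.0],
])
R = pairwise_correlations(Y)
print(R)
```

Because each pair can draw on a different subset of rows, a matrix assembled this way is symmetric with a unit diagonal but is not guaranteed to be positive semidefinite.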

Last updated: December 09, 2022