The GLM Procedure

Multivariate Analysis of Variance

If you fit several dependent variables to the same effects, you might want to make joint tests involving parameters of several dependent variables. Suppose you have p dependent variables, k parameters for each dependent variable, and n observations. The models can be collected into one equation:

Y = Xβ + ε

where Y is n × p, X is n × k, β is k × p, and ε is n × p. Each of the p models can be estimated and tested separately. However, you might also want to consider the joint distribution and test the p models simultaneously.

For multivariate tests, you need to make some assumptions about the errors. With p dependent variables, there are n × p errors that are independent across observations but not across dependent variables. Assume

vec(ε) ∼ N(0, I_n ⊗ Σ)

where vec(ε) strings ε out by rows, ⊗ denotes Kronecker product multiplication, and Σ is p × p. Σ can be estimated by

S = e′e / (n − r) = (Y − Xb)′(Y − Xb) / (n − r)

where b = (X′X)⁻X′Y, r is the rank of the X matrix, and e is the matrix of residuals.

If S is scaled to unit diagonals, the values in S are called partial correlations of the Ys adjusting for the Xs. This matrix can be displayed by PROC GLM if PRINTE is specified as a MANOVA option.
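The estimation steps above can be sketched numerically. The following is an illustrative NumPy sketch, not SAS code; the data and dimensions are invented for demonstration, and the pseudoinverse stands in for the generalized inverse (X′X)⁻.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n observations, k model columns, p dependent variables.
# All names and values here are illustrative, not part of PROC GLM.
n, k, p = 30, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ rng.normal(size=(k, p)) + rng.normal(size=(n, p))

# b = (X'X)^- X'Y; the pseudoinverse plays the role of the g-inverse.
b = np.linalg.pinv(X.T @ X) @ X.T @ Y

# S = e'e / (n - r), where r is the rank of X.
e = Y - X @ b
r = np.linalg.matrix_rank(X)
S = (e.T @ e) / (n - r)

# Scaling S to unit diagonals gives the partial correlations of the
# Ys adjusting for the Xs (the matrix the PRINTE option displays).
d = np.sqrt(np.diag(S))
partial_corr = S / np.outer(d, d)
```

Note that S is p × p and symmetric, and the scaled matrix has ones on its diagonal, as a correlation matrix must.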

The multivariate general linear hypothesis is written

LβM = 0

You can form hypotheses for linear combinations across columns, as well as across rows, of β.

The MANOVA statement of the GLM procedure tests special cases where L corresponds to Type I, Type II, Type III, or Type IV tests, and M is the p × p identity matrix. These tests are joint tests that the given type of hypothesis holds for all dependent variables in the model, and they are often sufficient to test all hypotheses of interest.

Finally, when these special cases are not appropriate, you can specify your own L and M matrices by using the CONTRAST statement before the MANOVA statement and the M= specification in the MANOVA statement, respectively. Another alternative is to use a REPEATED statement, which automatically generates a variety of M matrices useful in repeated measures analysis of variance. See the section REPEATED Statement and the section Repeated Measures Analysis of Variance for more information.

One useful way to think of a MANOVA analysis with an M matrix other than the identity is as an analysis of a set of transformed variables defined by the columns of the M matrix. You should note, however, that PROC GLM always displays the M matrix in such a way that the transformed variables are defined by the rows, not the columns, of the displayed M matrix.

All multivariate tests carried out by the GLM procedure first construct the matrices H and E corresponding to the numerator and denominator, respectively, of a univariate F test:

H = M′(Lb)′(L(X′X)⁻L′)⁻¹(Lb)M
E = M′(Y′Y − b′(X′X)b)M

The diagonal elements of H and E correspond to the hypothesis and error SS for univariate tests. When the M matrix is the identity matrix (the default), these tests are for the original dependent variables on the left side of the MODEL statement. When an M matrix other than the identity is specified, the tests are for transformed variables defined by the columns of the M matrix. These tests can be studied by requesting the SUMMARY option, which produces univariate analyses for each original or transformed variable.
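The construction of H and E can be sketched as follows. This is an illustrative NumPy sketch, not SAS code; the hypothesis matrix L here (testing that all non-intercept coefficients are zero) and the toy data are invented for demonstration, and M is taken as the identity.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, p = 40, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ rng.normal(size=(k, p)) + rng.normal(size=(n, p))

XtX_inv = np.linalg.pinv(X.T @ X)   # stands in for (X'X)^-
b = XtX_inv @ X.T @ Y

# Illustrative hypothesis: both non-intercept coefficients are zero,
# jointly for all dependent variables. M is the p x p identity.
L = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
M = np.eye(p)

# H = M'(Lb)'(L(X'X)^- L')^{-1}(Lb)M
Lb = L @ b
H = M.T @ Lb.T @ np.linalg.inv(L @ XtX_inv @ L.T) @ Lb @ M

# E = M'(Y'Y - b'(X'X)b)M
E = M.T @ (Y.T @ Y - b.T @ (X.T @ X) @ b) @ M
```

With M the identity, the diagonal of E reproduces the residual sum of squares of each univariate model, matching the statement above about the univariate error SS.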

Four multivariate test statistics, all functions of the eigenvalues of E⁻¹H (or, equivalently, of (E + H)⁻¹H), are constructed:

  • Wilks’ lambda = det(E) / det(H + E)

  • Pillai’s trace = trace(H(H + E)⁻¹)

  • Hotelling-Lawley trace = trace(E⁻¹H)

  • Roy’s greatest root = λ, the largest eigenvalue of E⁻¹H
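The four statistics can be computed directly from H and E, and each can also be written in terms of the eigenvalues ξ of E⁻¹H. The following NumPy sketch (not SAS code; the H and E values are invented toy matrices) shows both forms:

```python
import numpy as np
from numpy.linalg import det, eigvals, inv

# Toy symmetric p x p hypothesis and error matrices (illustrative only;
# in practice these come from the H and E construction above).
H = np.array([[4.0, 1.0], [1.0, 2.0]])
E = np.array([[5.0, 0.5], [0.5, 3.0]])

eig = np.real(eigvals(inv(E) @ H))   # eigenvalues xi of E^{-1}H

wilks = det(E) / det(H + E)
pillai = np.trace(H @ inv(H + E))
hotelling_lawley = np.trace(inv(E) @ H)
roy = eig.max()

# Equivalent eigenvalue forms:
#   Wilks' lambda        = prod(1 / (1 + xi))
#   Pillai's trace       = sum(xi / (1 + xi))
#   Hotelling-Lawley     = sum(xi)
#   Roy's greatest root  = max(xi)
```

The agreement between the determinant/trace forms and the eigenvalue forms is exactly why all four statistics are described as functions of the eigenvalues of E⁻¹H.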

By default, all four are reported with p-values based on F approximations, as discussed in the "Multivariate Tests" section in Chapter 4, Introduction to Regression Procedures. Alternatively, if you specify MSTAT=EXACT in the associated MANOVA or REPEATED statement, p-values for three of the four tests are computed exactly (Wilks’ lambda, the Hotelling-Lawley trace, and Roy’s greatest root), and the p-values for the fourth (Pillai’s trace) are based on an F approximation that is more accurate than the default. See the "Multivariate Tests" section in Chapter 4, Introduction to Regression Procedures, for more details on the exact calculations.

Last updated: December 09, 2022