Introduction to Regression Procedures

Multivariate Tests

Multivariate hypotheses involve several dependent variables in the form

\[ H\colon\ \mathbf{L}\boldsymbol{\beta}\mathbf{M} = \mathbf{d} \]

where $\mathbf{L}$ is a linear function on the regressor side, $\boldsymbol{\beta}$ is a matrix of parameters, $\mathbf{M}$ is a linear function on the dependent side, and $\mathbf{d}$ is a matrix of constants. The special case (handled by PROC REG) in which the constants are the same for each dependent variable is expressed as

\[ (\mathbf{L}\boldsymbol{\beta} - \mathbf{c}\mathbf{j})\,\mathbf{M} = \mathbf{0} \]

where $\mathbf{c}$ is a column vector of constants and $\mathbf{j}$ is a row vector of ones. The special case in which the constants are 0 is then

\[ \mathbf{L}\boldsymbol{\beta}\mathbf{M} = \mathbf{0} \]

These multivariate tests are covered in detail in Morrison (2004); Timm (2002); Mardia, Kent, and Bibby (1979); Bock (1975); and other works cited in Chapter 10, Introduction to Multivariate Procedures.

Notice that in contrast to the tests discussed in the preceding section, $\boldsymbol{\beta}$ here is a matrix of parameters, with one column for each dependent variable. Suppose that the matrix of estimates is denoted as $\mathbf{B}$. To test the multivariate hypothesis, construct two matrices, $\mathbf{H}$ and $\mathbf{E}$, that correspond to the numerator and denominator of a univariate F test:

\[
\begin{aligned}
\mathbf{H} &= \mathbf{M}'(\mathbf{L}\mathbf{B} - \mathbf{c}\mathbf{j})'\left(\mathbf{L}(\mathbf{X}'\mathbf{W}\mathbf{X})^{-}\mathbf{L}'\right)^{-1}(\mathbf{L}\mathbf{B} - \mathbf{c}\mathbf{j})\,\mathbf{M} \\
\mathbf{E} &= \mathbf{M}'\left(\mathbf{Y}'\mathbf{W}\mathbf{Y} - \mathbf{B}'(\mathbf{X}'\mathbf{W}\mathbf{X})\mathbf{B}\right)\mathbf{M}
\end{aligned}
\]

Four test statistics, based on the eigenvalues of $\mathbf{E}^{-1}\mathbf{H}$ or $(\mathbf{E} + \mathbf{H})^{-1}\mathbf{H}$, are formed. Let $\lambda_i$ be the ordered eigenvalues of $\mathbf{E}^{-1}\mathbf{H}$ (if the inverse exists), and let $\xi_i$ be the ordered eigenvalues of $(\mathbf{E} + \mathbf{H})^{-1}\mathbf{H}$. It happens that $\xi_i = \lambda_i / (1 + \lambda_i)$ and $\lambda_i = \xi_i / (1 - \xi_i)$, and it turns out that $\rho_i = \sqrt{\xi_i}$ is the $i$th canonical correlation.
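The relationship between the two sets of eigenvalues can be seen from a short argument. The following is only a sketch, assuming that both $\mathbf{E}$ and $\mathbf{E} + \mathbf{H}$ are nonsingular; the eigenvector symbol $\mathbf{x}$ is introduced here purely for illustration.

```latex
\[
\begin{aligned}
\mathbf{E}^{-1}\mathbf{H}\mathbf{x} = \lambda_i \mathbf{x}
  &\;\Longrightarrow\; \mathbf{H}\mathbf{x} = \lambda_i\,\mathbf{E}\mathbf{x} \\
  &\;\Longrightarrow\; (\mathbf{E} + \mathbf{H})\mathbf{x} = (1 + \lambda_i)\,\mathbf{E}\mathbf{x} \\
  &\;\Longrightarrow\; \mathbf{H}\mathbf{x} = \frac{\lambda_i}{1 + \lambda_i}\,(\mathbf{E} + \mathbf{H})\mathbf{x} \\
  &\;\Longrightarrow\; (\mathbf{E} + \mathbf{H})^{-1}\mathbf{H}\mathbf{x} = \frac{\lambda_i}{1 + \lambda_i}\,\mathbf{x}
\end{aligned}
\]
```

So every eigenvector of $\mathbf{E}^{-1}\mathbf{H}$ is also an eigenvector of $(\mathbf{E} + \mathbf{H})^{-1}\mathbf{H}$, with eigenvalue $\lambda_i / (1 + \lambda_i)$. Solving that expression for $\lambda_i$ gives the inverse relation $\lambda_i = \xi_i / (1 - \xi_i)$.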

Let $p$ be the rank of $(\mathbf{H} + \mathbf{E})$, which is less than or equal to the number of columns of $\mathbf{M}$. Let $q$ be the rank of $\mathbf{L}(\mathbf{X}'\mathbf{W}\mathbf{X})^{-}\mathbf{L}'$. Let $v$ be the error degrees of freedom, and let $s = \min(p, q)$. Let $m = (|p - q| - 1)/2$, and let $n = (v - p - 1)/2$. Then the following statistics test the multivariate hypothesis in various ways, and their p-values can be approximated by F distributions. Note that in the special case that the rank of $\mathbf{H}$ is 1, all these F statistics are the same and the corresponding p-values are exact, because in this case the hypothesis is really univariate.
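As an illustration of these definitions, consider the one-way MANOVA example that appears later in this section: 13 observations, a classification variable with three levels, and three dependent variables. The values below are a sketch under two assumptions that are not stated in the text: that $\mathbf{H} + \mathbf{E}$ has full rank (so $p = 3$), and that the hypothesis and error degrees of freedom take their usual one-way values (so $q = 3 - 1$ and $v = 13 - 3$).

```latex
\[
\begin{aligned}
p &= 3, \qquad q = 3 - 1 = 2, \qquad v = 13 - 3 = 10, \\
s &= \min(p, q) = \min(3, 2) = 2, \\
m &= \frac{|p - q| - 1}{2} = \frac{|3 - 2| - 1}{2} = 0, \\
n &= \frac{v - p - 1}{2} = \frac{10 - 3 - 1}{2} = 3.
\end{aligned}
\]
```

These values agree with the line S=2 M=0 N=3 that is printed above the table in Figure 5.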

Wilks’ Lambda

If

\[ \Lambda = \frac{\det(\mathbf{E})}{\det(\mathbf{H} + \mathbf{E})} = \prod_{i=1}^{n} \frac{1}{1 + \lambda_i} = \prod_{i=1}^{n} (1 - \xi_i) \]

then

\[ F = \frac{1 - \Lambda^{1/t}}{\Lambda^{1/t}} \cdot \frac{rt - 2u}{pq} \]

is approximately F distributed, where

\[
\begin{aligned}
r &= v - \frac{p - q + 1}{2} \\
u &= \frac{pq - 2}{4} \\
t &= \begin{cases}
\sqrt{\dfrac{p^2 q^2 - 4}{p^2 + q^2 - 5}} & \text{if } p^2 + q^2 - 5 > 0 \\
1 & \text{otherwise}
\end{cases}
\end{aligned}
\]

The degrees of freedom are $pq$ and $rt - 2u$. The distribution is exact if $\min(p, q) \le 2$. (See Rao 1973, p. 556.)
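The following worked example applies these formulas to the value $\Lambda = 0.60143661$ that is reported in Figure 5 later in this section. It is a sketch that assumes $p = 3$, $q = 2$, and $v = 10$ for that example, values that are inferred from the data rather than stated in the text.

```latex
\[
\begin{aligned}
r &= 10 - \frac{3 - 2 + 1}{2} = 9, \qquad
u = \frac{3 \cdot 2 - 2}{4} = 1, \qquad
t = \sqrt{\frac{3^2 \cdot 2^2 - 4}{3^2 + 2^2 - 5}} = \sqrt{\frac{32}{8}} = 2, \\
pq &= 6, \qquad rt - 2u = 9 \cdot 2 - 2 \cdot 1 = 16, \\
F &= \frac{1 - 0.60143661^{1/2}}{0.60143661^{1/2}} \cdot \frac{16}{6}
   \approx \frac{1 - 0.77552}{0.77552} \cdot \frac{16}{6} \approx 0.77.
\end{aligned}
\]
```

The result matches the F value of 0.77 with 6 and 16 degrees of freedom in Figure 5. Because $\min(p, q) = 2$ here, the F distribution is exact, which is consistent with the note printed under that table.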

Pillai’s Trace

If

\[ V = \mathrm{trace}\left(\mathbf{H}(\mathbf{H} + \mathbf{E})^{-1}\right) = \sum_{i=1}^{n} \frac{\lambda_i}{1 + \lambda_i} = \sum_{i=1}^{n} \xi_i \]

then

\[ F = \frac{2n + s + 1}{2m + s + 1} \cdot \frac{V}{s - V} \]

is approximately F distributed with $s(2m + s + 1)$ and $s(2n + s + 1)$ degrees of freedom.
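As a check on this formula, the following sketch recomputes the F value for the Pillai's trace entry $V = 0.44702843$ in Figure 5 later in this section, using $s = 2$, $m = 0$, and $n = 3$ as printed above that table.

```latex
\[
\begin{aligned}
F &= \frac{2 \cdot 3 + 2 + 1}{2 \cdot 0 + 2 + 1} \cdot \frac{0.44702843}{2 - 0.44702843}
   = \frac{9}{3} \cdot \frac{0.44702843}{1.55297157} \approx 3 \times 0.28785 \approx 0.86, \\
s(2m + s + 1) &= 2\,(0 + 2 + 1) = 6, \qquad
s(2n + s + 1) = 2\,(6 + 2 + 1) = 18.
\end{aligned}
\]
```

These agree with the F value of 0.86 and the 6 and 18 degrees of freedom shown in Figure 5.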

Hotelling-Lawley Trace

If

\[ U = \mathrm{trace}\left(\mathbf{E}^{-1}\mathbf{H}\right) = \sum_{i=1}^{n} \lambda_i = \sum_{i=1}^{n} \frac{\xi_i}{1 - \xi_i} \]

then for $n > 0$

\[ F = \frac{U}{c} \cdot \frac{4 + (pq + 2)/(b - 1)}{pq} \]

is approximately F distributed with $pq$ and $4 + (pq + 2)/(b - 1)$ degrees of freedom, where $b = (p + 2n)(q + 2n) / \bigl(2(2n + 1)(n - 1)\bigr)$ and $c = \bigl(2 + (pq + 2)/(b - 1)\bigr) / (2n)$; while for $n \le 0$

\[ F = \frac{2(sn + 1)\,U}{s^2 (2m + s + 1)} \]

is approximately F with $s(2m + s + 1)$ and $2(sn + 1)$ degrees of freedom.
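The $n > 0$ branch can be traced step by step with the Hotelling-Lawley entry $U = 0.58210348$ from Figure 5 later in this section. This sketch assumes $p = 3$ and $q = 2$ for that example (inferred from the data, not stated in the text) together with the printed value $n = 3$.

```latex
\[
\begin{aligned}
b &= \frac{(3 + 6)(2 + 6)}{2 \cdot 7 \cdot 2} = \frac{72}{28} = \frac{18}{7}, \qquad
\frac{pq + 2}{b - 1} = \frac{8}{11/7} = \frac{56}{11} \approx 5.0909, \\
\text{denominator df} &= 4 + \frac{56}{11} = \frac{100}{11} \approx 9.0909, \qquad
c = \frac{2 + 56/11}{2 \cdot 3} = \frac{13}{11} \approx 1.1818, \\
F &= \frac{0.58210348}{13/11} \cdot \frac{100/11}{6}
   \approx 0.49255 \times 1.51515 \approx 0.75.
\end{aligned}
\]
```

The noninteger denominator degrees of freedom, 9.0909, and the F value of 0.75 both match the corresponding row of Figure 5.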

Roy’s Maximum Root

If $\Theta = \lambda_1$, then

\[ F = \Theta\,\frac{v - r + q}{r} \]

where $r = \max(p, q)$, is an upper bound on F that yields a lower bound on the significance level. Degrees of freedom are $r$ for the numerator and $v - r + q$ for the denominator.
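For example, the following sketch applies the bound to the value $\Theta = 0.35530890$ that is reported in Figure 5 later in this section, assuming $p = 3$, $q = 2$, and $v = 10$ for that example (values inferred from the data, not stated in the text).

```latex
\[
\begin{aligned}
r &= \max(3, 2) = 3, \qquad v - r + q = 10 - 3 + 2 = 9, \\
F &= 0.35530890 \cdot \frac{9}{3} \approx 1.07.
\end{aligned}
\]
```

This reproduces the F value of 1.07 with 3 and 9 degrees of freedom in Figure 5. Because the statistic is only an upper bound on F, the p-value derived from it is only a lower bound.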

Tables of critical values for these statistics are found in Pillai (1960).

Exact Multivariate Tests

If you specify the MSTAT=EXACT option in the appropriate statement, p-values for three of the four tests (Wilks’ lambda, the Hotelling-Lawley trace, and Roy’s greatest root) are computed exactly, and the p-values for the fourth test (Pillai’s trace) are based on an F approximation that is more accurate (but occasionally slightly more liberal) than the default. The exact p-values for Roy’s greatest root benefit the most, because in this case the F approximation provides only a lower bound for the p-value. If you use the F-based p-value for this test in the usual way, declaring a test significant if p < 0.05, then your decisions might be very liberal. For example, instead of the nominal 5% Type I error rate, such a procedure can easily have an actual Type I error rate in excess of 30%. By contrast, basing such a procedure on the exact p-values results in the appropriate 5% Type I error rate, under the usual regression assumptions.

The MSTAT=EXACT option is supported in the ANOVA, CANCORR, CANDISC, GLM, and REG procedures.

The exact p-values are based on the following sources:

  • Wilks’ lambda: Lee (1972); Davis (1979)

  • Pillai’s trace: Muller (1998)

  • Hotelling-Lawley trace: Davis (1970, 1980)

  • Roy’s greatest root: Davis (1972); Pillai and Flury (1984)

Note that, although the MSTAT=EXACT p-value for Pillai’s trace is still approximate, it has "substantially greater accuracy" than the default approximation (Muller 1998).

Because most of the MSTAT=EXACT p-values are not based on the F distribution, the columns in the multivariate tests table that correspond to this approximation—in particular, the F value and the numerator and denominator degrees of freedom—are no longer displayed, and the column that contains the p-values is labeled "P Value" instead of "Pr > F." Suppose, for example, that you use the following PROC ANOVA statements to perform a multivariate analysis of an archaeological data set:

data Skulls;
   input Loc $20. Basal Occ Max;
   datalines;
Minas Graes, Brazil  2.068 2.070 1.580
Minas Graes, Brazil  2.068 2.074 1.602
Minas Graes, Brazil  2.090 2.090 1.613
Minas Graes, Brazil  2.097 2.093 1.613
Minas Graes, Brazil  2.117 2.125 1.663
Minas Graes, Brazil  2.140 2.146 1.681
Matto Grosso, Brazil 2.045 2.054 1.580
Matto Grosso, Brazil 2.076 2.088 1.602
Matto Grosso, Brazil 2.090 2.093 1.643
Matto Grosso, Brazil 2.111 2.114 1.643
Santa Cruz, Bolivia  2.093 2.098 1.653
Santa Cruz, Bolivia  2.100 2.106 1.623
Santa Cruz, Bolivia  2.104 2.101 1.653
;
proc anova data=Skulls;
   class Loc;
   model Basal Occ Max = Loc / nouni;
   manova h=Loc;
   ods select MultStat;
run;

The default multivariate tests, based on the F approximations, are shown in Figure 5.

Figure 5: Default Multivariate Tests

The ANOVA Procedure
Multivariate Analysis of Variance

MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall Loc Effect
H = Anova SSCP Matrix for Loc
E = Error SSCP Matrix

S=2   M=0   N=3

Statistic                Value        F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.60143661   0.77      6        16       0.6032
Pillai's Trace           0.44702843   0.86      6        18       0.5397
Hotelling-Lawley Trace   0.58210348   0.75      6        9.0909   0.6272
Roy's Greatest Root      0.35530890   1.07      3        9        0.4109

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
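The four values in this table are tied together by the eigenvalue relations given earlier in this section. The sketch below assumes that, because S=2, at most two eigenvalues are nonzero; it starts from Roy's greatest root $\lambda_1 = 0.35530890$ and Pillai's trace $V = 0.44702843$ and recovers the other two statistics, with intermediate values rounded.

```latex
\[
\begin{aligned}
\xi_1 &= \frac{\lambda_1}{1 + \lambda_1} = \frac{0.35530890}{1.35530890} \approx 0.26216, \qquad
\rho_1 = \sqrt{\xi_1} \approx 0.512, \\
\xi_2 &= V - \xi_1 \approx 0.44703 - 0.26216 = 0.18487, \qquad
\lambda_2 = \frac{\xi_2}{1 - \xi_2} \approx \frac{0.18487}{0.81513} \approx 0.2268, \\
U &= \lambda_1 + \lambda_2 \approx 0.3553 + 0.2268 = 0.5821, \\
\Lambda &= (1 - \xi_1)(1 - \xi_2) \approx 0.73784 \times 0.81513 \approx 0.6014.
\end{aligned}
\]
```

Both recovered values agree, to the precision carried here, with the Hotelling-Lawley trace and Wilks' lambda entries in the table.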


If you specify MSTAT=EXACT in the MANOVA statement, as in the following statements, then the displayed output is the much simpler table shown in Figure 6:

proc anova data=Skulls;
   class Loc;
   model Basal Occ Max = Loc / nouni;
   manova h=Loc / mstat=exact;
   ods select MultStat;
run;

Figure 6: Multivariate Tests with MSTAT=EXACT

The ANOVA Procedure
Multivariate Analysis of Variance

MANOVA Tests for the Hypothesis of No Overall Loc Effect
H = Anova SSCP Matrix for Loc
E = Error SSCP Matrix

S=2   M=0   N=3

Statistic                Value        P-Value
Wilks' Lambda            0.60143661   0.6032
Pillai's Trace           0.44702843   0.5521
Hotelling-Lawley Trace   0.58210348   0.6337
Roy's Greatest Root      0.35530890   0.7641


Notice that the p-value for Roy’s greatest root is substantially larger in the new table and correspondingly more in line with the p-values for the other tests.

If you reference the underlying ODS output object for the table of multivariate statistics, it is important to note that its structure does not depend on the value of the MSTAT= option. In particular, it always contains columns that correspond to both the default MSTAT=FAPPROX and the MSTAT=EXACT tests. Moreover, because the MSTAT=FAPPROX tests are relatively cheap to compute, the columns that correspond to them are always filled in, even though they are not displayed when you specify MSTAT=EXACT. On the other hand, for MSTAT=FAPPROX (which is the default), the column of exact p-values contains missing values and is not displayed.
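One way to look at the full structure of this output object is to save it as a data set with the ODS OUTPUT statement and then list its variables. The following is a minimal sketch, assuming that the Skulls data set from the earlier example exists; the data set name MStats is hypothetical and chosen only for illustration, and no particular column names are assumed.

```sas
/* Sketch: capture the multivariate statistics table in a data set.     */
/* MStats is a hypothetical data set name used only for illustration.   */
ods output MultStat=MStats;
proc anova data=Skulls;
   class Loc;
   model Basal Occ Max = Loc / nouni;
   manova h=Loc / mstat=exact;
run;
quit;

/* List the saved variables, including any that are not displayed. */
proc contents data=MStats;
run;

/* Print the saved table so the filled-in columns can be inspected. */
proc print data=MStats;
run;
```

Running the same steps without the MSTAT=EXACT option should, according to the description above, produce a data set with the same variables but with missing values in the column of exact p-values.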

Last updated: December 09, 2022