The GLM Procedure

Specification of ESTIMATE Expressions

Consider the model

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

The corresponding MODEL statement for PROC GLM is

model y=x1 x2 x3;

To estimate the difference between the parameters for $x_1$ and $x_2$,

$$\beta_1 - \beta_2 = \begin{pmatrix} 0 & 1 & -1 & 0 \end{pmatrix} \boldsymbol{\beta}, \quad \text{where } \boldsymbol{\beta} = \begin{pmatrix} \beta_0 & \beta_1 & \beta_2 & \beta_3 \end{pmatrix}'$$

you can use the following ESTIMATE statement:

estimate 'B1-B2'  x1 1  x2 -1;

To predict $y$ at $x_1 = 1$, $x_2 = 0$, and $x_3 = -2$, you can estimate

$$\beta_0 + \beta_1 - 2\beta_3 = \begin{pmatrix} 1 & 1 & 0 & -2 \end{pmatrix} \boldsymbol{\beta}$$

with the following ESTIMATE statement:

estimate 'B0+B1-2B3' intercept 1 x1 1 x3 -2;
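
As a minimal sketch of how the two ESTIMATE statements above fit into a complete step, the following program simulates a small data set and fits the model. The data set name `work.sample`, the seed, and the coefficients used to generate `y` are assumptions made only for illustration; the MODEL and ESTIMATE statements are the ones shown earlier.

```sas
/* Simulated illustration data: the name, seed, and generating
   coefficients are hypothetical and not part of the original text */
data work.sample;
   call streaminit(1);
   do obs=1 to 20;
      x1=rand('normal');
      x2=rand('normal');
      x3=rand('normal');
      y=1+2*x1-x2+0.5*x3+rand('normal');
      output;
   end;
run;

proc glm data=work.sample;
   model y=x1 x2 x3;
   /* beta1 - beta2: L = (0 1 -1 0) */
   estimate 'B1-B2' x1 1 x2 -1;
   /* predicted mean at x1=1, x2=0, x3=-2: L = (1 1 0 -2) */
   estimate 'B0+B1-2B3' intercept 1 x1 1 x3 -2;
run;
quit;
```

Effects that are not named in an ESTIMATE statement, such as `x3` in the first statement and `x2` in the second, receive a coefficient of zero in this model because all effects are continuous.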

Now consider models involving classification variables such as

model y=A B A*B;

with the associated parameters:

$$\begin{pmatrix} \mu & \alpha_1 & \alpha_2 & \alpha_3 & \beta_1 & \beta_2 & \gamma_{11} & \gamma_{12} & \gamma_{21} & \gamma_{22} & \gamma_{31} & \gamma_{32} \end{pmatrix}$$

The LS-mean for the first level of A is $\mathbf{L}\boldsymbol{\beta}$, where

$$\mathbf{L} = \left( \begin{array}{c|ccc|cc|cccccc} 1 & 1 & 0 & 0 & 0.5 & 0.5 & 0.5 & 0.5 & 0 & 0 & 0 & 0 \end{array} \right)$$

You can estimate this with the following ESTIMATE statement:

estimate 'LS-mean(A1)' intercept 1 A 1 B 0.5 0.5 A*B 0.5 0.5;

Note in this statement that only one element of $\mathbf{L}$ is specified following the A effect, even though A has three levels. Whenever the list of constants following an effect name is shorter than the effect’s number of levels, zeros are used as the remaining constants. (If the list of constants is longer than the number of levels for the effect, the extra constants are ignored, and a warning message is displayed.)

To estimate the A linear effect in the preceding model, assuming equally spaced levels for A, you can use the following $\mathbf{L}$:

$$\mathbf{L} = \left( \begin{array}{c|ccc|cc|cccccc} 0 & -1 & 0 & 1 & 0 & 0 & -0.5 & -0.5 & 0 & 0 & 0.5 & 0.5 \end{array} \right)$$

The ESTIMATE statement for this $\mathbf{L}$ is written as

estimate 'A Linear' A -1 0 1;

If you do not specify the elements of $\mathbf{L}$ for an effect that contains a specified effect, then the elements of the specified effect are equally distributed over the corresponding levels of the higher-order effect. In addition, if you specify the intercept in an ESTIMATE or CONTRAST statement, it is distributed over all classification effects that are not contained by any other specified effect.

The distribution of lower-order coefficients to higher-order effect coefficients follows the same general rules as in the LSMEANS statement, and it is similar to that used to construct Type IV tests. In the previous example, the $-1$ associated with $\alpha_1$ is divided by the number $n_{1j}$ of $\gamma_{1j}$ parameters; then each $\gamma_{1j}$ coefficient is set to $-1/n_{1j}$. The $1$ associated with $\alpha_3$ is distributed among the $\gamma_{3j}$ parameters in a similar fashion. In the event that an unspecified effect contains several specified effects, only that specified effect with the most factors in common with the unspecified effect is used for distribution of coefficients to the higher-order effect.
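
As a worked check of this rule for the 'A Linear' statement: in the model with three levels of A and two levels of B, each level of A has two interaction parameters, so $n_{1j} = n_{2j} = n_{3j} = 2$. Under that assumption, the arithmetic is as follows.

```latex
\begin{aligned}
\alpha_1 \text{ coefficient } -1 &\;\Rightarrow\; \gamma_{11}, \gamma_{12} \text{ each get } -1/2 = -0.5 \\
\alpha_2 \text{ coefficient } 0  &\;\Rightarrow\; \gamma_{21}, \gamma_{22} \text{ each get } 0/2 = 0 \\
\alpha_3 \text{ coefficient } 1  &\;\Rightarrow\; \gamma_{31}, \gamma_{32} \text{ each get } 1/2 = 0.5
\end{aligned}
```

These values match the last six entries of the A linear $\mathbf{L}$ given earlier.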

Numerous syntactical expressions for the ESTIMATE statement were considered, including many that involved specifying the effect and level information associated with each coefficient. For models involving higher-level effects, the requirement of specifying level information can lead to very bulky specifications. Consequently, the simpler form of the ESTIMATE statement described earlier was implemented.

The syntax of this ESTIMATE statement puts a burden on you to know a priori the order of the parameter list associated with each effect. You can use the ORDER= option in the PROC GLM statement to ensure that the levels of the classification effects are sorted appropriately.
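
The following sketch illustrates why level order matters. It assumes a hypothetical data set `work.dose` whose classification variable A has the character values low, mid, and high, entered in that order; the data are simulated only so that the step can run. Under the default ORDER=FORMATTED, these values sort alphabetically as high, low, mid, so the coefficients -1 0 1 would not line up with low, mid, high. Specifying ORDER=DATA keeps the order in which the levels first appear.

```sas
/* Simulated illustration data: name, seed, and values are hypothetical.
   Levels are written in the order low, mid, high. */
data work.dose;
   call streaminit(3);
   length A $4;
   do A='low','mid','high';
      do rep=1 to 5;
         y=10+rand('normal');
         output;
      end;
   end;
run;

/* ORDER=DATA: levels are ordered as they appear in the data,
   so the coefficients apply to low, mid, high in that order */
proc glm data=work.dose order=data;
   class A;
   model y=A;
   estimate 'A Linear' A -1 0 1;
run;
quit;
```

The Class Level Information table in the output lists the levels in the order that PROC GLM uses, which is a quick way to confirm that the coefficients are attached to the intended levels.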

Note: If you use the ESTIMATE statement with unspecified effects, use the E option to make sure that the actual $\mathbf{L}$ constructed by the preceding rules is the one you intended.
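
A minimal sketch of this check follows, assuming a hypothetical balanced data set `work.twoway` with three levels of A and two levels of B (simulated here only for illustration). The first ESTIMATE statement leaves A*B unspecified, and the second writes out the coefficients that the distribution rules are expected to produce; the E option prints the $\mathbf{L}$ vector for each so that the two can be compared.

```sas
/* Simulated illustration data: name, seed, and generating
   values are hypothetical */
data work.twoway;
   call streaminit(2);
   do A=1 to 3;
      do B=1 to 2;
         do rep=1 to 4;
            y=10+A+2*B+rand('normal');
            output;
         end;
      end;
   end;
run;

proc glm data=work.twoway;
   class A B;
   model y=A B A*B;
   /* A*B left unspecified: PROC GLM fills it in; E prints L */
   estimate 'A Linear (short)' A -1 0 1 / e;
   /* The same L written out in full, for comparison */
   estimate 'A Linear (full)' A -1 0 1
            A*B -0.5 -0.5 0 0 0.5 0.5 / e;
run;
quit;
```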

A Check for Estimability

Each $\mathbf{L}$ is checked for estimability using the relationship $\mathbf{L} = \mathbf{L}\mathbf{H}$, where $\mathbf{H} = (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{X}$. The $\mathbf{L}$ vector is declared nonestimable if, for any $i$,

$$\mathrm{ABS}(\mathbf{L}_i - (\mathbf{L}\mathbf{H})_i) > \begin{cases} \epsilon & \text{if } \mathbf{L}_i = 0 \text{ or} \\ \epsilon \times \mathrm{ABS}(\mathbf{L}_i) & \text{otherwise} \end{cases}$$

where $\epsilon = 10^{-4}$ by default; you can change this with the SINGULAR= option. Fractions that have no finite decimal expansion (such as 1/3) should be specified to at least six decimal places, or the DIVISOR= option should be used.
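
The following sketch shows both ways of entering thirds, together with a function that is expected to fail the estimability check. It assumes a hypothetical one-way data set `work.oneway` with three levels of A, simulated only for illustration. The quantity estimated is the average of the three level means, $\mu + (\alpha_1 + \alpha_2 + \alpha_3)/3$.

```sas
/* Simulated illustration data: name, seed, and generating
   values are hypothetical */
data work.oneway;
   call streaminit(4);
   do A=1 to 3;
      do rep=1 to 5;
         y=10+A+rand('normal');
         output;
      end;
   end;
run;

proc glm data=work.oneway;
   class A;
   model y=A;
   /* Integer coefficients; DIVISOR=3 divides every coefficient by 3 */
   estimate 'Avg of A means' intercept 3 A 1 1 1 / divisor=3;
   /* The same function with 1/3 typed to six decimal places */
   estimate 'Avg (decimals)' intercept 1 A 0.333333 0.333333 0.333333;
   /* alpha1 without the intercept is not estimable in this model */
   estimate 'alpha1 only' A 1 0 0;
run;
quit;
```

The DIVISOR= form avoids rounding altogether, which is why it is usually the safer choice when the coefficients are thirds, sixths, or similar fractions.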

Last updated: December 09, 2022