The MIXED Procedure

Mixed Models Theory

This section provides an overview of a likelihood-based approach to general linear mixed models. This approach simplifies and unifies many common statistical analyses, including those involving repeated measures, random effects, and random coefficients. The basic assumption is that the data are linearly related to unobserved multivariate normal random variables. For extensions to nonlinear and nonnormal situations see the documentation of the GLIMMIX and NLMIXED procedures. Additional theory and examples are provided in Littell et al. (2006); Verbeke and Molenberghs (1997, 2000); Brown and Prescott (1999).

Matrix Notation

Suppose that you observe n data points y 1 comma ellipsis comma y Subscript n Baseline and that you want to explain them by using n values for each of p explanatory variables x 11 comma ellipsis comma x Subscript 1 p Baseline, x 21 comma ellipsis comma x Subscript 2 p Baseline, ellipsis comma x Subscript n Baseline 1 Baseline comma ellipsis comma x Subscript n p Baseline. The x Subscript i j values can be either regression-type continuous variables or dummy variables indicating class membership. The standard linear model for this setup is

y Subscript i Baseline equals sigma-summation Underscript j equals 1 Overscript p Endscripts x Subscript i j Baseline beta Subscript j Baseline plus epsilon Subscript i Baseline i equals 1 comma ellipsis comma n

where beta 1 comma ellipsis comma beta Subscript p Baseline are unknown fixed-effects parameters to be estimated and epsilon 1 comma ellipsis comma epsilon Subscript n Baseline are unknown independent and identically distributed normal (Gaussian) random variables with mean 0 and variance sigma squared.

The preceding equations can be written simultaneously by using vectors and a matrix, as follows:

Start 4 By 1 Matrix 1st Row  y 1 2nd Row  y 2 3rd Row  vertical-ellipsis 4th Row  y Subscript n Baseline EndMatrix equals Start 4 By 4 Matrix 1st Row 1st Column x 11 2nd Column x 12 3rd Column ellipsis 4th Column x Subscript 1 p Baseline 2nd Row 1st Column x 21 2nd Column x 22 3rd Column ellipsis 4th Column x Subscript 2 p Baseline 3rd Row 1st Column vertical-ellipsis 2nd Column vertical-ellipsis 3rd Column Blank 4th Column vertical-ellipsis 4th Row 1st Column x Subscript n Baseline 1 Baseline 2nd Column x Subscript n Baseline 2 Baseline 3rd Column ellipsis 4th Column x Subscript n p Baseline EndMatrix Start 4 By 1 Matrix 1st Row  beta 1 2nd Row  beta 2 3rd Row  vertical-ellipsis 4th Row  beta Subscript p Baseline EndMatrix plus Start 4 By 1 Matrix 1st Row  epsilon 1 2nd Row  epsilon 2 3rd Row  vertical-ellipsis 4th Row  epsilon Subscript n EndMatrix

For convenience, simplicity, and extendability, this entire system is written as

bold y equals bold upper X bold-italic beta plus bold-italic epsilon

where bold y denotes the vector of observed y Subscript i’s, bold upper X is the known matrix of x Subscript i j’s, bold-italic beta is the unknown fixed-effects parameter vector, and bold-italic epsilon is the unobserved vector of independent and identically distributed Gaussian random errors.

In addition to denoting data, random variables, and explanatory variables in the preceding fashion, the subsequent development makes use of basic matrix operators such as transpose (prime), inverse (Superscript negative 1), generalized inverse (Superscript minus), determinant (StartAbsoluteValue dot EndAbsoluteValue), and matrix multiplication. See Searle (1982) for details about these and other matrix techniques.

Formulation of the Mixed Model

The previous general linear model is certainly a useful one (Searle 1971), and it is the one fitted by the GLM procedure. However, many times the distributional assumption about epsilon is too restrictive. The mixed model extends the general linear model by allowing a more flexible specification of the covariance matrix of epsilon. In other words, it allows for both correlation and heterogeneous variances, although you still assume normality.

The mixed model is written as

bold y equals bold upper X bold-italic beta plus bold upper Z bold-italic gamma plus bold-italic epsilon

where everything is the same as in the general linear model except for the addition of the known design matrix, bold upper Z, and the vector of unknown random-effects parameters, bold-italic gamma. The matrix bold upper Z can contain either continuous or dummy variables, just like bold upper X. The name mixed model comes from the fact that the model contains both fixed-effects parameters, bold-italic beta, and random-effects parameters, bold-italic gamma. See Henderson (1990) and Searle, Casella, and McCulloch (1992) for historical developments of the mixed model.

A key assumption in the foregoing analysis is that bold-italic gamma and bold-italic epsilon are normally distributed with

StartLayout 1st Row 1st Column normal upper E StartBinomialOrMatrix bold-italic gamma Choose bold-italic epsilon EndBinomialOrMatrix 2nd Column equals StartBinomialOrMatrix bold 0 Choose bold 0 EndBinomialOrMatrix 2nd Row 1st Column normal upper V normal a normal r StartBinomialOrMatrix bold-italic gamma Choose bold-italic epsilon EndBinomialOrMatrix 2nd Column equals Start 2 By 2 Matrix 1st Row 1st Column bold upper G 2nd Column bold 0 2nd Row 1st Column bold 0 2nd Column bold upper R EndMatrix EndLayout

The variance of bold y is, therefore, bold upper V equals bold upper Z bold upper G bold upper Z prime plus bold upper R. You can model bold upper V by setting up the random-effects design matrix bold upper Z and by specifying covariance structures for bold upper G and bold upper R.

Note that this is a general specification of the mixed model, in contrast to many texts and articles that discuss only simple random effects. Simple random effects are a special case of the general specification with bold upper Z containing dummy variables, bold upper G containing variance components in a diagonal structure, and bold upper R equals sigma squared bold upper I Subscript n, where bold upper I Subscript n denotes the n times n identity matrix. The general linear model is a further special case with bold upper Z equals bold 0 and bold upper R equals sigma squared bold upper I Subscript n.

The following two examples illustrate the most common formulations of the general linear mixed model.

Example: Growth Curve with Compound Symmetry

Suppose that you have three growth curve measurements for s individuals and that you want to fit an overall linear trend in time. Your bold upper X matrix is as follows:

bold upper X equals Start 7 By 2 Matrix 1st Row 1st Column 1 2nd Column 1 2nd Row 1st Column 1 2nd Column 2 3rd Row 1st Column 1 2nd Column 3 4th Row 1st Column vertical-ellipsis 2nd Column vertical-ellipsis 5th Row 1st Column 1 2nd Column 1 6th Row 1st Column 1 2nd Column 2 7th Row 1st Column 1 2nd Column 3 EndMatrix

The first column (coded entirely with 1s) fits an intercept, and the second column (coded with times of 1 comma 2 comma 3) fits a slope. Here, n equals 3 s and p equals 2.

Suppose further that you want to introduce a common correlation among the observations from a single individual, with correlation being the same for all individuals. One way of setting this up in the general mixed model is to eliminate the bold upper Z and bold upper G matrices and let the bold upper R matrix be block diagonal with blocks corresponding to the individuals and with each block having the compound-symmetry structure. This structure has two unknown parameters, one modeling a common covariance and the other modeling a residual variance. The form for bold upper R would then be as follows:

bold upper R equals Start 7 By 7 Matrix 1st Row 1st Column sigma 1 squared plus sigma squared 2nd Column sigma 1 squared 3rd Column sigma 1 squared 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 2nd Row 1st Column sigma 1 squared 2nd Column sigma 1 squared plus sigma squared 3rd Column sigma 1 squared 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 3rd Row 1st Column sigma 1 squared 2nd Column sigma 1 squared 3rd Column sigma 1 squared plus sigma squared 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 4th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column down-right-diagonal-ellipsis 5th Column Blank 6th Column Blank 7th Column Blank 5th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column sigma 1 squared plus sigma squared 6th Column sigma 1 squared 7th Column sigma 1 squared 6th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column sigma 1 squared 6th Column sigma 1 squared plus sigma squared 7th Column sigma 1 squared 7th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column sigma 1 squared 6th Column sigma 1 squared 7th Column sigma 1 squared plus sigma squared EndMatrix

where blanks denote zeros. There are 3 s rows and columns altogether, and the common correlation is sigma 1 squared slash left-parenthesis sigma 1 squared plus sigma squared right-parenthesis.

The PROC MIXED statements to fit this model are as follows:

proc mixed;
   class indiv;
   model y = time;
   repeated / type=cs subject=indiv;
run;

Here, indiv is a classification variable indexing individuals. The MODEL statement fits a straight line for time ; the intercept is fit by default just as in PROC GLM. The REPEATED statement models the bold upper R matrix: TYPE=CS specifies the compound symmetry structure, and SUBJECT=INDIV specifies the blocks of bold upper R.

An alternative way of specifying the common intra-individual correlation is to let

StartLayout 1st Row 1st Column bold upper Z 2nd Column equals Start 10 By 4 Matrix 1st Row 1st Column 1 2nd Column Blank 3rd Column Blank 4th Column Blank 2nd Row 1st Column 1 2nd Column Blank 3rd Column Blank 4th Column Blank 3rd Row 1st Column 1 2nd Column Blank 3rd Column Blank 4th Column Blank 4th Row 1st Column Blank 2nd Column 1 3rd Column Blank 4th Column Blank 5th Row 1st Column Blank 2nd Column 1 3rd Column Blank 4th Column Blank 6th Row 1st Column Blank 2nd Column 1 3rd Column Blank 4th Column Blank 7th Row 1st Column Blank 2nd Column Blank 3rd Column down-right-diagonal-ellipsis 4th Column Blank 8th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column 1 9th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column 1 10th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column 1 EndMatrix 2nd Row 1st Column bold upper G 2nd Column equals Start 4 By 4 Matrix 1st Row 1st Column sigma 1 squared 2nd Column Blank 3rd Column Blank 4th Column Blank 2nd Row 1st Column Blank 2nd Column sigma 1 squared 3rd Column Blank 4th Column Blank 3rd Row 1st Column Blank 2nd Column Blank 3rd Column down-right-diagonal-ellipsis 4th Column Blank 4th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column sigma 1 squared EndMatrix EndLayout

and bold upper R equals sigma squared bold upper I Subscript n. The bold upper Z matrix has 3 s rows and s columns, and bold upper G is s times s.

You can set up this model in PROC MIXED in two different but equivalent ways:

proc mixed;
   class indiv;
   model y = time;
   random indiv;
run;

proc mixed;
   class indiv;
   model y = time;
   random intercept / subject=indiv;
run;

Both of these specifications fit the same model as the previous one that used the REPEATED statement; however, the RANDOM specifications constrain the correlation to be positive, whereas the REPEATED specification leaves the correlation unconstrained.

Example: Split-Plot Design

The split-plot design involves two experimental treatment factors, A and B, and two different sizes of experimental units to which they are applied (see Winer 1971; Snedecor and Cochran 1980; Milliken and Johnson 1992; Steel, Torrie, and Dickey 1997). The levels of A are randomly assigned to the larger-sized experimental unit, called whole plots, whereas the levels of B are assigned to the smaller-sized experimental unit, the subplots. The subplots are assumed to be nested within the whole plots, so that a whole plot consists of a cluster of subplots and a level of A is applied to the entire cluster.

Such an arrangement is often necessary by nature of the experiment, the classical example being the application of fertilizer to large plots of land and different crop varieties planted in subdivisions of the large plots. For this example, fertilizer is the whole-plot factor A and variety is the subplot factor B.

The first example is a split-plot design for which the whole plots are arranged in a randomized block design. The appropriate PROC MIXED statements are as follows:

proc mixed;
   class a b block;
   model y = a|b;
   random block a*block;
run;

Here

bold upper R equals sigma squared bold upper I 24

and bold upper X, bold upper Z, and bold upper G have the following form:

bold upper X equals Start 13 By 12 Matrix 1st Row 1st Column 1 2nd Column 1 3rd Column Blank 4th Column Blank 5th Column 1 6th Column Blank 7th Column 1 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 2nd Row 1st Column 1 2nd Column 1 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column 1 7th Column Blank 8th Column 1 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 3rd Row 1st Column 1 2nd Column Blank 3rd Column 1 4th Column Blank 5th Column 1 6th Column Blank 7th Column Blank 8th Column Blank 9th Column 1 10th Column Blank 11th Column Blank 12th Column Blank 4th Row 1st Column 1 2nd Column Blank 3rd Column 1 4th Column Blank 5th Column Blank 6th Column 1 7th Column Blank 8th Column Blank 9th Column Blank 10th Column 1 11th Column Blank 12th Column Blank 5th Row 1st Column 1 2nd Column Blank 3rd Column Blank 4th Column 1 5th Column 1 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column 1 12th Column Blank 6th Row 1st Column 1 2nd Column Blank 3rd Column Blank 4th Column 1 5th Column Blank 6th Column 1 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column 1 7th Row 1st Column vertical-ellipsis 2nd Column Blank 3rd Column vertical-ellipsis 4th Column Blank 5th Column vertical-ellipsis 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column vertical-ellipsis 8th Row 1st Column 1 2nd Column 1 3rd Column Blank 4th Column Blank 5th Column 1 6th Column Blank 7th Column 1 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 9th Row 1st Column 1 2nd Column 1 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column 1 7th Column Blank 8th Column 1 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 10th Row 1st Column 1 2nd Column Blank 3rd Column 1 4th Column Blank 5th Column 1 6th Column Blank 7th Column Blank 8th Column Blank 9th Column 1 10th Column Blank 11th Column Blank 12th Column Blank 11th Row 1st Column 1 2nd Column Blank 3rd Column 1 4th Column Blank 5th Column Blank 6th Column 1 7th Column Blank 8th Column Blank 9th Column Blank 10th Column 1 11th Column Blank 12th Column Blank 12th Row 1st Column 1 2nd Column Blank 3rd Column Blank 4th Column 1 5th Column 1 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column 1 12th Column Blank 13th Row 1st Column 1 2nd Column Blank 3rd Column Blank 4th Column 1 5th Column Blank 6th Column 1 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column 1 EndMatrix
bold upper Z equals Start 24 By 16 Matrix 1st Row 1st Column 1 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column 1 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 2nd Row 1st Column 1 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column 1 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 3rd Row 1st Column 1 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column 1 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 4th Row 1st Column 1 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column 1 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 5th Row 1st Column 1 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column 1 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 6th Row 1st Column 1 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column 1 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 7th Row 1st Column Blank 2nd Column 1 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column 1 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 8th Row 1st Column Blank 2nd Column 1 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column 1 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 9th Row 1st Column Blank 2nd Column 1 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column 1 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 10th Row 1st Column Blank 2nd Column 1 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column 1 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 11th Row 1st Column Blank 2nd Column 1 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column 1 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 12th Row 1st Column Blank 2nd Column 1 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column 1 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 13th Row 1st Column Blank 2nd Column Blank 3rd Column 1 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column 1 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 14th Row 1st Column Blank 2nd Column Blank 3rd Column 1 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column 1 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 15th Row 1st Column Blank 2nd Column Blank 3rd Column 1 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column 1 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 16th Row 1st Column Blank 2nd Column Blank 3rd Column 1 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column 1 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 17th Row 1st Column Blank 2nd Column Blank 3rd Column 1 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column 1 14th Column Blank 15th Column Blank 16th Column Blank 18th Row 1st Column Blank 2nd Column Blank 3rd Column 1 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column 1 14th Column Blank 15th Column Blank 16th Column Blank 19th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column 1 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column 1 15th Column Blank 16th Column Blank 20th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column 1 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column 1 15th Column Blank 16th Column Blank 21st Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column 1 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column 1 16th Column Blank 22nd Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column 1 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column 1 16th Column Blank 23rd Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column 1 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column 1 24th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column 1 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column 1 EndMatrix
bold upper G equals Start 8 By 8 Matrix 1st Row 1st Column sigma Subscript upper B Superscript 2 Baseline 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 2nd Row 1st Column Blank 2nd Column sigma Subscript upper B Superscript 2 Baseline 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 3rd Row 1st Column Blank 2nd Column Blank 3rd Column sigma Subscript upper B Superscript 2 Baseline 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 4th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column sigma Subscript upper B Superscript 2 Baseline 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 5th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column sigma Subscript upper A upper B Superscript 2 Baseline 6th Column Blank 7th Column Blank 8th Column Blank 6th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column sigma Subscript upper A upper B Superscript 2 Baseline 7th Column Blank 8th Column Blank 7th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column down-right-diagonal-ellipsis 8th Column Blank 8th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column sigma Subscript upper A upper B Superscript 2 EndMatrix

where sigma Subscript upper B Superscript 2 is the variance component for Block and sigma Subscript upper A upper B Superscript 2 is the variance component for A*Block. Changing the RANDOM statement as follows fits the same model, but with bold upper Z and bold upper G sorted differently:

random int a / subject=block;
StartLayout 1st Row 1st Column bold upper Z 2nd Column equals Start 24 By 16 Matrix 1st Row 1st Column 1 2nd Column 1 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 2nd Row 1st Column 1 2nd Column 1 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 3rd Row 1st Column 1 2nd Column Blank 3rd Column 1 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 4th Row 1st Column 1 2nd Column Blank 3rd Column 1 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 5th Row 1st Column 1 2nd Column Blank 3rd Column Blank 4th Column 1 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 6th Row 1st Column 1 2nd Column Blank 3rd Column Blank 4th Column 1 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 7th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column 1 6th Column 1 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 8th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column 1 6th Column 1 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 9th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column 1 6th Column Blank 7th Column 1 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 10th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column 1 6th Column Blank 7th Column 1 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 11th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column 1 6th Column Blank 7th Column Blank 8th Column 1 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 12th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column 1 6th Column Blank 7th Column Blank 8th Column 1 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 13th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column 1 10th Column 1 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 14th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column 1 10th Column 1 11th Column Blank 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 15th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column 1 10th Column Blank 11th Column 1 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 16th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column 1 10th Column Blank 11th Column 1 12th Column Blank 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 17th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column 1 10th Column Blank 11th Column Blank 12th Column 1 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 18th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column 1 10th Column Blank 11th Column Blank 12th Column 1 13th Column Blank 14th Column Blank 15th Column Blank 16th Column Blank 19th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column 1 14th Column 1 15th Column Blank 16th Column Blank 20th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column 1 14th Column 1 15th Column Blank 16th Column Blank 21st Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column 1 14th Column Blank 15th Column 1 16th Column Blank 22nd Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column 1 14th Column Blank 15th Column 1 16th Column Blank 23rd Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column 1 14th Column Blank 15th Column Blank 16th Column 1 24th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 11th Column Blank 12th Column Blank 13th Column 1 14th Column Blank 15th Column Blank 16th Column 1 EndMatrix 2nd Row 1st Column bold upper G 2nd Column equals Start 9 By 9 Matrix 1st Row 1st Column sigma Subscript upper B Superscript 2 Baseline 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 2nd Row 1st Column Blank 2nd Column sigma Subscript upper A upper B Superscript 2 Baseline 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 3rd Row 1st Column Blank 2nd Column Blank 3rd Column sigma Subscript upper A upper B Superscript 2 Baseline 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 4th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column sigma Subscript upper A upper B Superscript 2 Baseline 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 5th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column down-right-diagonal-ellipsis 6th Column Blank 7th Column Blank 8th Column Blank 9th Column Blank 10th Column Blank 6th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column sigma Subscript upper B Superscript 2 Baseline 7th Column Blank 8th Column Blank 9th Column Blank 7th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column sigma Subscript upper A upper B Superscript 2 Baseline 8th Column Blank 9th Column Blank 8th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column sigma Subscript upper A upper B Superscript 2 Baseline 9th Column Blank 9th Row 1st Column Blank 2nd Column Blank 3rd Column Blank 4th Column Blank 5th Column Blank 6th Column Blank 7th Column Blank 8th Column Blank 9th Column sigma Subscript upper A upper B Superscript 2 EndMatrix EndLayout

Estimating Covariance Parameters in the Mixed Model

Estimation is more difficult in the mixed model than in the general linear model. Not only do you have bold-italic beta as in the general linear model, but you have unknown parameters in bold-italic gamma, bold upper G, and bold upper R as well. Least squares is no longer the best method. Generalized least squares (GLS) is more appropriate, minimizing

left-parenthesis bold y minus bold upper X bold-italic beta right-parenthesis prime bold upper V Superscript negative 1 Baseline left-parenthesis bold y minus bold upper X bold-italic beta right-parenthesis

However, it requires knowledge of bold upper V and, therefore, knowledge of bold upper G and bold upper R. Lacking such information, one approach is to use estimated GLS, in which you insert some reasonable estimate for bold upper V into the minimization problem. The goal thus becomes finding a reasonable estimate of bold upper G and bold upper R.

In many situations, the best approach is to use likelihood-based methods, exploiting the assumption that bold-italic gamma and bold-italic epsilon are normally distributed (Hartley and Rao 1967; Patterson and Thompson 1971; Harville 1977; Laird and Ware 1982; Jennrich and Schluchter 1986). PROC MIXED implements two likelihood-based methods: maximum likelihood (ML) and restricted/residual maximum likelihood (REML). A favorable theoretical property of ML and REML is that they accommodate data that are missing at random (Rubin 1976; Little 1995).

PROC MIXED constructs an objective function associated with ML or REML and maximizes it over all unknown parameters. Using calculus, it is possible to reduce this maximization problem to one over only the parameters in bold upper G and bold upper R. The corresponding log-likelihood functions are as follows:

StartLayout 1st Row 1st Column normal upper M normal upper L colon l left-parenthesis bold upper G comma bold upper R right-parenthesis 2nd Column equals minus one-half log StartAbsoluteValue bold upper V EndAbsoluteValue minus one-half bold r prime bold upper V Superscript negative 1 Baseline bold r minus StartFraction n Over 2 EndFraction log left-parenthesis 2 pi right-parenthesis 2nd Row 1st Column normal upper R normal upper E normal upper M normal upper L colon l Subscript upper R Baseline left-parenthesis bold upper G comma bold upper R right-parenthesis 2nd Column equals minus one-half log StartAbsoluteValue bold upper V EndAbsoluteValue minus one-half log StartAbsoluteValue bold upper X prime bold upper V Superscript negative 1 Baseline bold upper X EndAbsoluteValue minus one-half bold r prime bold upper V Superscript negative 1 Baseline bold r minus StartFraction n minus p Over 2 EndFraction log left-parenthesis 2 pi right-parenthesis right-brace EndLayout

where bold r equals bold y minus bold upper X left-parenthesis bold upper X prime bold upper V Superscript negative 1 Baseline bold upper X right-parenthesis Superscript minus Baseline bold upper X prime bold upper V Superscript negative 1 Baseline bold y and p is the rank of bold upper X. PROC MIXED actually minimizes –2 times these functions by using a ridge-stabilized Newton-Raphson algorithm. Lindstrom and Bates (1988) provide reasons for preferring Newton-Raphson to the Expectation-Maximum (EM) algorithm (Dempster, Laird, and Rubin 1977; Laird, Lange, and Stram 1987), as well as analytical details for implementing a QR-decomposition approach to the problem. Wolfinger, Tobias, and Sall (1994) present the sweep-based algorithms that are implemented in PROC MIXED.

One advantage of using the Newton-Raphson algorithm is that the second derivative matrix of the objective function evaluated at the optima is available upon completion. Denoting this matrix bold upper H, the asymptotic theory of maximum likelihood (see Serfling 1980) shows that 2 bold upper H Superscript negative 1 is an asymptotic variance-covariance matrix of the estimated parameters of bold upper G and bold upper R. Thus, tests and confidence intervals based on asymptotic normality can be obtained. However, these can be unreliable in small samples, especially for parameters such as variance components that have sampling distributions that tend to be skewed to the right.

If a residual variance sigma squared is a part of your mixed model, it can usually be profiled out of the likelihood. This means solving analytically for the optimal sigma squared and plugging this expression back into the likelihood formula (see Wolfinger, Tobias, and Sall 1994). This reduces the number of optimization parameters by one and can improve convergence properties. PROC MIXED profiles the residual variance out of the log likelihood whenever it appears reasonable to do so. This includes the case when bold upper R equals sigma squared bold upper I and when it has blocks with a compound symmetry, time series, or spatial structure. PROC MIXED does not profile the log likelihood when bold upper R has unstructured blocks, when you use the HOLD= or NOITER option in the PARMS statement, or when you use the NOPROFILE option in the PROC MIXED statement.

Instead of ML or REML, you can use the noniterative MIVQUE0 method to estimate bold upper G and bold upper R (Rao 1972; LaMotte 1973; Wolfinger, Tobias, and Sall 1994). In fact, by default PROC MIXED uses MIVQUE0 estimates as starting values for the ML and REML procedures. For variance component models, another estimation method involves equating Type 1, 2, or 3 expected mean squares to their observed values and solving the resulting system. However, Swallow and Monahan (1984) present simulation evidence favoring REML and ML over MIVQUE0 and other method-of-moment estimators.

Estimating Fixed and Random Effects in the Mixed Model

ML, REML, MIVQUE0, or Type1–Type3 provide estimates of bold upper G and bold upper R, which are denoted ModifyingAbove bold upper G With caret and ModifyingAbove bold upper R With caret, respectively. To obtain estimates of bold-italic beta and bold-italic gamma, the standard method is to solve the mixed model equations (Henderson 1984):

Start 2 By 2 Matrix 1st Row 1st Column bold upper X prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold upper X 2nd Column bold upper X prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold upper Z 2nd Row 1st Column bold upper Z prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold upper X 2nd Column bold upper Z prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold upper Z plus ModifyingAbove bold upper G With caret Superscript negative 1 Baseline EndMatrix StartBinomialOrMatrix ModifyingAbove bold-italic beta With caret Choose ModifyingAbove bold-italic gamma With caret EndBinomialOrMatrix equals StartBinomialOrMatrix bold upper X prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold y Choose bold upper Z prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold y EndBinomialOrMatrix

The solutions can also be written as

StartLayout 1st Row 1st Column ModifyingAbove bold-italic beta With caret 2nd Column equals left-parenthesis bold upper X prime ModifyingAbove bold upper V With caret Superscript negative 1 Baseline bold upper X right-parenthesis Superscript minus Baseline bold upper X prime ModifyingAbove bold upper V With caret Superscript negative 1 Baseline bold y 2nd Row 1st Column ModifyingAbove bold-italic gamma With caret 2nd Column equals ModifyingAbove bold upper G With caret bold upper Z prime ModifyingAbove bold upper V With caret Superscript negative 1 Baseline left-parenthesis bold y minus bold upper X ModifyingAbove bold-italic beta With caret right-parenthesis EndLayout

and have connections with empirical Bayes estimators (Laird and Ware 1982; Carlin and Louis 1996).

Note that the mixed model equations are extended normal equations and that the preceding expression assumes that ModifyingAbove bold upper G With caret is nonsingular. For the extreme case where the eigenvalues of ModifyingAbove bold upper G With caret are very large, ModifyingAbove bold upper G With caret Superscript negative 1 contributes very little to the equations and ModifyingAbove bold-italic gamma With caret is close to what it would be if bold-italic gamma actually contained fixed-effects parameters. On the other hand, when the eigenvalues of ModifyingAbove bold upper G With caret are very small, ModifyingAbove bold upper G With caret Superscript negative 1 dominates the equations and ModifyingAbove bold-italic gamma With caret is close to 0. For intermediate cases, ModifyingAbove bold upper G With caret Superscript negative 1 can be viewed as shrinking the fixed-effects estimates of bold-italic gamma toward 0 (Robinson 1991).

If ModifyingAbove bold upper G With caret is singular, then the mixed model equations are modified (Henderson 1984) as follows:

Start 2 By 2 Matrix 1st Row 1st Column bold upper X prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold upper X 2nd Column bold upper X prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold upper Z ModifyingAbove bold upper G With caret 2nd Row 1st Column ModifyingAbove bold upper G With caret prime bold upper Z prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold upper X 2nd Column ModifyingAbove bold upper G With caret prime bold upper Z prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold upper Z ModifyingAbove bold upper G With caret plus bold upper G EndMatrix StartBinomialOrMatrix ModifyingAbove bold-italic beta With caret Choose ModifyingAbove bold-italic tau With caret EndBinomialOrMatrix equals StartBinomialOrMatrix bold upper X prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold y Choose ModifyingAbove bold upper G With caret prime bold upper Z prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold y EndBinomialOrMatrix

Denote the generalized inverses of the nonsingular ModifyingAbove bold upper G With caret and singular ModifyingAbove bold upper G With caret forms of the mixed model equations by bold upper C and bold upper M, respectively. In the nonsingular case, the solution ModifyingAbove bold-italic gamma With caret estimates the random effects directly, but in the singular case the estimates of random effects are achieved through a back-transformation ModifyingAbove bold-italic gamma With caret equals ModifyingAbove bold upper G With caret ModifyingAbove bold-italic tau With caret where ModifyingAbove bold-italic tau With caret is the solution to the modified mixed model equations. Similarly, while in the nonsingular case bold upper C itself is the estimated covariance matrix for left-parenthesis ModifyingAbove bold-italic beta With caret comma ModifyingAbove bold-italic gamma With caret right-parenthesis, in the singular case the covariance estimate for left-parenthesis ModifyingAbove bold-italic beta With caret comma ModifyingAbove bold upper G With caret ModifyingAbove bold-italic tau With caret right-parenthesis is given by bold upper P bold upper M bold upper P where

bold upper P equals Start 2 By 2 Matrix 1st Row 1st Column bold upper I 2nd Column Blank 2nd Row 1st Column Blank 2nd Column ModifyingAbove bold upper G With caret EndMatrix

An example of when the singular form of the equations is necessary is when a variance component estimate falls on the boundary constraint of 0.

Model Selection

The previous section on estimation assumes the specification of a mixed model in terms of bold upper X, bold upper Z, bold upper G, and bold upper R. Even though bold upper X and bold upper Z have known elements, their specific form and construction are flexible, and several possibilities can present themselves for a particular data set. Likewise, several different covariance structures for bold upper G and bold upper R might be reasonable.

Space does not permit a thorough discussion of model selection, but a few brief comments and references are in order. First, subject matter considerations and objectives are of great importance when selecting a model; see Diggle (1988) and Lindsey (1993).

Second, when the data themselves are looked to for guidance, many of the graphical methods and diagnostics appropriate for the general linear model extend to the mixed model setting as well (Christensen, Pearson, and Johnson 1992).

Finally, a likelihood-based approach to the mixed model provides several statistical measures for model adequacy as well. The most common of these are the likelihood ratio test and Akaike’s and Schwarz’s criteria (Bozdogan 1987; Wolfinger 1993; Keselman et al. 1998, 1999).

Statistical Properties

If bold upper G and bold upper R are known, ModifyingAbove bold-italic beta With caret is the best linear unbiased estimator (BLUE) of bold-italic beta, and ModifyingAbove bold-italic gamma With caret is the best linear unbiased predictor (BLUP) of bold-italic gamma (Searle 1971; Harville 1988, 1990; Robinson 1991; McLean, Sanders, and Stroup 1991). Here, "best" means minimum mean squared error. The covariance matrix of left-parenthesis ModifyingAbove bold-italic beta With caret minus bold-italic beta comma ModifyingAbove bold-italic gamma With caret minus bold-italic gamma right-parenthesis is

bold upper C equals Start 2 By 2 Matrix 1st Row 1st Column bold upper X prime bold upper R Superscript negative 1 Baseline bold upper X 2nd Column bold upper X prime bold upper R Superscript negative 1 Baseline bold upper Z 2nd Row 1st Column bold upper Z prime bold upper R Superscript negative 1 Baseline bold upper X 2nd Column bold upper Z prime bold upper R Superscript negative 1 Baseline bold upper Z plus bold upper G Superscript negative 1 EndMatrix Superscript minus

where Superscript minus denotes a generalized inverse (see Searle 1971).

However, bold upper G and bold upper R are usually unknown and are estimated by using one of the aforementioned methods. These estimates, ModifyingAbove bold upper G With caret and ModifyingAbove bold upper R With caret, are therefore simply substituted into the preceding expression to obtain

ModifyingAbove bold upper C With caret equals Start 2 By 2 Matrix 1st Row 1st Column bold upper X prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold upper X 2nd Column bold upper X prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold upper Z 2nd Row 1st Column bold upper Z prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold upper X 2nd Column bold upper Z prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold upper Z plus ModifyingAbove bold upper G With caret Superscript negative 1 EndMatrix Superscript minus

as the approximate variance-covariance matrix of left-parenthesis ModifyingAbove bold-italic beta With caret minus bold-italic beta comma ModifyingAbove bold-italic gamma With caret minus bold-italic gamma). In this case, the BLUE and BLUP acronyms no longer apply, but the word empirical is often added to indicate such an approximation. The appropriate acronyms thus become EBLUE and EBLUP.

McLean and Sanders (1988) show that ModifyingAbove bold upper C With caret can also be written as

ModifyingAbove bold upper C With caret equals Start 2 By 2 Matrix 1st Row 1st Column ModifyingAbove bold upper C With caret Subscript 11 Baseline 2nd Column ModifyingAbove bold upper C With caret Subscript 21 Superscript prime Baseline 2nd Row 1st Column ModifyingAbove bold upper C With caret Subscript 21 Baseline 2nd Column ModifyingAbove bold upper C With caret Subscript 22 EndMatrix

where

StartLayout 1st Row 1st Column ModifyingAbove bold upper C With caret Subscript 11 2nd Column equals left-parenthesis bold upper X prime ModifyingAbove bold upper V With caret Superscript negative 1 Baseline bold upper X right-parenthesis Superscript minus Baseline 2nd Row 1st Column ModifyingAbove bold upper C With caret Subscript 21 2nd Column equals minus ModifyingAbove bold upper G With caret bold upper Z prime ModifyingAbove bold upper V With caret Superscript negative 1 Baseline bold upper X ModifyingAbove bold upper C With caret Subscript 11 Baseline 3rd Row 1st Column ModifyingAbove bold upper C With caret Subscript 22 2nd Column equals left-parenthesis bold upper Z prime ModifyingAbove bold upper R With caret Superscript negative 1 Baseline bold upper Z plus ModifyingAbove bold upper G With caret Superscript negative 1 Baseline right-parenthesis Superscript negative 1 Baseline minus ModifyingAbove bold upper C With caret Subscript 21 Baseline bold upper X prime ModifyingAbove bold upper V With caret Superscript negative 1 Baseline bold upper Z ModifyingAbove bold upper G With caret EndLayout

Note that ModifyingAbove bold upper C With caret Subscript 11 is the familiar estimated generalized least squares formula for the variance-covariance matrix of ModifyingAbove bold-italic beta With caret.

As a cautionary note, ModifyingAbove bold upper C With caret tends to underestimate the true sampling variability of (ModifyingAbove bold-italic beta With caret ModifyingAbove bold-italic gamma With caret) because no account is made for the uncertainty in estimating bold upper G and bold upper R. Although inflation factors have been proposed (Kackar and Harville 1984; Kass and Steffey 1989; Prasad and Rao 1990), they tend to be small for data sets that are fairly well balanced. PROC MIXED does not compute any inflation factors by default, but rather accounts for the downward bias by using the approximate t and F statistics described subsequently. The DDFM=KENWARDROGER or DDFM=KENWARDROGER2 option in the MODEL statement prompts PROC MIXED to compute a specific inflation factor along with Satterthwaite-based degrees of freedom.

Inference and Test Statistics

For inferences concerning the covariance parameters in your model, you can use likelihood-based statistics. One common likelihood-based statistic is the Wald Z, which is computed as the parameter estimate divided by its asymptotic standard error. The asymptotic standard errors are computed from the inverse of the second derivative matrix of the likelihood with respect to each of the covariance parameters. The Wald Z is valid for large samples, but it can be unreliable for small data sets and for parameters such as variance components, which are known to have a skewed or bounded sampling distribution.

A better alternative is the likelihood ratio chi squared statistic. This statistic compares two covariance models, one a special case of the other. To compute it, you must run PROC MIXED twice, once for each of the two models, and then subtract the corresponding values of –2 times the log likelihoods. You can use either ML or REML to construct this statistic, which tests whether the full model is necessary beyond the reduced model.

As long as the reduced model does not occur on the boundary of the covariance parameter space, the chi squared statistic computed in this fashion has a large-sample chi squared distribution that is chi squared with degrees of freedom equal to the difference in the number of covariance parameters between the two models. If the reduced model does occur on the boundary of the covariance parameter space, the asymptotic distribution becomes a mixture of chi squared distributions (Self and Liang 1987). A common example of this is when you are testing that a variance component equals its lower boundary constraint of 0.

A final possibility for obtaining inferences concerning the covariance parameters is to simulate or resample data from your model and construct empirical sampling distributions of the parameters. The SAS macro language and the ODS system are useful tools in this regard.

F and t Tests for Fixed- and Random-Effects Parameters

For inferences concerning the fixed- and random-effects parameters in the mixed model, consider estimable linear combinations of the following form:

bold upper L StartBinomialOrMatrix bold-italic beta Choose bold-italic gamma EndBinomialOrMatrix

The estimability requirement (Searle 1971) applies only to the bold-italic beta portion of bold upper L, because any linear combination of bold-italic gamma is estimable. Such a formulation in terms of a general bold upper L matrix encompasses a wide variety of common inferential procedures such as those employed with Type 1–Type 3 tests and LS-means. The CONTRAST and ESTIMATE statements in PROC MIXED enable you to specify your own bold upper L matrices. Typically, inference on fixed effects is the focus, and, in this case, the bold-italic gamma portion of bold upper L is assumed to contain all 0s.

Statistical inferences are obtained by testing the hypothesis

upper H colon bold upper L StartBinomialOrMatrix bold-italic beta Choose bold-italic gamma EndBinomialOrMatrix equals 0

or by constructing point and interval estimates.

When bold upper L consists of a single row, a general t statistic can be constructed as follows (see McLean and Sanders 1988; Stroup 1989a):

t equals StartFraction bold upper L StartBinomialOrMatrix ModifyingAbove bold-italic beta With caret Choose ModifyingAbove bold-italic gamma With caret EndBinomialOrMatrix Over StartRoot bold upper L ModifyingAbove bold upper C With caret bold upper L prime EndRoot EndFraction

Under the assumed normality of bold-italic gamma and bold-italic epsilon, t has an exact t distribution only for data exhibiting certain types of balance and for some special unbalanced cases. In general, t is only approximately t-distributed, and its degrees of freedom must be estimated. See the DDFM= option for a description of the various degrees-of-freedom methods available in PROC MIXED.

With ModifyingAbove nu With caret being the approximate degrees of freedom, the associated confidence interval is

bold upper L StartBinomialOrMatrix ModifyingAbove bold-italic beta With caret Choose ModifyingAbove bold-italic gamma With caret EndBinomialOrMatrix plus-or-minus t Subscript ModifyingAbove nu With caret comma alpha slash 2 Baseline StartRoot bold upper L ModifyingAbove bold upper C With caret bold upper L prime EndRoot

where t Subscript ModifyingAbove nu With caret comma alpha slash 2 is the 100 left-parenthesis 1 minus alpha slash 2 right-parenthesisth percentile of the t Subscript ModifyingAbove nu With caret distribution.

When the rank of bold upper L is greater than 1, PROC MIXED constructs the following general F statistic:

upper F equals StartFraction StartBinomialOrMatrix ModifyingAbove bold-italic beta With caret Choose ModifyingAbove bold-italic gamma With caret EndBinomialOrMatrix prime bold upper L prime left-parenthesis bold upper L ModifyingAbove bold upper C With caret bold upper L prime right-parenthesis Superscript negative 1 Baseline bold upper L StartBinomialOrMatrix ModifyingAbove bold-italic beta With caret Choose ModifyingAbove bold-italic gamma With caret EndBinomialOrMatrix Over r EndFraction

where r equals normal r normal a normal n normal k left-parenthesis bold upper L ModifyingAbove bold upper C With caret bold upper L prime right-parenthesis. Analogous to t, F in general has an approximate F distribution with r numerator degrees of freedom and ModifyingAbove nu With caret denominator degrees of freedom.

The t and F statistics enable you to make inferences about your fixed effects, which account for the variance-covariance model you select. An alternative is the chi squared statistic associated with the likelihood ratio test. This statistic compares two fixed-effects models, one a special case of the other. It is computed just as when comparing different covariance models, although you should use ML and not REML here because the penalty term associated with restricted likelihoods depends upon the fixed-effects specification.

F Tests With the ANOVAF Option

The ANOVAF option computes F tests by the following method in models with REPEATED statement and without RANDOM statement. Let bold upper L denote the matrix of estimable functions for the hypothesis upper H colon bold upper L bold-italic beta equals bold 0, where bold-italic beta are the fixed-effects parameters. Let bold upper M equals bold upper L prime left-parenthesis bold upper L bold upper L prime right-parenthesis Superscript minus Baseline bold upper L, and suppose that ModifyingAbove bold upper C With caret denotes the estimated variance-covariance matrix of ModifyingAbove bold-italic beta With caret (see the section Statistical Properties for the construction of ModifyingAbove bold upper C With caret).

The ANOVAF F statistics are computed as

upper F Subscript upper A Baseline equals ModifyingAbove bold-italic beta With caret prime bold upper L prime left-parenthesis bold upper L bold upper L prime right-parenthesis Superscript negative 1 Baseline bold upper L ModifyingAbove bold-italic beta With caret slash t 1 equals ModifyingAbove bold-italic beta With caret prime bold upper M ModifyingAbove bold-italic beta With caret slash t 1

Notice that this is a modification of the usual F statistic where left-parenthesis bold upper L ModifyingAbove bold upper C With caret bold upper L prime right-parenthesis Superscript negative 1 is replaced with left-parenthesis bold upper L bold upper L prime right-parenthesis Superscript negative 1 and normal r normal a normal n normal k left-parenthesis bold upper L right-parenthesis is replaced with t 1 equals normal t normal r normal a normal c normal e left-parenthesis bold upper M ModifyingAbove bold upper C With caret right-parenthesis; see, for example, Brunner, Domhof, and Langer (2002, Sec. 5.4). The p-values for this statistic are computed from either an upper F Subscript nu 1 comma nu 2 or an upper F Subscript nu 1 comma normal infinity distribution. The respective degrees of freedom are determined by the MIXED procedure as follows:

StartLayout 1st Row 1st Column nu 1 2nd Column equals StartFraction t 1 squared Over normal t normal r normal a normal c normal e left-parenthesis bold upper M ModifyingAbove bold upper C With caret bold upper M ModifyingAbove bold upper C With caret right-parenthesis EndFraction 2nd Row 1st Column nu 2 Superscript asterisk 2nd Column equals StartFraction 2 t 1 squared Over bold g prime bold upper A bold g EndFraction 3rd Row 1st Column nu 2 2nd Column equals StartLayout Enlarged left-brace 1st Row 1st Column max left-brace min left-brace nu 2 Superscript asterisk Baseline comma d f Subscript e Baseline right-brace comma 1 right-brace 2nd Column bold g prime bold upper A bold g greater-than 1 normal upper E Baseline 3 times normal upper M normal upper A normal upper C normal upper E normal upper P normal upper S 2nd Row 1st Column 1 2nd Column normal o normal t normal h normal e normal r normal w normal i normal s normal e EndLayout EndLayout

The term bold g prime bold upper A bold g in the term nu 2 Superscript asterisk for the denominator degrees of freedom is based on approximating normal upper V normal a normal r left-bracket normal t normal r normal a normal c normal e left-parenthesis bold upper M ModifyingAbove bold upper C With caret right-parenthesis right-bracket based on a first-order Taylor series about the true covariance parameters. This generalizes results in the appendix of Brunner, Dette, and Munk (1997) to a broader class of models. The vector bold g equals left-bracket g 1 comma ellipsis comma g Subscript q Baseline right-bracket contains the partial derivatives

normal t normal r normal a normal c normal e left-parenthesis bold upper L prime left-parenthesis bold upper L bold upper L prime right-parenthesis Superscript negative 1 Baseline bold upper L StartFraction partial-differential ModifyingAbove bold upper C With caret Over partial-differential theta Subscript i Baseline EndFraction right-parenthesis

and bold upper A is the asymptotic variance-covariance matrix of the covariance parameter estimates (ASYCOV option).

PROC MIXED reports nu 1 and nu 2 as "NumDF" and "DenDF" under the "ANOVA F" heading in the output. The corresponding p-values are denoted as "Pr > F(DDF)" for upper F Subscript nu 1 comma nu 2 and "Pr > F(infty)" for upper F Subscript nu 1 comma normal infinity, respectively.

P-values computed with the ANOVAF option can be identical to the nonparametric tests in Akritas, Arnold, and Brunner (1997) and in Brunner, Domhof, and Langer (2002), provided that the response data consist of properly created (and sorted) ranks and that the covariance parameters are estimated by MIVQUE0 in models with REPEATED statement and properly chosen SUBJECT= and/or GROUP= effects.

If you model an unstructured covariance matrix in a longitudinal model with one or more repeated factors, the ANOVAF results are identical to a multivariate MANOVA where degrees of freedom are corrected with the Greenhouse-Geisser adjustment (Greenhouse and Geisser 1959). For example, suppose that factor A has 2 levels and factor B has 4 levels. The following two sets of statements produce the same p-values:

proc mixed data=Mydata anovaf method=mivque0;
   class id A B;
   model score = A | B / chisq;
   repeated / type=un subject=id;
   ods select Tests3;
run;
proc transpose data=MyData out=tdata;
   by id;
   var score;
run;
proc glm data=tdata;
   model col: = / nouni;
   repeated A 2, B 4;
   ods output ModelANOVA=maov epsilons=eps;
run;
proc transpose data=eps(where=(substr(statistic,1,3)='Gre')) out=teps;
   var cvalue1;
run;

data aov; set maov;
   if (_n_ = 1) then merge teps;
   if (Source='A') then do;
      pFddf = ProbF;
      pFinf = 1 - probchi(df*Fvalue,df);
      output;
   end; else if (Source='B') then do;
      pFddf = ProbFGG;
      pFinf = 1 - probchi(df*col1*Fvalue,df*col1);
      output;
   end; else if (Source='A*B') then do;
      pfddF = ProbFGG;
      pFinf = 1 - probchi(df*col2*Fvalue,df*col2);
      output;
   end;
run;
proc print data=aov label noobs;
   label Source = 'Effect'
         df     = 'NumDF'
         Fvalue = 'Value'
         pFddf  = 'Pr > F(DDF)'
         pFinf  = 'Pr > F(infty)';
   var Source df Fvalue pFddf pFinf;
   format pF: pvalue6.;
run;

The PROC GLM code produces p-values that correspond to the ANOVAF p-values shown as Pr > F(DDF) in the MIXED output. The subsequent DATA step computes the p-values that correspond to Pr > F(infty) in the PROC MIXED output.

Last updated: December 09, 2022