The GLIMMIX Procedure

Maximum Likelihood Estimation Based on Adaptive Quadrature

Quadrature methods, like the Laplace approximation, approximate integrals. If you choose METHOD=QUAD for a generalized linear mixed model, the GLIMMIX procedure approximates the marginal log likelihood with an adaptive Gauss-Hermite quadrature rule. Gaussian quadrature is particularly well suited to numerically evaluate integrals against probability measures (Lange 1999, Ch. 16). And Gauss-Hermite quadrature is appropriate when the density has kernel exp left-brace minus x squared right-brace and integration extends over the real line, as is the case for the normal distribution. Suppose that p left-parenthesis x right-parenthesis is a probability density function and the function f left-parenthesis x right-parenthesis is to be integrated against it. Then the quadrature rule is

integral Subscript negative normal infinity Superscript normal infinity Baseline f left-parenthesis x right-parenthesis p left-parenthesis x right-parenthesis d x almost-equals sigma-summation Underscript i equals 1 Overscript upper N Endscripts w Subscript i Baseline f left-parenthesis x Subscript i Baseline right-parenthesis

where N denotes the number of quadrature points, the w Subscript i are the quadrature weights, and the x Subscript i are the abscissas. The Gaussian quadrature chooses abscissas in areas of high density, and if p left-parenthesis x right-parenthesis is continuous, the quadrature rule is exact if f left-parenthesis x right-parenthesis is a polynomial of up to degree 2N – 1. In the generalized linear mixed model the roles of f left-parenthesis x right-parenthesis and p left-parenthesis x right-parenthesis are played by the conditional distribution of the data given the random effects, and the random-effects distribution, respectively. Quadrature abscissas and weights are those of the standard Gauss-Hermite quadrature (Golub and Welsch 1969; see also Table 25.10 of Abramowitz and Stegun 1972; Evans 1993).

A numerical integration rule is called adaptive when it uses a variable step size to control the error of the approximation. For example, an adaptive trapezoidal rule uses serial splitting of intervals at midpoints until a desired tolerance is achieved. The quadrature rule in the GLIMMIX procedure is adaptive in the following sense: if you do not specify the number of quadrature points (nodes) with the QPOINTS= suboption of the METHOD=QUAD option, then the number of quadrature points is determined by evaluating the log likelihood at the starting values at a successively larger number of nodes until a tolerance is met (for more details see the text under the heading "Starting Values" in the next section). Furthermore, the GLIMMIX procedure centers and scales the quadrature points by using the empirical Bayes estimates (EBEs) of the random effects and the Hessian (second derivative) matrix from the EBE suboptimization. This centering and scaling improves the likelihood approximation by placing the abscissas according to the density function of the random effects. It is not, however, adaptiveness in the previously stated sense.

Objective Function

Let bold-italic beta denote the vector of fixed-effects parameters and bold-italic theta the vector of covariance parameters. For quadrature estimation in the GLIMMIX procedure, bold-italic theta includes the G-side parameters and a possible scale parameter phi, provided that the conditional distribution of the data contains such a scale parameter. bold-italic theta Superscript asterisk is the vector of the G-side parameters. The marginal distribution of the data for subject i in a mixed model can be expressed as

p left-parenthesis bold y Subscript i Baseline right-parenthesis equals integral midline-horizontal-ellipsis integral p left-parenthesis bold y Subscript i Baseline vertical-bar bold-italic gamma Subscript i Baseline comma bold-italic beta comma phi right-parenthesis p left-parenthesis bold-italic gamma Subscript i Baseline vertical-bar bold-italic theta Superscript asterisk Baseline right-parenthesis d bold-italic gamma Subscript i Baseline

Suppose upper N Subscript q denotes the number of quadrature points in each dimension (for each random effect) and r denotes the number of random effects. For each subject, obtain the empirical Bayes estimates of bold-italic gamma Subscript i as the vector ModifyingAbove bold-italic gamma With caret Subscript i that minimizes

minus log left-brace p left-parenthesis bold y Subscript i Baseline vertical-bar bold-italic gamma Subscript i Baseline comma bold-italic beta comma phi right-parenthesis p left-parenthesis bold-italic gamma Subscript i Baseline vertical-bar bold-italic theta Superscript asterisk Baseline right-parenthesis right-brace equals f left-parenthesis bold y Subscript i Baseline comma bold-italic beta comma bold-italic theta semicolon bold-italic gamma Subscript i Baseline right-parenthesis

If bold z equals left-bracket z 1 comma ellipsis comma z Subscript upper N Sub Subscript q Subscript Baseline right-bracket are the standard abscissas for Gauss-Hermite quadrature, and bold z Subscript j Superscript asterisk Baseline equals left-bracket z Subscript j 1 Baseline comma ellipsis comma z Subscript j Sub Subscript r Subscript Baseline right-bracket is a point on the r-dimensional quadrature grid, then the centered and scaled abscissas are

bold a Subscript j Superscript asterisk Baseline equals ModifyingAbove bold-italic gamma With caret Subscript i Baseline plus 2 Superscript 1 slash 2 Baseline f double-prime left-parenthesis bold y Subscript i Baseline comma bold-italic beta comma bold-italic theta semicolon ModifyingAbove bold-italic gamma With caret Subscript i Baseline right-parenthesis Superscript negative 1 slash 2 Baseline bold z Subscript j Superscript asterisk

As for the Laplace approximation, f double-prime is the second derivative matrix with respect to the random effects,

f double-prime left-parenthesis bold y Subscript i Baseline comma bold-italic beta comma bold-italic theta semicolon ModifyingAbove bold-italic gamma With caret Subscript i Baseline right-parenthesis equals StartFraction partial-differential squared f left-parenthesis bold y Subscript i Baseline comma bold-italic beta comma bold-italic theta semicolon bold-italic gamma Subscript i Baseline right-parenthesis Over partial-differential bold-italic gamma Subscript i Baseline partial-differential bold-italic gamma prime Subscript i EndFraction vertical-bar Subscript ModifyingAbove bold-italic gamma With caret Sub Subscript i Subscript Baseline

These centered and scaled abscissas, along with the Gauss-Hermite quadrature weights bold w equals left-bracket w 1 comma ellipsis comma w Subscript upper N Sub Subscript q Subscript Baseline right-bracket, are used to construct the r-dimensional integral by a sequence of one-dimensional rules

StartLayout 1st Row 1st Column p left-parenthesis bold y Subscript i Baseline right-parenthesis 2nd Column equals integral midline-horizontal-ellipsis integral p left-parenthesis bold y Subscript i Baseline vertical-bar bold-italic gamma Subscript i Baseline comma bold-italic beta comma phi right-parenthesis p left-parenthesis bold-italic gamma Subscript i Baseline vertical-bar bold-italic theta Superscript asterisk Baseline right-parenthesis d bold-italic gamma Subscript i Baseline 2nd Row 1st Column Blank 2nd Column almost-equals 2 Superscript r slash 2 Baseline StartAbsoluteValue f double-prime left-parenthesis bold y Subscript i Baseline comma bold-italic beta comma bold-italic theta semicolon ModifyingAbove bold-italic gamma With caret Subscript i Baseline right-parenthesis EndAbsoluteValue Superscript negative 1 slash 2 Baseline 3rd Row 1st Column Blank 2nd Column sigma-summation Underscript j 1 equals 1 Overscript upper N Subscript q Baseline Endscripts midline-horizontal-ellipsis sigma-summation Underscript j Subscript r Baseline equals 1 Overscript upper N Subscript q Baseline Endscripts left-bracket p left-parenthesis bold y Subscript i Baseline vertical-bar bold a Subscript j Superscript asterisk Baseline comma bold-italic beta comma phi right-parenthesis p left-parenthesis bold a Subscript j Superscript asterisk Baseline vertical-bar bold-italic theta Superscript asterisk Baseline right-parenthesis product Underscript k equals 1 Overscript r Endscripts w Subscript j Sub Subscript k Subscript Baseline exp z Subscript j Sub Subscript k Subscript Superscript 2 Baseline right-bracket EndLayout

The right-hand side of this expression, properly accumulated across subjects, is the objective function for adaptive quadrature estimation in the GLIMMIX procedure. The preceding expression constitutes a one-level adaptive Gaussian quadrature approximation.

As the number of random effects grows, the dimension of the integral increases accordingly. This increase can happen especially when you have nested random effects. In this case, the one-level quadrature approximation described earlier quickly becomes computationally infeasible. The following scenarios illustrate the relationship among the computational effort, the dimension of the random effects, and the number of quadrature nodes. Suppose that the A effect has four levels, and consider the following statements:

proc glimmix method=quad(qpoints=5);
   class A id;
   model y = / dist=negbin;
   random A / subject=id;
run;

For each subject, computing the marginal log likelihood requires the numerical evaluation of a four-dimensional integral. With the number of quadrature points set to five by the QPOINTS=5 option, this means that each marginal log-likelihood evaluation requires 5 Superscript 4 Baseline equals 625 conditional log likelihoods to be computed for each observation on each pass through the data. As the number of quadrature points or the number of random effects increases, this constitutes a sizable computational effort. Suppose, for example, that just one additional random effect, B, with two levels is added as an interaction, as in the following statements:

proc glimmix method=quad(qpoints=5);
   class A B id;
   model y = / dist=negbin;
   random A A*B / subject=id;
run;

Now a single marginal likelihood calculation requires 5 Superscript left-parenthesis 4 plus 8 right-parenthesis = 244,140,625 conditional log likelihoods for each observation on each pass through the data.

You can reduce the dimension of the random effects in the preceding PROC GLIMMIX code by factoring A out of the two random effects in the RANDOM statement, as shown in the following statements:

proc glimmix method=quad(qpoints=5);
   class A B id;
   model y = / dist=negbin;
   random int B / subject=id*A;
run;

With the random effects int and B, the preceding PROC GLIMMIX code requires the evaluation of 5 Superscript left-parenthesis 1 plus 2 right-parenthesis Baseline equals 125 conditional log likelihoods for each observation on each pass through the data.

This idea of reducing the dimension of random effects is the key to the multilevel adaptive Gaussian quadrature algorithm described in Pinheiro and Chao (2006). By exploiting the sparseness in the random-effects design matrix Z, the multilevel quadrature algorithm reduces the dimension of the random effects to the sum of the dimensions of random effects from each level. You can use the FASTQUAD suboption in the METHOD=QUAD option to prompt PROC GLIMMIX to compute this multilevel quadrature approximation.

To see the effect of the FASTQUAD option, consider the following model for the preceding example:

proc glimmix method=quad(qpoints=5);
   class A B id;
   model y = / dist=negbin;
   random A A*B B/ subject=id;
run;

In this case, it is not possible to factor a single SUBJECT= variable out of all the random effects. Formulated in this one-level way, a single evaluation of the marginal likelihood requires the computing of 5 Superscript left-parenthesis 4 plus 8 plus 2 right-parenthesis = 488,281,250 conditional log likelihoods for each observation on each pass through the data.

Alternatively, to take advantage of the multilevel quadrature approximation, you need to use the FASTQUAD option and explicitly specify the two-level structure by including one RANDOM statement for each level:

proc glimmix method=quad(qpoints=5 fastquad);
   class A B id;
   model y = / dist=negbin;
   random B     / subject=id;
   random int B / subject=id*A;
run;

The first RANDOM statement specifies the random effect B for the level that corresponds to id; the second RANDOM statement specifies the random effects int and B for the level that corresponds to id*A. With this specification, the multilevel quadrature approximation computes only 5 Superscript left-parenthesis 2 plus 1 plus 2 right-parenthesis = 3,125 conditional log likelihoods for each observation on each pass through the data, where the exponent left-parenthesis 2 plus 1 plus 2 right-parenthesis is the sum of the number of random effects in the two RANDOM statements.

In general, consider a two-level model in which m level-2 units are nested within each level-1 unit. In this case, the one-level upper N Subscript q point adaptive quadrature approximation to a marginal likelihood that is an integral over r 1 level-1 random effects and r 2 level-2 random effects requires upper N Subscript q Superscript r 1 times m plus r 2 evaluations of the conditional log likelihoods for each observation. However, the two-level adaptive quadrature approximation requires only upper N Subscript q Superscript r 1 plus r 2 evaluations of the conditional log likelihoods. By increasing exponentially with r 1 instead of with r 1 times m, the multilevel quadrature algorithm significantly reduces the computational and memory requirements.

Quadrature or Laplace Approximation

If you select the quadrature rule with a single quadrature point, namely

proc glimmix method=quad(qpoints=1);

the results will be identical to METHOD=LAPLACE. Computationally, the two methods are not identical, however. METHOD=LAPLACE can be applied to a considerably larger class of models. For example, crossed random effects, models without subjects, or models with non-nested subjects can be handled with the Laplace approximation but not with quadrature. Furthermore, METHOD=LAPLACE draws on a number of computational simplifications that can increase its efficiency compared to a quadrature algorithm with a single node. For example, the Laplace approximation is possible with unbounded covariance parameter estimates (NOBOUND option in the PROC GLIMMIX statement) and can permit certain types of negative definite or indefinite bold upper G matrices. The adaptive quadrature approximation with scaled abscissas typically breaks down when bold upper G is not at least positive semidefinite.

In the multilevel quadrature algorithm, the total number of random effects is the sum of the number of random effects at each level. Still, as the number of random effects grows at any of the levels, quadrature approximation becomes computationally infeasible. Laplace approximation presents a computationally more expedient alternative.

If you wonder whether METHOD=LAPLACE would present a viable alternative to a model that you can fit with METHOD=QUAD, the "Optimization Information" table can provide some insights. The table contains as its last entry the number of quadrature points determined by PROC GLIMMIX to yield a sufficiently accurate approximation of the log likelihood (at the starting values). In many cases, a single quadrature node is sufficient, in which case the estimates are identical to those of METHOD=LAPLACE.

Last updated: December 09, 2022