The GLIMMIX Procedure

Maximum Likelihood Estimation Based on Laplace Approximation

Objective Function

Let bold-italic beta denote the vector of fixed-effects parameters and bold-italic theta the vector of covariance parameters. For Laplace estimation in the GLIMMIX procedure, bold-italic theta includes the G-side parameters and a possible scale parameter phi, provided that the conditional distribution of the data contains such a scale parameter. bold-italic theta Superscript asterisk is the vector of the G-side parameters.

The marginal distribution of the data in a mixed model can be expressed as

StartLayout 1st Row 1st Column p left-parenthesis bold y right-parenthesis 2nd Column equals integral p left-parenthesis bold y vertical-bar bold-italic gamma comma bold-italic beta comma phi right-parenthesis p left-parenthesis bold-italic gamma vertical-bar bold-italic theta Superscript asterisk Baseline right-parenthesis d bold-italic gamma 2nd Row 1st Column Blank 2nd Column equals integral exp left-brace log left-brace p left-parenthesis bold y vertical-bar bold-italic gamma comma bold-italic beta comma phi right-parenthesis right-brace plus log left-brace p left-parenthesis bold-italic gamma vertical-bar bold-italic theta Superscript asterisk Baseline right-parenthesis right-brace right-brace d bold-italic gamma 3rd Row 1st Column Blank 2nd Column equals integral exp left-brace c Subscript l Baseline f left-parenthesis bold y comma bold-italic beta comma bold-italic theta semicolon bold-italic gamma right-parenthesis right-brace d bold-italic gamma EndLayout

If the constant c Subscript l is large, the Laplace approximation of this integral is

upper L left-parenthesis bold-italic beta comma bold-italic theta semicolon ModifyingAbove bold-italic gamma With caret comma bold y right-parenthesis equals left-parenthesis StartFraction 2 pi Over c Subscript l Baseline EndFraction right-parenthesis Superscript n Super Subscript gamma Superscript slash 2 Baseline StartAbsoluteValue minus f double-prime left-parenthesis bold y comma bold-italic beta comma bold-italic theta semicolon ModifyingAbove bold-italic gamma With caret right-parenthesis EndAbsoluteValue Superscript negative 1 slash 2 Baseline e Superscript c Super Subscript l Superscript f left-parenthesis bold y comma bold-italic beta comma bold-italic theta semicolon ModifyingAbove bold-italic gamma With caret right-parenthesis

where n Subscript gamma is the number of elements in bold-italic gamma, f double-prime is the second derivative matrix

f double-prime left-parenthesis bold y comma bold-italic beta comma bold-italic theta semicolon ModifyingAbove bold-italic gamma With caret right-parenthesis equals StartFraction partial-differential squared f left-parenthesis bold y comma bold-italic beta comma bold-italic theta semicolon bold-italic gamma right-parenthesis Over partial-differential bold-italic gamma partial-differential bold-italic gamma Superscript prime Baseline EndFraction vertical-bar Subscript ModifyingAbove bold-italic gamma With caret Baseline

and ModifyingAbove bold-italic gamma With caret satisfies the first-order condition

StartFraction partial-differential f left-parenthesis bold y comma bold-italic beta comma bold-italic theta semicolon bold-italic gamma right-parenthesis Over partial-differential bold-italic gamma EndFraction equals bold 0

The objective function for Laplace parameter estimation in the GLIMMIX procedure is minus 2 log left-brace upper L left-parenthesis bold-italic beta comma bold-italic theta ModifyingAbove bold-italic gamma With caret comma bold y right-parenthesis right-brace. The optimization process is singly iterative, but because ModifyingAbove bold-italic gamma With caret depends on ModifyingAbove bold-italic beta With caret and ModifyingAbove bold-italic theta With caret, the GLIMMIX procedure solves a suboptimization problem to determine for given values of ModifyingAbove bold-italic beta With caret and ModifyingAbove bold-italic theta With caret the random-effects solution vector that maximizes f left-parenthesis bold y comma bold-italic beta comma bold-italic theta semicolon bold-italic gamma right-parenthesis.

When you have longitudinal or clustered data with m independent subjects or clusters, the vector of observations can be written as bold y equals left-bracket bold y prime 1 comma ellipsis comma bold y prime Subscript m right-bracket prime, where bold y Subscript i is an n Subscript i Baseline times 1 vector of observations for subject (cluster) i (i equals 1 comma ellipsis comma m). In this case, assuming conditional independence such that

p left-parenthesis bold y Subscript i Baseline vertical-bar bold-italic gamma Subscript i Baseline right-parenthesis equals product Underscript j equals 1 Overscript n Subscript i Baseline Endscripts p left-parenthesis y Subscript i j Baseline vertical-bar bold-italic gamma Subscript i Baseline right-parenthesis

the marginal distribution of the data can be expressed as

StartLayout 1st Row 1st Column p left-parenthesis bold y right-parenthesis equals product Underscript i equals 1 Overscript m Endscripts p left-parenthesis bold y Subscript i Baseline right-parenthesis 2nd Column equals product Underscript i equals 1 Overscript m Endscripts integral p left-parenthesis bold y Subscript i Baseline vertical-bar bold-italic gamma Subscript i Baseline right-parenthesis p left-parenthesis bold-italic gamma Subscript i Baseline right-parenthesis d bold-italic gamma Subscript i Baseline 2nd Row 1st Column Blank 2nd Column equals product Underscript i equals 1 Overscript m Endscripts integral exp left-brace n Subscript i Baseline f left-parenthesis bold y Subscript i Baseline comma bold-italic beta comma bold-italic theta semicolon bold-italic gamma Subscript i Baseline right-parenthesis right-brace d bold-italic gamma Subscript i Baseline EndLayout

where

StartLayout 1st Row 1st Column n Subscript i Baseline f left-parenthesis bold y Subscript i Baseline comma bold-italic beta comma bold-italic theta semicolon bold-italic gamma Subscript i Baseline right-parenthesis 2nd Column equals log left-brace p left-parenthesis bold y Subscript i Baseline vertical-bar bold-italic gamma Subscript i Baseline right-parenthesis p left-parenthesis bold-italic gamma Subscript i Baseline right-parenthesis right-brace 2nd Row 1st Column Blank 2nd Column equals sigma-summation Underscript j equals 1 Overscript n Subscript i Baseline Endscripts log left-brace p left-parenthesis y Subscript i j Baseline vertical-bar bold-italic gamma Subscript i Baseline right-parenthesis right-brace plus log left-brace p left-parenthesis bold-italic gamma Subscript i Baseline right-parenthesis right-brace EndLayout

When the number of observations within a cluster, n Subscript i, is large, the Laplace approximation to the ith individual’s marginal probability density function is

StartLayout 1st Row 1st Column p left-parenthesis bold y Subscript i Baseline vertical-bar bold-italic beta comma bold-italic theta right-parenthesis 2nd Column equals integral exp left-brace n Subscript i Baseline f left-parenthesis bold y Subscript i Baseline comma bold-italic beta comma bold-italic theta semicolon bold-italic gamma Subscript i Baseline right-parenthesis right-brace d bold-italic gamma Subscript i Baseline 2nd Row 1st Column Blank 2nd Column equals StartFraction left-parenthesis 2 pi right-parenthesis Superscript n Super Subscript gamma Superscript Baseline slash 2 Over StartAbsoluteValue minus n Subscript i Baseline f double-prime left-parenthesis bold y Subscript i Baseline comma bold-italic beta comma bold-italic theta semicolon ModifyingAbove bold-italic gamma With caret Subscript i Baseline right-parenthesis EndAbsoluteValue Superscript 1 slash 2 Baseline EndFraction exp left-brace n Subscript i Baseline f left-parenthesis bold y Subscript i Baseline comma bold-italic beta comma bold-italic theta semicolon ModifyingAbove bold-italic gamma With caret Subscript i Baseline right-parenthesis right-brace EndLayout

where n Subscript gamma i is the common dimension of the random effects, bold-italic gamma Subscript i. In this case, provided that the constant c Subscript l Baseline equals min left-brace n Subscript i Baseline right-brace is large, the Laplace approximation to the marginal log likelihood is

StartLayout 1st Row 1st Column log left-brace upper L left-parenthesis bold-italic beta comma bold-italic theta semicolon ModifyingAbove bold-italic gamma With caret comma bold y right-parenthesis right-brace 2nd Column equals sigma-summation Underscript i equals 1 Overscript m Endscripts left-brace n Subscript i Baseline f left-parenthesis bold y comma bold-italic beta comma bold-italic theta semicolon ModifyingAbove bold-italic gamma With caret Subscript i Baseline right-parenthesis plus StartFraction n Subscript gamma i Baseline Over 2 EndFraction log left-brace 2 pi right-brace 2nd Row 1st Column Blank 2nd Column minus one-half log StartAbsoluteValue minus n Subscript i Baseline f double-prime left-parenthesis bold-italic beta comma bold-italic theta semicolon ModifyingAbove bold-italic gamma With caret Subscript i Baseline right-parenthesis EndAbsoluteValue right-brace EndLayout

which serves as the objective function for the METHOD=LAPLACE estimator in PROC GLIMMIX.

The Laplace approximation implemented in the GLIMMIX procedure differs from that in Wolfinger (1993) and Pinheiro and Bates (1995) in important respects. Wolfinger (1993) assumed a flat prior for bold-italic beta and expanded the integrand around bold-italic beta and bold-italic gamma, leaving only the covariance parameters for the overall optimization. The "fixed" effects bold-italic beta and the random effects bold-italic gamma are determined in a suboptimization that takes the form of a linear mixed model step with pseudo-data. The GLIMMIX procedure involves only the random effects vector bold-italic gamma in the suboptimization. Pinheiro and Bates (1995) and Wolfinger (1993) consider a modified Laplace approximation that replaces the second derivative f double-prime left-parenthesis bold y comma bold-italic beta comma bold-italic theta semicolon ModifyingAbove bold-italic gamma With caret right-parenthesis with an (approximate) expected value, akin to scoring. The GLIMMIX procedure does not use an approximation to f double-prime left-parenthesis bold y comma bold-italic beta comma bold-italic theta semicolon ModifyingAbove bold-italic gamma With caret right-parenthesis. The METHOD=RSPL estimates in PROC GLIMMIX are equivalent to the estimates obtained with the modified Laplace approximation in Wolfinger (1993). The objective functions of METHOD=RSPL and Wolfinger (1993) differ in a constant that depends on the number of parameters.

Asymptotic Properties and the Importance of Subjects

Suppose that the GLIMMIX procedure processes your data by subjects (see the section Processing by Subjects) and let n Subscript i denote the number of observations per subject, i equals 1 comma ellipsis comma s. Arguments in Vonesh (1996) show that the maximum likelihood estimator based on the Laplace approximation is a consistent estimator to order upper O Subscript p Baseline left-brace max left-brace 1 slash StartRoot s EndRoot right-brace comma 1 slash min left-brace n Subscript i Baseline right-brace right-brace. In other words, as the number of subjects and the number of observations per subject grows, the small-sample bias of the Laplace estimator disappears. Note that the term involving the number of subjects in this maximum relates to standard asymptotic theory, and the term involving the number of observations per subject relates to the accuracy of the Laplace approximation (Vonesh 1996). In the case where random effects enter the model linearly, the Laplace approximation is exact and the requirement that min left-brace n Subscript i Baseline right-brace right-arrow normal infinity can be dropped.

If your model is not processed by subjects but is equivalent to a subject model, the asymptotics with respect to s still apply, because the Hessian matrix of the suboptimization for bold-italic gamma breaks into s separate blocks. For example, the following two models are equivalent with respect to s and n Subscript i, although only for the first model does PROC GLIMMIX process the data explicitly by subjects:

proc glimmix method=laplace;
   class sub A;
   model y = A;
   random intercept / subject=sub;
run;
proc glimmix method=laplace;
   class sub A;
   model y = A;
   random sub;
run;

The same holds, for example, for models with independent nested random effects. The following two models are equivalent, and you can derive asymptotic properties related to s and min left-brace n Subscript i Baseline right-brace from the model in the first run:

proc glimmix method=laplace;
   class A B block;
   model y = A B A*B;
   random intercept A / subject=block;
run;

proc glimmix method=laplace;
   class A B block;
   model y = A B A*B;
   random block a*block;
run;

The Laplace approximation requires that the dimension of the integral does not increase with the size of the sample. Otherwise the error of the likelihood approximation does not diminish with n Subscript i. This is the case, for example, with exchangeable arrays (Shun and McCullagh 1995), crossed random effects (Shun 1997), and correlated random effects of arbitrary dimension (Raudenbush, Yang, and Yosef 2000). Results in Shun (1997), for example, show that even in this case the standard Laplace approximation has smaller bias than pseudo-likelihood estimates.

Last updated: December 09, 2022