Introduction to Mixed Modeling Procedures

Generalized Linear Mixed Model

In a generalized linear mixed model (GLMM) the G-side random effects are part of the linear predictor, $\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma}$, and the predictor is related nonlinearly to the conditional mean of the data:

$$\mathrm{E}[\mathbf{Y} \mid \boldsymbol{\gamma}] = g^{-1}(\boldsymbol{\eta}) = g^{-1}(\mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma})$$

where $g^{-1}(\cdot)$ is the inverse link function. The conditional distribution of the data, given the random effects, is a member of the exponential family of distributions, such as the binary, binomial, Poisson, gamma, beta, or chi-square distribution. Because the normal distribution is also a member of the exponential family, the class of linear mixed models is a subset of the class of generalized linear mixed models. To completely specify a GLMM, you need to do the following:

  1. Formulate the linear predictor, including fixed and random effects.

  2. Choose a link function.

  3. Choose the distribution of the response, conditional on the random effects, from the exponential family.
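The three steps can be mirrored in a small sketch. The following Python code is illustrative only: GLMMSpec, linear_predictor, conditional_mean, and poisson_logpmf are hypothetical names, not part of any SAS procedure, and the example assumes a Poisson response with a log link, so that the inverse link is the exponential function. It evaluates the conditional mean and the conditional log density for a single observation; it does not estimate any parameters.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class GLMMSpec:
    """Hypothetical container for the modeling choices in steps 2 and 3."""
    inverse_link: Callable[[float], float]      # step 2: g^{-1}
    log_density: Callable[[int, float], float]  # step 3: log f(y | gamma) given its mean


def linear_predictor(x_row: Sequence[float], beta: Sequence[float],
                     z_row: Sequence[float], gamma: Sequence[float]) -> float:
    """Step 1: eta = x'beta + z'gamma for a single observation."""
    fixed_part = sum(x * b for x, b in zip(x_row, beta))
    random_part = sum(z * g for z, g in zip(z_row, gamma))
    return fixed_part + random_part


def poisson_logpmf(y: int, mu: float) -> float:
    """Log of the Poisson probability mass function with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def conditional_mean(spec: GLMMSpec, x_row, beta, z_row, gamma) -> float:
    """E[Y | gamma] = g^{-1}(eta)."""
    return spec.inverse_link(linear_predictor(x_row, beta, z_row, gamma))


# Poisson response with a log link, so g^{-1}(eta) = exp(eta). Values are illustrative.
poisson_log = GLMMSpec(inverse_link=math.exp, log_density=poisson_logpmf)
mu = conditional_mean(poisson_log, x_row=[1.0, 2.0], beta=[0.5, 0.1],
                      z_row=[1.0, 0.0], gamma=[-0.2, 0.3])
print(mu, poisson_log.log_density(2, mu))
```

Swapping in a different inverse link and log density changes steps 2 and 3 without touching step 1, which reflects that the three choices are made separately.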

As an example, suppose that $s$ pairs of twins are randomly selected in a matched-pair design. One twin in each pair receives a treatment, and the outcome variable is some binary measure. This is a study with $s$ clusters (subjects), each of size 2. If $Y_{ij}$ denotes the binary response of twin $j = 1, 2$ in cluster $i$, then a linear predictor for this experiment could be

$$\eta_{ij} = \beta_0 + \tau x_{ij} + \gamma_i$$

where $x_{ij}$ denotes a regressor variable that takes the value 1 for the treated observation in each pair and 0 otherwise. The $\gamma_i$ are pair-specific random effects that model heterogeneity across sets of twins and that induce a correlation between the members of each pair. Because the sets of twins are sampled at random, it is reasonable to assume that the $\gamma_i$ are independent and have equal variance. This leads to a diagonal $\mathbf{G}$ matrix:

$$\mathrm{Var}[\boldsymbol{\gamma}] = \mathrm{Var}\begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \gamma_3 \\ \vdots \\ \gamma_s \end{bmatrix} = \begin{bmatrix} \sigma^2_\gamma & 0 & 0 & \cdots & 0 \\ 0 & \sigma^2_\gamma & 0 & \cdots & 0 \\ 0 & 0 & \sigma^2_\gamma & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma^2_\gamma \end{bmatrix}$$
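A short simulation can make the role of $\mathbf{G}$ concrete. The sketch below assumes arbitrary illustrative values for $\beta_0$, $\tau$, and $\sigma_\gamma$, normally distributed $\gamma_i$ (an assumption; the text states only independence and equal variance), and a logit link for the binary responses. It builds the diagonal $\mathbf{G}$ and shows that, although $\mathbf{G}$ is diagonal across pairs, the shared $\gamma_i$ induces a positive correlation between the two twins within a pair. The helper names g_matrix, simulate_pairs, and pearson are hypothetical.

```python
import math
import random


def g_matrix(s: int, sigma2_gamma: float) -> list:
    """Diagonal G = Var[gamma] = sigma^2_gamma * I_s for s independent pair effects."""
    return [[sigma2_gamma if r == c else 0.0 for c in range(s)] for r in range(s)]


def simulate_pairs(s: int, beta0: float, tau: float, sigma_gamma: float, seed: int = 1):
    """Draw s twin pairs from eta_ij = beta0 + tau * x_ij + gamma_i with a logit link.

    Assumes gamma_i ~ N(0, sigma_gamma^2), independent across pairs.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(s):
        gamma_i = rng.gauss(0.0, sigma_gamma)  # one random effect shared by both twins
        ys = []
        for x_ij in (1, 0):  # treated twin (x = 1) first, untreated twin (x = 0) second
            p = 1.0 / (1.0 + math.exp(-(beta0 + tau * x_ij + gamma_i)))
            ys.append(1 if rng.random() < p else 0)
        pairs.append((ys[0], ys[1]))
    return pairs


def pearson(a, b) -> float:
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


print(g_matrix(3, 2.0))
pairs = simulate_pairs(s=5000, beta0=-0.5, tau=1.0, sigma_gamma=2.0)
treated = [y1 for y1, _ in pairs]
untreated = [y2 for _, y2 in pairs]
print(pearson(treated, untreated))  # expected to be clearly positive for sigma_gamma = 2
```

Setting sigma_gamma to 0 removes the shared effect, and the sample correlation between twins then fluctuates around zero.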

A common link function for binary data is the logit link. Choosing it in the second step of model formulation leads to

$$\begin{aligned} \mathrm{E}[Y_{ij} \mid \gamma_i] = \mu_{ij} \mid \gamma_i &= \frac{1}{1 + \exp\{-\eta_{ij}\}} \\ \log\left\{ \frac{\mu_{ij} \mid \gamma_i}{1 - \mu_{ij} \mid \gamma_i} \right\} &= \eta_{ij} \end{aligned}$$
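The two expressions above are inverses of each other. The following sketch, with hypothetical helper names and made-up parameter values, evaluates both for one twin pair and checks a consequence of the model: because $\gamma_i$ enters both twins' linear predictors additively, it cancels from the within-pair difference in log odds, so the conditional odds ratio for the treated versus the untreated twin equals $\exp(\tau)$.

```python
import math


def inv_logit(eta: float) -> float:
    """mu = 1 / (1 + exp(-eta)), written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def logit(mu: float) -> float:
    """eta = log(mu / (1 - mu)), the inverse of inv_logit on (0, 1)."""
    return math.log(mu / (1.0 - mu))


# Illustrative values only (not estimates from any data set).
beta0, tau, gamma_i = -0.5, 1.2, 0.3
mu_treated = inv_logit(beta0 + tau * 1 + gamma_i)
mu_control = inv_logit(beta0 + tau * 0 + gamma_i)

# gamma_i cancels from the log-odds difference, so the odds ratio is exp(tau).
odds_ratio = (mu_treated / (1 - mu_treated)) / (mu_control / (1 - mu_control))
print(mu_treated, mu_control, odds_ratio, math.exp(tau))
```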

The final step, choosing a distribution from the exponential family, is automatic in this example; only the binary distribution comes into play to model the distribution of $Y_{ij} \mid \gamma_i$.

As with the linear mixed model, a generalized linear mixed model has a marginal model, which results from integrating the joint distribution over the random effects. For many GLMMs this marginal distribution is not available in closed form, so parameter estimation proceeds either by approximating the model or by approximating the marginal integral. Details of these approaches are described in the section Generalized Linear Mixed Models Theory, in Chapter 52, The GLIMMIX Procedure.
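For the twin example, the marginal integral is one-dimensional for each pair. The sketch below assumes $\gamma_i \sim N(0, \sigma^2_\gamma)$ and that the two twins are conditionally independent given $\gamma_i$, and it approximates $\int f(y_{i1}, y_{i2} \mid \gamma_i)\,\phi(\gamma_i; 0, \sigma^2_\gamma)\,d\gamma_i$ with a plain midpoint rule on a truncated interval. The midpoint rule is chosen for transparency only; it is not the approximation that GLIMMIX uses, which the chapter cited above describes. Function names and parameter values are hypothetical.

```python
import math


def inv_logit(eta: float) -> float:
    """Numerically stable 1 / (1 + exp(-eta))."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def normal_pdf(g: float, sigma: float) -> float:
    """Density of N(0, sigma^2) at g (sigma must be positive)."""
    return math.exp(-0.5 * (g / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def conditional_pair_lik(y_pair, beta0, tau, gamma_i, x_pair=(1, 0)):
    """f(y_i1, y_i2 | gamma_i): product of two conditionally independent Bernoulli terms."""
    lik = 1.0
    for y, x in zip(y_pair, x_pair):
        p = inv_logit(beta0 + tau * x + gamma_i)
        lik *= p if y == 1 else 1.0 - p
    return lik


def marginal_pair_lik(y_pair, beta0, tau, sigma, n_nodes=2001, width=8.0):
    """Midpoint-rule approximation of one pair's marginal likelihood,
    integrating gamma_i over [-width * sigma, width * sigma]."""
    lo = -width * sigma
    h = 2.0 * width * sigma / n_nodes
    total = 0.0
    for k in range(n_nodes):
        g = lo + (k + 0.5) * h
        total += conditional_pair_lik(y_pair, beta0, tau, g) * normal_pdf(g, sigma)
    return total * h


beta0, tau, sigma = -0.5, 1.0, 2.0  # illustrative values only
outcomes = [(0, 0), (0, 1), (1, 0), (1, 1)]
probs = {y: marginal_pair_lik(y, beta0, tau, sigma) for y in outcomes}
p_treated_marginal = probs[(1, 0)] + probs[(1, 1)]
p_treated_conditional = inv_logit(beta0 + tau)  # response probability at gamma_i = 0
print(sum(probs.values()), p_treated_marginal, p_treated_conditional)
```

The marginal probabilities of the four possible outcomes should sum to 1. For these parameter values, the marginal probability that the treated twin responds lies between 0.5 and the conditional probability at $\gamma_i = 0$, which illustrates how integrating over the random effects attenuates effects on the probability scale.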

A marginal model, one that models correlation through the bold upper R matrix and does not involve G-side random effects, can also be formulated in the GLMM family; such models are the extension of the correlated-error models in the linear mixed model family. Because nonnormal distributions in the exponential family exhibit a functional mean-variance relationship, fully parametric estimation is not possible in such models. Instead, estimating equations are formed based on first-moment (mean) and second-moment (covariance) assumptions for the marginal data. The approaches for modeling correlated nonnormal data via generalized estimating equations (GEE) fall into this category (see, for example, Liang and Zeger 1986; Zeger and Liang 1986).
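To illustrate the estimating-equation idea, the sketch below computes one cluster's contribution $\mathbf{U}_i = \mathbf{D}_i^\top \mathbf{V}_i^{-1}(\mathbf{y}_i - \boldsymbol{\mu}_i)$ to a GEE in the style of Liang and Zeger for the twin design. It uses a marginal logit mean model (no random effect) and a working covariance $\mathbf{V}_i = \phi\,\mathbf{A}_i^{1/2}\mathbf{R}(\alpha)\mathbf{A}_i^{1/2}$, where $\mathbf{A}_i = \mathrm{diag}\{\mu_{ij}(1-\mu_{ij})\}$. Only the mean and covariance are specified; no joint distribution is assumed. The working correlation $\alpha$, the choice $\phi = 1$, and the function names are assumptions made for illustration. Solving $\sum_i \mathbf{U}_i = \mathbf{0}$ for $\beta_0$ and $\tau$, estimating $\alpha$, and computing robust standard errors are omitted.

```python
import math


def inv_logit(eta: float) -> float:
    """Marginal mean mu_ij = 1 / (1 + exp(-eta_ij)); a GEE mean model has no random effect."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def working_cov(mu_pair, alpha, phi=1.0):
    """V_i = phi * A^{1/2} R(alpha) A^{1/2}, A = diag(mu (1 - mu)), 2 x 2 working correlation."""
    sd = [math.sqrt(m * (1.0 - m)) for m in mu_pair]
    r = [[1.0, alpha], [alpha, 1.0]]
    return [[phi * sd[j] * r[j][k] * sd[k] for k in range(2)] for j in range(2)]


def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def gee_contribution(y_pair, beta0, tau, alpha, x_pair=(1, 0)):
    """U_i = D_i' V_i^{-1} (y_i - mu_i) for beta = (beta0, tau)."""
    mu = [inv_logit(beta0 + tau * x) for x in x_pair]
    # Row j of D_i is d mu_ij / d beta = mu_ij (1 - mu_ij) * [1, x_ij].
    deriv = [[m * (1.0 - m), m * (1.0 - m) * x] for m, x in zip(mu, x_pair)]
    v_inv = inv2(working_cov(mu, alpha))
    resid = [y - m for y, m in zip(y_pair, mu)]
    w = [sum(v_inv[j][k] * resid[k] for k in range(2)) for j in range(2)]  # V_i^{-1} r_i
    return [sum(deriv[j][p] * w[j] for j in range(2)) for p in range(2)]   # D_i' V_i^{-1} r_i


# One pair: the treated twin responded (1), the untreated twin did not (0).
print(gee_contribution((1, 0), beta0=0.2, tau=0.5, alpha=0.3))
```

With $\alpha = 0$ the contribution reduces to the ordinary logistic-regression score for the pair, which is a quick check on the algebra.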

Last updated: December 09, 2022