This section introduces the mathematical notation that the chapter uses to describe the generalized linear mixed model. For a description of the statistical details and sampling algorithms, see the section Details: BGLIMM Procedure.
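First consider the simple normal linear model. The quantity of primary interest, $y_i$, is called the response or outcome variable for the $i$th individual. The variable $\mathbf{x}_i$ is the $1 \times p$ covariate vector for the fixed effects. The distribution of $y_i$ given $\mathbf{x}_i$ is normal with a mean that is a linear function of $\mathbf{x}_i$,
$$y_i = \mathbf{x}_i\boldsymbol{\beta} + \epsilon_i, \quad i = 1, \ldots, I$$
$$\epsilon_i \sim N(0, \sigma^2)$$
where $\boldsymbol{\beta}$ is a $p \times 1$ vector of regression coefficients (also known as fixed effects) and $\epsilon_i$ is the noise with a variance $\sigma^2$.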
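The following NumPy sketch simulates data from this normal linear model so that each symbol has a concrete counterpart. It illustrates the notation only and is not how the BGLIMM procedure works internally. The parameter values and the seed are arbitrary assumptions. Row $i$ of `X` plays the role of $\mathbf{x}_i$, and an ordinary least squares fit is used only to show that the simulated mean is linear in $\mathbf{x}_i$.

```python
import numpy as np

# Simulate y_i = x_i beta + eps_i, eps_i ~ N(0, sigma^2), for i = 1..I.
rng = np.random.default_rng(seed=1)

I, p = 500, 3                      # number of individuals and of fixed effects
beta = np.array([1.0, -2.0, 0.5])  # p x 1 fixed-effects vector (illustrative values)
sigma = 0.3                        # noise standard deviation, so the variance is sigma**2

# Row i of X is the 1 x p covariate vector x_i; the first column is an intercept.
X = np.column_stack([np.ones(I), rng.normal(size=(I, p - 1))])
eps = rng.normal(0.0, sigma, size=I)
y = X @ beta + eps

# Because E[y_i | x_i] = x_i beta, least squares recovers beta approximately.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))
```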
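The normal linear model can be expanded to include random effects, and the model becomes a normal linear mixed model,
$$y_i = \mathbf{x}_i\boldsymbol{\beta} + \mathbf{z}_i\boldsymbol{\gamma}_i + \epsilon_i$$
$$\boldsymbol{\gamma}_i \sim N(\mathbf{0}, \mathbf{G}_i)$$
$$\epsilon_i \sim N(0, \sigma^2)$$
where $\boldsymbol{\gamma}_i$ is a $q \times 1$ vector of random effects, $\mathbf{z}_i$ is a $1 \times q$ matrix of covariates for the $\boldsymbol{\gamma}_i$, and $\mathbf{G}_i$ is the covariance matrix of the random effects $\boldsymbol{\gamma}_i$ ($\mathbf{G}$ is a block diagonal matrix where each block is $\mathbf{G}_i$).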
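To make the block diagonal structure of $\mathbf{G}$ concrete, the sketch below builds $\mathbf{G}$ from one copy of $\mathbf{G}_i$ per individual, using a Kronecker product with an identity matrix. It then draws the stacked random effects from $N(\mathbf{0}, \mathbf{G})$. The dimensions, the values of $\mathbf{G}_i$, and the seed are illustrative assumptions. Every individual is assumed to share the same $\mathbf{G}_i$.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

I, q = 4, 2                        # individuals and random effects per individual
beta = np.array([0.5, 1.5])        # p x 1 fixed effects with p = 2 (illustrative)
G_i = np.array([[1.0, 0.3],        # q x q covariance of gamma_i (illustrative)
                [0.3, 0.5]])
sigma2 = 0.25                      # residual variance sigma^2

# G places one copy of G_i per individual on the diagonal: G = I_I (kron) G_i.
G = np.kron(np.eye(I), G_i)

x = np.column_stack([np.ones(I), rng.normal(size=I)])   # row i is x_i (1 x p)
z = np.column_stack([np.ones(I), rng.normal(size=I)])   # row i is z_i (1 x q)

# Draw the stacked vector (gamma_1, ..., gamma_I) from N(0, G); row i is gamma_i.
gamma = rng.multivariate_normal(np.zeros(I * q), G).reshape(I, q)
eps = rng.normal(0.0, np.sqrt(sigma2), size=I)

# y_i = x_i beta + z_i gamma_i + eps_i, evaluated for all i at once.
y = x @ beta + np.sum(z * gamma, axis=1) + eps
```

Because the off-diagonal blocks of `G` are zero, the random effects of different individuals are independent. Drawing from `G` is therefore equivalent to drawing each $\boldsymbol{\gamma}_i$ separately from $N(\mathbf{0}, \mathbf{G}_i)$.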
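When an individual $i$ has repeated measurements, the random-effects model for outcome vector $\mathbf{y}_i$ is given by
$$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i\boldsymbol{\gamma}_i + \boldsymbol{\epsilon}_i, \quad i = 1, \ldots, I$$
where $\mathbf{y}_i$ is $n_i \times 1$, $\mathbf{X}_i$ is an $n_i \times p$ matrix of fixed covariates, $\boldsymbol{\beta}$ is a $p \times 1$ vector of regression coefficients (also known as fixed effects), $\boldsymbol{\gamma}_i$ is a $q \times 1$ vector of random effects, $\mathbf{Z}_i$ is an $n_i \times q$ matrix of covariates for the $\boldsymbol{\gamma}_i$, and $\boldsymbol{\epsilon}_i$ is an $n_i \times 1$ vector of random errors.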
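It is further assumed that
$$\boldsymbol{\gamma}_i \sim N(\mathbf{0}, \mathbf{G}_i)$$
$$\boldsymbol{\epsilon}_i \sim N(\mathbf{0}, \mathbf{R}_i)$$
where $\mathbf{G}_i$ is the covariance matrix of $\boldsymbol{\gamma}_i$ ($\mathbf{G}$ is a block diagonal matrix where each block is $\mathbf{G}_i$) and $\mathbf{R}_i$ is the covariance matrix of the residual errors for the $i$th subject ($\mathbf{R}$ is a block diagonal matrix where each block is $\mathbf{R}_i$).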
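One consequence of these assumptions is worth working through. Assume, as is standard, that $\boldsymbol{\gamma}_i$ and $\boldsymbol{\epsilon}_i$ are independent. Integrating out $\boldsymbol{\gamma}_i$ then gives $\mathbf{y}_i \sim N(\mathbf{X}_i\boldsymbol{\beta}, \mathbf{Z}_i\mathbf{G}_i\mathbf{Z}_i^\top + \mathbf{R}_i)$. The sketch below evaluates this marginal mean and covariance for one subject with a random intercept and slope in time. The measurement times, $\boldsymbol{\beta}$, $\mathbf{G}_i$, and $\mathbf{R}_i = 0.5\,\mathbf{I}$ are all illustrative assumptions.

```python
import numpy as np

# Repeated measurements on one subject at these (illustrative) times.
t = np.array([0.0, 1.0, 2.0, 3.0])
n_i = t.size

X_i = np.column_stack([np.ones(n_i), t])   # n_i x p fixed-effects design (p = 2)
Z_i = np.column_stack([np.ones(n_i), t])   # n_i x q random intercept and slope (q = 2)
beta = np.array([2.0, 0.5])                # p x 1 fixed effects
G_i = np.array([[1.0, 0.2],
                [0.2, 0.1]])               # q x q covariance of gamma_i
R_i = 0.5 * np.eye(n_i)                    # n_i x n_i residual covariance

# Marginal moments of y_i, assuming gamma_i and eps_i are independent normals.
mean_i = X_i @ beta
V_i = Z_i @ G_i @ Z_i.T + R_i
```

Entry $(j, k)$ of `V_i` equals $g_{11} + g_{12}(t_j + t_k) + g_{22} t_j t_k$, plus the residual variance when $j = k$. For example, $V_{i,11} = 1.0 + 0.5 = 1.5$ and $V_{i,12} = 1.0 + 0.2 = 1.2$.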
There are cases where the relationship between the design matrices ($\mathbf{X}$ and $\mathbf{Z}$) and the expectation of the response is not linear, or where the distribution for the response is far from normal, even after transformation of the data. The class of generalized linear mixed models unifies the approaches that you need in order to analyze data in those cases. Let $\mathbf{y}$ be the collection of all $\mathbf{y}_i$, let $\boldsymbol{\gamma}$ be the collection of all $\boldsymbol{\gamma}_i$, and let $\mathbf{X}$ and $\mathbf{Z}$ be the collections of all $\mathbf{X}_i$ and $\mathbf{Z}_i$, respectively. A generalized linear mixed model consists of the following:
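the link function $g(\cdot)$, which relates the linear predictor $\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma}$ to the mean of the outcome,
$$E[\mathbf{y} \mid \boldsymbol{\beta}, \boldsymbol{\gamma}] = g^{-1}(\boldsymbol{\eta}) = g^{-1}(\mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma})$$
where $g(\cdot)$ is a differentiable monotone link function and $g^{-1}(\cdot)$ is its inverse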
a response distribution in the exponential family of distributions. The distribution can also depend on a scale parameter, $\phi$.
The conditional distribution of the response variable, given $\boldsymbol{\gamma}$, is a member of the exponential family of distributions, which includes the normal distribution. You specify the distribution by using the DIST= option in the MODEL statement, and you specify the link function $g(\cdot)$ by using the LINK= option.
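As an illustration of the three components, the following sketch simulates a binary-response generalized linear mixed model with a logit link and one random intercept per subject. It forms the linear predictor $\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma}$, maps it to the mean with $g^{-1}$, and draws Bernoulli responses given $\boldsymbol{\gamma}$. The logit link, the subject layout, the parameter values, and the helper names `logit` and `inv_logit` are assumptions chosen for the example. They do not reflect any procedure default.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def logit(mu):
    """Link g: maps a mean in (0, 1) to the real line."""
    return np.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse link g^{-1}: maps the linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

n_subjects, n_per = 5, 3
subject = np.repeat(np.arange(n_subjects), n_per)   # subject index of each observation
x = rng.normal(size=subject.size)

X = np.column_stack([np.ones(subject.size), x])     # intercept and one covariate
Z = np.eye(n_subjects)[subject]                     # indicator column per subject
beta = np.array([-0.5, 1.0])                        # illustrative fixed effects
gamma = rng.normal(0.0, 0.8, size=n_subjects)       # gamma ~ N(0, 0.64 I)

eta = X @ beta + Z @ gamma   # linear predictor
mu = inv_logit(eta)          # E[y | beta, gamma] = g^{-1}(eta)
y = rng.binomial(1, mu)      # binary responses, conditionally independent given gamma
```

Applying the link to the mean recovers the linear predictor, because `logit(mu)` equals `eta` up to rounding. This is the sense in which $g$ and $g^{-1}$ are inverses.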
The BGLIMM procedure distinguishes two types of covariance structure: the "G-side" and the "R-side." The G-side matrix is the covariance matrix of the random effects; the R-side matrix is the covariance matrix of the residuals. Models without G-side effects are also known as marginal (or population-averaged) models.
The columns of $\mathbf{X}$ are constructed from effects that are listed on the right side of the MODEL statement. Columns of $\mathbf{Z}$ and the G-side covariance matrix $\mathbf{G}$ are constructed from the RANDOM statement. The R-side covariance matrix $\mathbf{R}$ is constructed from the REPEATED statement, or from the RANDOM statement with the RESIDUAL option.
By default, the $\mathbf{R}$ matrix is the scaled identity matrix, $\mathbf{R} = \phi\mathbf{I}$. The scale parameter $\phi$ is set to 1 if the distribution does not have a scale parameter, such as in the case of the binary, binomial, Poisson, and exponential distributions.
For the normal distribution, you can use the REPEATED statement to specify various types of covariance structure for $\mathbf{R}$. For example, to specify that the Time effect for each patient is an R-side effect with a first-order autoregressive covariance structure, use the following statement:
repeated Time / type=ar(1) subject=Patient;
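The following sketch shows the matrix that such a statement describes for one patient. It assumes the usual first-order autoregressive parameterization, in which entry $(j, k)$ of $\mathbf{R}_i$ is $\sigma^2\rho^{|j-k|}$. The function name `ar1_cov` and the values $\sigma^2 = 2$, $\rho = 0.5$ with four time points are illustrative assumptions.

```python
import numpy as np

def ar1_cov(n, sigma2, rho):
    """AR(1) covariance for n equally spaced measurements: sigma2 * rho**|j - k|."""
    idx = np.arange(n)
    lag = np.abs(idx[:, None] - idx[None, :])   # |j - k| for every pair of time points
    return sigma2 * rho ** lag

# R_i for one patient with four Time levels.
R_i = ar1_cov(4, sigma2=2.0, rho=0.5)
print(R_i)
```

Only two covariance parameters, $\sigma^2$ and $\rho$, describe the whole $4 \times 4$ block. The correlation decays geometrically as measurements grow further apart.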
Unknown quantities subject to estimation are the fixed-effects parameter vector $\boldsymbol{\beta}$, the random-effects parameter $\boldsymbol{\gamma}$, and the covariance parameters that constitute all unknowns in $\mathbf{G}$ and $\mathbf{R}$.