First consider the simplest model: a normal linear model. The quantity of primary interest, $Y_i$, is called the response or outcome variable for the $i$th individual. The variable $X_i$ is the $1 \times p$ covariate vector for the fixed effects. The distribution of $Y_i$ given $X_i$ is normal with a mean that is a linear function of $X_i$,

$$Y_i = X_i \beta + \epsilon_i, \quad i = 1, \dots, I$$

where $\beta$ is a $p \times 1$ vector of regression coefficients (also known as fixed effects) and $\epsilon_i$ is the noise with a variance $\sigma^2$.
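To make this concrete, here is a minimal simulation-and-fit sketch, assuming NumPy; the dimensions and parameter values are arbitrary choices for illustration, not from the original text.

```python
import numpy as np

rng = np.random.default_rng(42)

I, p = 200, 3                     # number of individuals, number of covariates
beta_true = np.array([1.0, -2.0, 0.5])
sigma = 0.8                       # noise standard deviation

X = rng.normal(size=(I, p))       # rows are the 1 x p covariate vectors X_i
eps = rng.normal(scale=sigma, size=I)
Y = X @ beta_true + eps           # Y_i = X_i beta + eps_i

# Ordinary least squares estimate of beta
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)                   # should be close to beta_true
```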
The normal linear model can be expanded to include random effects, and the model becomes a normal linear mixed model,

$$Y_i = X_i \beta + Z_i \gamma_i + \epsilon_i, \quad \gamma_i \sim \mathrm{N}(0, G_i)$$

where $\gamma_i$ is a $q \times 1$ vector of random effects, $Z_i$ is a $1 \times q$ matrix of covariates for the $\gamma_i$, and $G_i$ is the covariance matrix of the random effects $\gamma_i$ ($G$ is a block diagonal matrix where each block is $G_i$).
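The block diagonal structure of $G$ is easy to see in code. A small sketch, assuming SciPy's block_diag helper, with illustrative $2 \times 2$ blocks:

```python
import numpy as np
from scipy.linalg import block_diag

# Illustrative 2 x 2 covariance block G_i, shared here by three individuals
G_i = np.array([[1.0, 0.3],
                [0.3, 0.5]])

# G is block diagonal, one block per individual
G = block_diag(G_i, G_i, G_i)
print(G.shape)   # (6, 6)
```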
When an individual $i$ has repeated measurements, the random-effects model for the outcome vector $Y_i$ is given by

$$Y_i = X_i \beta + Z_i \gamma_i + \epsilon_i, \quad i = 1, \dots, I$$

where $Y_i$ is $n_i \times 1$, $X_i$ is an $n_i \times p$ matrix of fixed covariates, $\beta$ is a $p \times 1$ vector of regression coefficients (also known as fixed effects), $\gamma_i$ is a $q \times 1$ vector of random effects, $Z_i$ is an $n_i \times q$ matrix of covariates for the $\gamma_i$, and $\epsilon_i$ is an $n_i \times 1$ vector of random errors. It is further assumed that

$$\gamma_i \sim \mathrm{N}(0, G_i), \quad \epsilon_i \sim \mathrm{N}(0, R_i)$$

where $G_i$ is the covariance matrix of $\gamma_i$ ($G$ is a block diagonal matrix where each block is $G_i$) and $R_i$ is the covariance matrix of the residual errors for the $i$th subject ($R$ is a block diagonal matrix where each block is $R_i$).
There are cases where the relationship between the design matrix ($X$ and $Z$) and the expectation of the response is not linear, or where the distribution for the response is far from normal, even after transformation of the data. The class of generalized linear mixed models unifies the approaches that you need in order to analyze data in those cases. Let $Y$ be the collection of all $Y_i$, and let $X$ and $Z$ be the collection of all $X_i$ and $Z_i$, respectively. A generalized linear mixed model consists of the following:
- the link function that relates the linear predictor to the mean of the outcome via a monotone link function,

  $$\mathrm{E}(Y \mid \beta, \gamma) = g^{-1}(X \beta + Z \gamma)$$

  where $g(\cdot)$ is a differentiable monotone link function and $g^{-1}$ is its inverse

- a response distribution in the exponential family of distributions. The distribution can also depend on a scale parameter, $\phi$.
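For instance, a Poisson GLMM with a log link composes the linear predictor with the inverse link $g^{-1}(\eta) = \exp(\eta)$. The following NumPy-only sketch simulates outcomes from such a model; the random-intercept design and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

I, n_i, p = 30, 5, 2
beta = np.array([0.5, -0.3])
tau = 0.4                                   # random-intercept SD

groups = np.repeat(np.arange(I), n_i)
X = rng.normal(size=(I * n_i, p))
gamma = rng.normal(scale=tau, size=I)       # gamma_i ~ N(0, tau^2)

eta = X @ beta + gamma[groups]              # linear predictor X beta + Z gamma
mu = np.exp(eta)                            # inverse link: E(Y | beta, gamma) = g^{-1}(eta)
Y = rng.poisson(mu)                         # response from an exponential family distribution
```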
A density or mass function in the exponential family can be written as

$$f(y) = \exp\left\{ \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right\}$$

for some functions $b(\cdot)$ and $c(\cdot)$. The parameter $\theta$ is called the natural (canonical) parameter. The parameter $\phi$ is a scale parameter, and it is not present in all exponential family distributions. For example, in logistic regression and Poisson regression, $\phi = 1$.
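As a worked example (an illustration added here, not part of the original text), the Poisson mass function with mean $\mu$ can be put in this form:

$$f(y) = \frac{\mu^y e^{-\mu}}{y!} = \exp\left\{ y \log \mu - \mu - \log y! \right\}$$

Matching terms against the general form identifies $\theta = \log \mu$, $b(\theta) = e^{\theta} = \mu$, $\phi = 1$, and $c(y, \phi) = -\log y!$.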
The mean and variance of the data are related to the components of the density, $\mathrm{E}(Y) = \mu = b'(\theta)$ and $\mathrm{Var}(Y) = \phi\, b''(\theta)$, where primes denote first and second derivatives. If you express $\theta$ as a function of $\mu$, the relationship is known as the natural link function or the canonical link function. In other words, modeling data by using a canonical link assumes that $\theta = X\beta + Z\gamma$; the effect contributions are additive on the canonical scale. The second derivative of $b(\cdot)$, expressed as a function of $\mu$, is the variance function of the generalized linear model, $a(\mu) = b''(\theta(\mu))$. Note that because of this relationship, the distribution determines the variance function and the canonical link function. However, you cannot proceed in the opposite direction.
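Continuing the Poisson example above (again an added illustration), the mean relationship $\mu = b'(\theta) = e^{\theta}$ inverts to the canonical log link $\theta = \log \mu$, and the variance function is

$$a(\mu) = b''(\theta(\mu)) = e^{\log \mu} = \mu$$

so $\mathrm{Var}(Y) = \phi\, b''(\theta) = \mu$, since $\phi = 1$ for the Poisson.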
Likelihood-based inference is based on the marginal likelihood in which the random effects are integrated out. The integration requires numerical methods, such as Gaussian quadrature methods (which can be computationally costly) or Laplace methods (which are faster but not as accurate). The Bayesian approach estimates the joint distribution of all parameters in the model, and it is made possible by Markov chain Monte Carlo (MCMC) methods. The presence of the random-effects parameters adds an extra sampling step to the Gibbs algorithm, thus eliminating the need to numerically integrate out the $\gamma_i$ to make inferences about $\beta$. The MCMC methods produce marginal distribution estimates of all parameters, including the fixed effects and the $G$ and $R$ covariance matrices, making estimation convenient.
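A minimal sketch of this Bayesian approach, assuming the PyMC library (a choice made for this example; the original text does not name software), fits a random-intercept Poisson GLMM. The sampler draws $\beta$, the random effects, and the variance parameter jointly, so no numerical integration over the random effects is needed, and marginal posteriors for every parameter come from the same run.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(2)

# Simulated random-intercept Poisson data (all values illustrative)
I, n_i, p = 30, 5, 2
groups = np.repeat(np.arange(I), n_i)
X = rng.normal(size=(I * n_i, p))
Y = rng.poisson(np.exp(X @ np.array([0.5, -0.3])
                       + rng.normal(scale=0.4, size=I)[groups]))

with pm.Model():
    beta = pm.Normal("beta", mu=0, sigma=10, shape=p)     # fixed effects
    tau = pm.HalfNormal("tau", sigma=1)                   # random-effect SD (sqrt of G_i)
    gamma = pm.Normal("gamma", mu=0, sigma=tau, shape=I)  # random effects, sampled rather than integrated out
    mu = pm.math.exp(pm.math.dot(X, beta) + gamma[groups])
    pm.Poisson("Y", mu=mu, observed=Y)
    idata = pm.sample()                                   # joint posterior; marginals for every parameter

print(idata.posterior["beta"].mean(dim=("chain", "draw")))
```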