The GEE Procedure

Generalized Estimating Equations

The marginal model is commonly used in analyzing longitudinal data when the population-averaged effect is of interest. To estimate the regression parameters in the marginal model, Liang and Zeger (1986) proposed the generalized estimating equations method, which is widely used.

Suppose $y_{ij}$, $j = 1, \ldots, n_i$, $i = 1, \ldots, K$, represents the jth response of the ith subject, which has a vector of covariates $\mathbf{x}_{ij}$. There are $n_i$ measurements on subject i, and the maximum number of measurements per subject is T.

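Let the vector of responses of the ith subject be $\mathbf{Y}_i = (y_{i1}, \ldots, y_{in_i})'$ with corresponding means $\boldsymbol{\mu}_i = (\mu_{i1}, \ldots, \mu_{in_i})'$. For generalized linear models, the marginal mean $\mu_{ij}$ of the response $y_{ij}$ is related to a linear predictor through a link function, $g(\mu_{ij}) = \mathbf{x}_{ij}'\boldsymbol{\beta}$, and the variance of $y_{ij}$ depends on the mean through a variance function $v(\mu_{ij})$.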

An estimate of the parameter $\boldsymbol{\beta}$ in the marginal model can be obtained by solving the generalized estimating equations,

\[
\mathbf{S}(\boldsymbol{\beta}) = \sum_{i=1}^{K} \frac{\partial \boldsymbol{\mu}_i'}{\partial \boldsymbol{\beta}} \mathbf{V}_i^{-1} \left( \mathbf{Y}_i - \boldsymbol{\mu}_i(\boldsymbol{\beta}) \right) = \mathbf{0}
\]

where $\mathbf{V}_i$ is the working covariance matrix of $\mathbf{Y}_i$.
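
The following Python sketch illustrates what solving the estimating equations involves in the simplest possible setting. It assumes an independence working correlation, a log link with a single covariate and no intercept, and the Poisson variance function $v(\mu) = \mu$. The function names (`gee_score`, `solve_gee`) and the toy data are hypothetical and are not part of PROC GEE; the sketch is not a description of how the procedure is implemented.

```python
import math

def gee_score(beta, clusters):
    """Return S(beta) and the scoring weight for mu_ij = exp(beta * x_ij).

    Assumes v(mu) = mu and an identity working correlation, so V_i is
    diagonal and the dispersion phi cancels out of the root of S(beta).
    """
    score = 0.0
    info = 0.0
    for xs, ys in clusters:
        for x, y in zip(xs, ys):
            mu = math.exp(beta * x)
            d = x * mu   # d mu_ij / d beta
            v = mu       # jth diagonal element of V_i (up to phi)
            score += d / v * (y - mu)
            info += d / v * d
    return score, info

def solve_gee(clusters, beta=0.0, tol=1e-10, max_iter=50):
    """Fisher scoring: repeat beta <- beta + S(beta) / info until the step is tiny."""
    for _ in range(max_iter):
        score, info = gee_score(beta, clusters)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Toy data (illustrative only): one (covariates, responses) pair per subject.
clusters = [
    ([0.0, 1.0, 2.0], [1, 2, 4]),
    ([0.0, 1.0], [1, 3]),
    ([1.0, 2.0, 3.0], [2, 5, 9]),
]
beta_hat = solve_gee(clusters)
print(beta_hat)
```

With a non-identity working correlation, $\mathbf{V}_i$ is no longer diagonal and each subject contributes a matrix product rather than a sum of scalar terms, but the iteration has the same shape.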

Only the mean and the covariance of $\mathbf{Y}_i$ are required in the GEE method; a full specification of the joint distribution of the correlated responses is not needed. This is particularly convenient because the joint distribution for noncontinuous responses involves high-order associations and is complicated to specify. Moreover, the regression parameter estimates are consistent even when the working covariance is incorrectly specified. Because of these properties, the GEE method is popular in situations where the marginal effect is of interest and the responses are not continuous. However, the GEE approach can lead to biased estimates when missing responses depend on previous responses. The weighted GEE method, which is described in the section Weighted Generalized Estimating Equations under the MAR Assumption, can provide unbiased estimates.

Working Correlation Matrix

Suppose $\mathbf{R}(\boldsymbol{\alpha})$ is an $n_i \times n_i$ "working" correlation matrix that is fully specified by the vector of parameters $\boldsymbol{\alpha}$. The covariance matrix of $\mathbf{Y}_i$ is modeled as

\[
\mathbf{V}_i = \phi \, \mathbf{A}_i^{1/2} \mathbf{W}_i^{-1/2} \mathbf{R}(\boldsymbol{\alpha}) \mathbf{W}_i^{-1/2} \mathbf{A}_i^{1/2}
\]

where $\mathbf{A}_i$ is an $n_i \times n_i$ diagonal matrix whose jth diagonal element is $v(\mu_{ij})$ and $\mathbf{W}_i$ is an $n_i \times n_i$ diagonal matrix whose jth diagonal element is $w_{ij}$, where $w_{ij}$ is a weight variable that is specified in the WEIGHT statement. If there is no WEIGHT statement, $w_{ij} = 1$ for all i and j. If $\mathbf{R}(\boldsymbol{\alpha})$ is the true correlation matrix of $\mathbf{Y}_i$, then $\mathbf{V}_i$ is the true covariance matrix of $\mathbf{Y}_i$.
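
Because $\mathbf{A}_i$ and $\mathbf{W}_i$ are diagonal, the (j, k) element of $\mathbf{V}_i$ is simply $\phi \sqrt{v(\mu_{ij})/w_{ij}} \, R_{jk} \sqrt{v(\mu_{ik})/w_{ik}}$. The Python sketch below builds $\mathbf{V}_i$ element by element on that basis. The function names and the numbers are hypothetical and chosen only for illustration.

```python
import math

def working_covariance(phi, variances, weights, corr):
    """V_i = phi * A^{1/2} W^{-1/2} R W^{-1/2} A^{1/2} for diagonal A and W.

    variances[j] is v(mu_ij), weights[j] is w_ij, corr is the working
    correlation matrix R as a list of rows.
    """
    n = len(variances)
    scale = [math.sqrt(variances[j] / weights[j]) for j in range(n)]
    return [[phi * scale[j] * corr[j][k] * scale[k] for k in range(n)]
            for j in range(n)]

def exchangeable(n, alpha):
    """n x n correlation matrix with 1 on the diagonal and alpha elsewhere."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]

# Three measurements on one subject, unit weights (no WEIGHT statement).
v_i = working_covariance(
    phi=2.0,
    variances=[1.0, 4.0, 9.0],
    weights=[1.0, 1.0, 1.0],
    corr=exchangeable(3, 0.5),
)
for row in v_i:
    print(row)
```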

In practice, the working correlation matrix is usually unknown and must be estimated. It is estimated in the iterative fitting process by using the current value of the parameter vector to compute appropriate functions of the Pearson residual:

\[
e_{ij} = \frac{y_{ij} - \mu_{ij}}{\sqrt{v(\mu_{ij}) / w_{ij}}}
\]

If you specify the working correlation matrix as $\mathbf{R}_0 = \mathbf{I}$, which is the identity matrix, the GEE reduces to the independence estimating equation.

Table 13 shows the working correlation structures that are supported by the GEE procedure and the estimators that are used to estimate the working correlations.

Table 13: Working Correlation Structures and Estimators

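Working Correlation Structure	Estimator
Fixed
$\mathrm{Corr}(Y_{ij}, Y_{ik}) = r_{jk}$, where $r_{jk}$ is the jkth element of a constant, user-specified correlation matrix $\mathbf{R}_0$	The working correlation is not estimated in this case.
Independent
$\mathrm{Corr}(Y_{ij}, Y_{ik}) = 1$ if $j = k$, and $0$ if $j \neq k$	The working correlation is not estimated in this case.
m-dependent
$\mathrm{Corr}(Y_{ij}, Y_{i,j+t}) = 1$ if $t = 0$; $\alpha_t$ if $t = 1, 2, \ldots, m$; and $0$ if $t > m$	$\hat{\alpha}_t = \frac{1}{(K_t - p)\phi} \sum_{i=1}^{K} \sum_{j \le n_i - t} e_{ij} e_{i,j+t}$, where $K_t = \sum_{i=1}^{K} (n_i - t)$
Exchangeable
$\mathrm{Corr}(Y_{ij}, Y_{ik}) = 1$ if $j = k$, and $\alpha$ if $j \neq k$	$\hat{\alpha} = \frac{1}{(N^* - p)\phi} \sum_{i=1}^{K} \sum_{j < k} e_{ij} e_{ik}$, where $N^* = 0.5 \sum_{i=1}^{K} n_i (n_i - 1)$
Unstructured
$\mathrm{Corr}(Y_{ij}, Y_{ik}) = 1$ if $j = k$, and $\alpha_{jk}$ if $j \neq k$	$\hat{\alpha}_{jk} = \frac{1}{(K - p)\phi} \sum_{i=1}^{K} e_{ij} e_{ik}$
Autoregressive AR(1)
$\mathrm{Corr}(Y_{ij}, Y_{i,j+t}) = \alpha^t$ for $t = 0, 1, 2, \ldots, n_i - j$	$\hat{\alpha} = \frac{1}{(K_1 - p)\phi} \sum_{i=1}^{K} \sum_{j \le n_i - 1} e_{ij} e_{i,j+1}$, where $K_1 = \sum_{i=1}^{K} (n_i - 1)$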
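
As an illustration of how such estimators are computed from Pearson residuals, the Python sketch below implements moment estimators for the exchangeable and AR(1) structures. It assumes the forms $\hat{\alpha} = \sum_i \sum_{j<k} e_{ij} e_{ik} / ((N^* - p)\phi)$ with $N^* = 0.5 \sum_i n_i (n_i - 1)$ for the exchangeable structure, and $\hat{\alpha} = \sum_i \sum_{j \le n_i - 1} e_{ij} e_{i,j+1} / ((K_1 - p)\phi)$ with $K_1 = \sum_i (n_i - 1)$ for AR(1). The function names and residual values are hypothetical.

```python
def exchangeable_alpha(residuals, p, phi):
    """Average the products e_ij * e_ik over all pairs j < k within each subject."""
    total = 0.0
    n_star = 0.0
    for e in residuals:
        n = len(e)
        n_star += 0.5 * n * (n - 1)   # number of within-subject pairs
        for j in range(n):
            for k in range(j + 1, n):
                total += e[j] * e[k]
    return total / ((n_star - p) * phi)

def ar1_alpha(residuals, p, phi):
    """Average the products of adjacent residuals e_ij * e_i,j+1."""
    total = 0.0
    k1 = 0
    for e in residuals:
        k1 += len(e) - 1              # number of adjacent pairs
        for j in range(len(e) - 1):
            total += e[j] * e[j + 1]
    return total / ((k1 - p) * phi)

# Pearson residuals for three subjects (illustrative values only).
residuals = [[1.0, 0.5, -0.5], [-1.0, -0.5], [0.5, 1.0, 0.0]]
print(exchangeable_alpha(residuals, p=1, phi=0.5))  # 0.25
print(ar1_alpha(residuals, p=1, phi=0.5))           # 0.625
```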

Dispersion Parameter

The dispersion parameter $\phi$ is estimated by

\[
\hat{\phi} = \frac{1}{N - p} \sum_{i=1}^{K} \sum_{j=1}^{n_i} e_{ij}^2
\]

where $N = \sum_{i=1}^{K} n_i$ is the total number of measurements and p is the number of regression parameters.

The square root of $\hat{\phi}$ is reported by PROC GEE as the scale parameter in the "Parameter Estimates for Response Model with Model-Based Standard Error" output table. If a fixed scale parameter is specified by using the NOSCALE option in the MODEL statement, then the fixed value is used in estimating the model-based covariance matrix and standard errors.
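
The dispersion estimate is the sum of squared Pearson residuals divided by $N - p$. The Python sketch below computes the residuals and the estimate for a toy data set, assuming the Poisson variance function $v(\mu) = \mu$ and unit weights. The function names, responses, and fitted means are hypothetical.

```python
import math

def pearson_residual(y, mu, variance, weight=1.0):
    """e_ij = (y_ij - mu_ij) / sqrt(v(mu_ij) / w_ij)."""
    return (y - mu) / math.sqrt(variance / weight)

def dispersion(residuals, p):
    """phi_hat = sum of squared residuals / (N - p), N = total measurements."""
    n_total = sum(len(e) for e in residuals)
    sum_sq = sum(r * r for e in residuals for r in e)
    return sum_sq / (n_total - p)

# Observed responses and fitted means for two subjects (illustrative only).
y = [[3, 5], [2, 4, 9]]
mu = [[4.0, 4.0], [1.0, 4.0, 9.0]]

# Poisson variance function: v(mu) = mu.
residuals = [
    [pearson_residual(y_ij, mu_ij, variance=mu_ij) for y_ij, mu_ij in zip(y_i, mu_i)]
    for y_i, mu_i in zip(y, mu)
]
phi_hat = dispersion(residuals, p=2)
scale = math.sqrt(phi_hat)  # the quantity reported as the scale parameter
print(phi_hat, scale)
```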

Quasi-likelihood Information Criterion

The quasi-likelihood information criterion (QIC) was developed by Pan (2001) as a modification of Akaike’s information criterion (AIC) to apply to models fit by the GEE approach.

Define the quasi-likelihood under the independent working correlation assumption, evaluated at the parameter estimates under the working correlation of interest, as

\[
Q(\hat{\boldsymbol{\beta}}(R), \phi) = \sum_{i=1}^{K} \sum_{j=1}^{n_i} Q\left(\hat{\boldsymbol{\beta}}(R), \phi; (Y_{ij}, \mathbf{X}_{ij})\right)
\]

where the quasi-likelihood contribution of the jth observation in the ith cluster is defined in the section Quasi-likelihood Functions and $\hat{\boldsymbol{\beta}}(R)$ are the parameter estimates that are obtained by using the GEE approach with the working correlation of interest R.

QIC is defined as

\[
\mathrm{QIC}(R) = -2 Q(\hat{\boldsymbol{\beta}}(R), \phi) + 2 \, \mathrm{trace}(\hat{\Omega}_I \hat{V}_R)
\]

where $\hat{V}_R$ is the robust covariance estimate and $\hat{\Omega}_I$ is the inverse of the model-based covariance estimate under the independent working correlation assumption, evaluated at $\hat{\boldsymbol{\beta}}(R)$, which are the parameter estimates that are obtained by using the GEE approach with the working correlation of interest R.

PROC GEE also computes an approximation to $\mathrm{QIC}(R)$, which is defined by Pan (2001) as

\[
\mathrm{QIC}_u(R) = -2 Q(\hat{\boldsymbol{\beta}}(R), \phi) + 2p
\]

where p is the number of regression parameters.

Pan (2001) notes that QIC is appropriate for selecting regression models and working correlations, whereas $\mathrm{QIC}_u$ is appropriate only for selecting regression models.
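
The two criteria differ only in the penalty term: QIC uses $2\,\mathrm{trace}(\hat{\Omega}_I \hat{V}_R)$, whereas $\mathrm{QIC}_u$ uses $2p$. The Python sketch below computes both from a given quasi-likelihood value and a pair of small matrices. The function names and all numeric inputs are hypothetical and serve only to show the arithmetic.

```python
def trace_of_product(a, b):
    """trace(A B) for square matrices given as lists of rows."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

def qic(quasi_lik, omega_indep, v_robust):
    """QIC(R) = -2 Q + 2 trace(Omega_I V_R)."""
    return -2.0 * quasi_lik + 2.0 * trace_of_product(omega_indep, v_robust)

def qic_u(quasi_lik, p):
    """QIC_u(R) = -2 Q + 2 p."""
    return -2.0 * quasi_lik + 2.0 * p

# Illustrative inputs for a model with p = 2 regression parameters.
quasi_lik = -10.0
omega_indep = [[4.0, 1.0], [1.0, 2.0]]    # inverse model-based covariance
v_robust = [[0.5, -0.25], [-0.25, 1.0]]   # robust covariance estimate

print(qic(quasi_lik, omega_indep, v_robust))  # 27.0
print(qic_u(quasi_lik, p=2))                  # 24.0
```

When the robust covariance estimate equals the inverse of $\hat{\Omega}_I$, the trace equals p and the two criteria coincide.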

Quasi-likelihood Functions

See McCullagh and Nelder (1989) and Hardin and Hilbe (2003) for discussions of quasi-likelihood functions. The contribution of observation j in cluster i to the quasi-likelihood function that is evaluated at the regression parameters $\boldsymbol{\beta}$ is expressed by $Q_{ij} / \phi$, where $Q_{ij}$ is defined in the following list. These definitions are used in the computation of the quasi-likelihood information criteria (QIC) for goodness of fit of models that are fit by the GEE approach. The $w_{ij}$ are prior weights, if any, that are specified in the WEIGHT or FREQ statement. Note that the definition of the quasi-likelihood for the negative binomial differs from that given in McCullagh and Nelder (1989). The definition used here allows the negative binomial quasi-likelihood to approach the Poisson as $k \to 0$.

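Normal: $Q_{ij} = -\frac{1}{2} w_{ij} (y_{ij} - \mu_{ij})^2$
Inverse Gaussian: $Q_{ij} = \frac{w_{ij} (\mu_{ij} - 0.5\, y_{ij})}{\mu_{ij}^2}$
Gamma: $Q_{ij} = -w_{ij} \left[ \frac{y_{ij}}{\mu_{ij}} + \log(\mu_{ij}) \right]$
Negative binomial: $Q_{ij} = w_{ij} \left[ \log \Gamma\left(y_{ij} + \frac{1}{k}\right) - \log \Gamma\left(\frac{1}{k}\right) + y_{ij} \log\left(\frac{k \mu_{ij}}{1 + k \mu_{ij}}\right) + \frac{1}{k} \log\left(\frac{1}{1 + k \mu_{ij}}\right) \right]$
Poisson: $Q_{ij} = w_{ij} \left( y_{ij} \log(\mu_{ij}) - \mu_{ij} \right)$
Binomial: $Q_{ij} = w_{ij} \left[ r_{ij} \log(p_{ij}) + (n_{ij} - r_{ij}) \log(1 - p_{ij}) \right]$, where $r_{ij}$ is the number of events in $n_{ij}$ trials and $p_{ij}$ is the event probability
Multinomial (s categories): $Q_{ij} = w_{ij} \sum_{k=1}^{s} y_{ijk} \log(\mu_{ijk})$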
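
The claim that the negative binomial quasi-likelihood approaches the Poisson as $k \to 0$ can be checked numerically. The Python sketch below assumes the Poisson contribution $w (y \log \mu - \mu)$ and a negative binomial contribution of the form $w [\log\Gamma(y + 1/k) - \log\Gamma(1/k) + y \log(k\mu/(1 + k\mu)) + (1/k)\log(1/(1 + k\mu))]$. The function names and the test point are hypothetical.

```python
import math

def q_poisson(y, mu, w=1.0):
    """Poisson quasi-likelihood contribution: w * (y log(mu) - mu)."""
    return w * (y * math.log(mu) - mu)

def q_negbin(y, mu, k, w=1.0):
    """Negative binomial contribution with dispersion k > 0 (assumed form)."""
    return w * (
        math.lgamma(y + 1.0 / k)
        - math.lgamma(1.0 / k)
        + y * math.log(k * mu / (1.0 + k * mu))
        + (1.0 / k) * math.log(1.0 / (1.0 + k * mu))
    )

# The gap between the two contributions shrinks as k decreases toward 0.
y, mu = 3, 2.5
for k in (1e-1, 1e-2, 1e-3, 1e-4):
    gap = q_negbin(y, mu, k) - q_poisson(y, mu)
    print(k, gap)
```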

Last updated: December 09, 2022