The GEE Procedure

Generalized Estimating Equations

The marginal model is commonly used in analyzing longitudinal data when the population-averaged effect is of interest. To estimate the regression parameters in the marginal model, Liang and Zeger (1986) proposed the generalized estimating equations method, which is widely used.

Suppose $y_{ij}$, $j = 1, \ldots, n_i$, $i = 1, \ldots, K$, represent the $j$th response of the $i$th subject, which has a vector of covariates $\mathbf{x}_{ij}$. There are $n_i$ measurements on subject $i$, and the maximum number of measurements per subject is $T$.

Let the responses of the $i$th subject be $\mathbf{Y}_i = [y_{i1}, \ldots, y_{in_i}]'$ with corresponding means $\boldsymbol{\mu}_i = [\mu_{i1}, \ldots, \mu_{in_i}]'$. For generalized linear models, the marginal mean $\mu_{ij}$ of the response $y_{ij}$ is related to a linear predictor through a link function, $g(\mu_{ij}) = \mathbf{x}_{ij}'\boldsymbol{\beta}$, and the variance of $y_{ij}$ depends on the mean through a variance function $v(\mu_{ij})$.

An estimate of the parameter bold-italic beta in the marginal model can be obtained by solving the generalized estimating equations,

$$\mathbf{S}(\boldsymbol{\beta}) = \sum_{i=1}^{K} \frac{\partial \boldsymbol{\mu}_i'}{\partial \boldsymbol{\beta}} \mathbf{V}_i^{-1} \left( \mathbf{Y}_i - \boldsymbol{\mu}_i(\boldsymbol{\beta}) \right) = \mathbf{0}$$

where $\mathbf{V}_i$ is the working covariance matrix of $\mathbf{Y}_i$.
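As a rough illustration of the estimating equations above, the following Python sketch evaluates $\mathbf{S}(\boldsymbol{\beta})$ in one simple special case. It assumes a log link, the variance function $v(\mu) = \mu$, and an independence working covariance $\mathbf{V}_i = \phi\,\mathrm{diag}(v(\mu_{ij}))$. The function name `gee_score` and the data are hypothetical, and this is not how PROC GEE itself is implemented.

```python
import math

def gee_score(beta, subjects, phi=1.0):
    """Evaluate S(beta) assuming a log link, v(mu) = mu, and
    an independence working covariance V_i = phi * diag(mu_ij)."""
    p = len(beta)
    score = [0.0] * p
    for rows in subjects:              # one entry per subject i
        for x, y in rows:              # one (x_ij, y_ij) pair per measurement j
            eta = sum(b * xk for b, xk in zip(beta, x))
            mu = math.exp(eta)         # inverse of the log link
            # d mu / d beta = mu * x; V^{-1} contributes 1 / (phi * mu)
            for k in range(p):
                score[k] += (mu * x[k]) * (y - mu) / (phi * mu)
    return score

# Hypothetical data: two subjects, intercept-only model
subjects = [
    [((1.0,), 2.0), ((1.0,), 4.0)],
    [((1.0,), 3.0)],
]
score_at_zero = gee_score([0.0], subjects)           # away from the solution
score_at_hat = gee_score([math.log(3.0)], subjects)  # log of the overall mean 9/3
```

In this intercept-only case the score vanishes at the log of the overall mean, which is the root that an iterative solver would converge to.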

Only the mean and the covariance of $\mathbf{Y}_i$ are required in the GEE method; a full specification of the joint distribution of the correlated responses is not needed. This is particularly convenient because the joint distribution for noncontinuous responses involves high-order associations and is complicated to specify. Moreover, the regression parameter estimates are consistent even when the working covariance is incorrectly specified. Because of these properties, the GEE method is popular in situations where the marginal effect is of interest and the responses are not continuous. However, the GEE approach can lead to biased estimates when missing responses depend on previous responses. The weighted GEE method, which is described in the section Weighted Generalized Estimating Equations under the MAR Assumption, can provide unbiased estimates.

Working Correlation Matrix

Suppose $\mathbf{R}_i(\boldsymbol{\alpha})$ is an $n_i \times n_i$ "working" correlation matrix that is fully specified by the vector of parameters $\boldsymbol{\alpha}$. The covariance matrix of $\mathbf{Y}_i$ is modeled as

$$\mathbf{V}_i = \phi\, \mathbf{A}_i^{1/2} \mathbf{W}_i^{-1/2} \mathbf{R}_i(\boldsymbol{\alpha}) \mathbf{W}_i^{-1/2} \mathbf{A}_i^{1/2}$$

where $\mathbf{A}_i$ is an $n_i \times n_i$ diagonal matrix whose $j$th diagonal element is $v(\mu_{ij})$ and $\mathbf{W}_i$ is an $n_i \times n_i$ diagonal matrix whose $j$th diagonal element is $w_{ij}$, where $w_{ij}$ is a weight variable that is specified in the WEIGHT statement. If there is no WEIGHT statement, $w_{ij} = 1$ for all $i$ and $j$. If $\mathbf{R}_i(\boldsymbol{\alpha})$ is the true correlation matrix of $\mathbf{Y}_i$, then $\mathbf{V}_i$ is the true covariance matrix of $\mathbf{Y}_i$.
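The construction of the working covariance can be sketched in a few lines of Python. Because the matrices for the variance function and the weights are diagonal, each element of the working covariance is $\phi$ times the corresponding working correlation, scaled by $\sqrt{v(\mu_{ij})/w_{ij}}$ for its row and for its column. The function name, the default variance function $v(\mu) = \mu$, and the numbers are assumptions made for illustration only.

```python
import math

def working_covariance(mu, w, R, phi, v=lambda m: m):
    """Return phi * A^{1/2} W^{-1/2} R W^{-1/2} A^{1/2} as nested lists.

    mu, w : means and weights for one subject (length n_i)
    R     : n_i x n_i working correlation matrix (nested lists)
    v     : variance function; v(mu) = mu is an assumed default
    """
    n = len(mu)
    # d[j] is the jth diagonal entry of A^{1/2} W^{-1/2}
    d = [math.sqrt(v(mu[j]) / w[j]) for j in range(n)]
    return [[phi * d[j] * R[j][k] * d[k] for k in range(n)] for j in range(n)]

# Hypothetical subject with two measurements
mu = [1.0, 4.0]
w = [1.0, 1.0]
R = [[1.0, 0.5], [0.5, 1.0]]
V = working_covariance(mu, w, R, phi=2.0)
```

With these inputs the diagonal of `V` is $\phi\, v(\mu_{ij})$, that is 2 and 8, and the off-diagonal element is $2 \times 1 \times 0.5 \times 2 = 2$.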

In practice, the working correlation matrix is usually unknown and must be estimated. It is estimated in the iterative fitting process by using the current value of the parameter vector $\boldsymbol{\beta}$ to compute appropriate functions of the Pearson residual:

$$e_{ij} = \frac{y_{ij} - \mu_{ij}}{\sqrt{v(\mu_{ij}) / w_{ij}}}$$
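A minimal Python sketch of the Pearson residual follows. The variance function defaults to $v(\mu) = \mu$ purely as an assumption for the example, and the function name is hypothetical.

```python
import math

def pearson_residual(y, mu, w=1.0, v=lambda m: m):
    """Pearson residual e_ij = (y - mu) / sqrt(v(mu) / w)."""
    return (y - mu) / math.sqrt(v(mu) / w)

# Hypothetical observation: y = 6 with fitted mean 4
e_unweighted = pearson_residual(6.0, 4.0)        # (6 - 4) / sqrt(4)
e_weighted = pearson_residual(6.0, 4.0, w=4.0)   # (6 - 4) / sqrt(4 / 4)
```

A larger weight shrinks the assumed variance of the observation, so the same raw difference produces a larger residual.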

If you specify the working correlation matrix as $\mathbf{R}_0 = \mathbf{I}$, which is the identity matrix, the GEE reduces to the independence estimating equation.

Table 13 shows the working correlation structures that are supported by the GEE procedure and the estimators that are used to estimate the working correlations.

Table 13: Working Correlation Structures and Estimators

Each entry lists the working correlation structure, followed by its estimator.

  • Fixed:

    Structure: $\mathrm{Corr}(Y_{ij}, Y_{ik}) = r_{jk}$, where $r_{jk}$ is the $jk$th element of a constant, user-specified correlation matrix $\mathbf{R}_0$

    Estimator: The working correlation is not estimated in this case.
  • Independent:

    Structure: $\mathrm{Corr}(Y_{ij}, Y_{ik}) = \begin{cases} 1 & j = k \\ 0 & j \ne k \end{cases}$

    Estimator: The working correlation is not estimated in this case.
  • m-dependent:

    Structure: $\mathrm{Corr}(Y_{ij}, Y_{i,j+t}) = \begin{cases} 1 & t = 0 \\ \alpha_t & t = 1, 2, \ldots, m \\ 0 & t > m \end{cases}$

    Estimator: $\hat{\alpha}_t = \frac{1}{(K_t - p)\phi} \sum_{i=1}^{K} \sum_{j \le n_i - t} e_{ij} e_{i,j+t}$, where $K_t = \sum_{i=1}^{K} (n_i - t)$
  • Exchangeable:

    Structure: $\mathrm{Corr}(Y_{ij}, Y_{ik}) = \begin{cases} 1 & j = k \\ \alpha & j \ne k \end{cases}$

    Estimator: $\hat{\alpha} = \frac{1}{(N^* - p)\phi} \sum_{i=1}^{K} \sum_{j < k} e_{ij} e_{ik}$, where $N^* = 0.5 \sum_{i=1}^{K} n_i (n_i - 1)$
  • Unstructured:

    Structure: $\mathrm{Corr}(Y_{ij}, Y_{ik}) = \begin{cases} 1 & j = k \\ \alpha_{jk} & j \ne k \end{cases}$

    Estimator: $\hat{\alpha}_{jk} = \frac{1}{(K - p)\phi} \sum_{i=1}^{K} e_{ij} e_{ik}$
  • Autoregressive AR(1):

    Structure: $\mathrm{Corr}(Y_{ij}, Y_{i,j+t}) = \alpha^t$ for $t = 0, 1, 2, \ldots, n_i - j$

    Estimator: $\hat{\alpha} = \frac{1}{(K_1 - p)\phi} \sum_{i=1}^{K} \sum_{j \le n_i - 1} e_{ij} e_{i,j+1}$, where $K_1 = \sum_{i=1}^{K} (n_i - 1)$
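To make two of the estimators in Table 13 concrete, here is a hedged Python sketch of the exchangeable and AR(1) estimators, computed from per-subject lists of Pearson residuals. The function names and residual values are hypothetical; the code only mirrors the formulas as written in the table and does not reproduce PROC GEE's iterative fitting.

```python
def exchangeable_alpha(resid, p, phi):
    """Exchangeable estimator: sum of e_ij * e_ik over j < k,
    divided by (N* - p) * phi, with N* = 0.5 * sum n_i (n_i - 1)."""
    total = 0.0
    n_star = 0.0
    for e in resid:                      # e holds e_i1, ..., e_in_i
        n = len(e)
        n_star += 0.5 * n * (n - 1)
        for j in range(n):
            for k in range(j + 1, n):
                total += e[j] * e[k]
    return total / ((n_star - p) * phi)

def ar1_alpha(resid, p, phi):
    """AR(1) estimator: sum of lag-1 products e_ij * e_i,j+1,
    divided by (K_1 - p) * phi, with K_1 = sum (n_i - 1)."""
    total = 0.0
    k1 = 0
    for e in resid:
        k1 += len(e) - 1
        for j in range(len(e) - 1):
            total += e[j] * e[j + 1]
    return total / ((k1 - p) * phi)

# Hypothetical residuals for K = 3 subjects with 3 measurements each
resid = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0], [0.5, -0.5, 0.5]]
alpha_exch = exchangeable_alpha(resid, p=1, phi=1.0)   # 5.75 / 8
alpha_ar1 = ar1_alpha(resid, p=1, phi=1.0)             # 3.5 / 5
```

The exchangeable estimator pools every within-subject pair, whereas the AR(1) estimator uses only adjacent pairs.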


Dispersion Parameter

The dispersion parameter $\phi$ is estimated by

$$\hat{\phi} = \frac{1}{N - p} \sum_{i=1}^{K} \sum_{j=1}^{n_i} e_{ij}^2$$

where $N = \sum_{i=1}^{K} n_i$ is the total number of measurements and $p$ is the number of regression parameters.

The square root of $\hat{\phi}$ is reported by PROC GEE as the scale parameter in the "Parameter Estimates for Response Model with Model-Based Standard Error" output table. If a fixed scale parameter is specified by using the NOSCALE option in the MODEL statement, then the fixed value is used in estimating the model-based covariance matrix and standard errors.
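The dispersion estimate and the corresponding scale value can be sketched as follows. The residuals are hypothetical, and the function name is an assumption for illustration.

```python
import math

def dispersion(resid, p):
    """phi_hat = sum of squared Pearson residuals / (N - p)."""
    n_total = sum(len(e) for e in resid)          # N: total measurements
    sum_sq = sum(x * x for e in resid for x in e)
    return sum_sq / (n_total - p)

# Hypothetical residuals for two subjects (2 and 3 measurements)
resid = [[1.0, -1.0], [2.0, 0.0, 1.0]]
phi_hat = dispersion(resid, p=1)    # 7 / (5 - 1)
scale = math.sqrt(phi_hat)          # the quantity reported as the scale
```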

Quasi-likelihood Information Criterion

The quasi-likelihood information criterion (QIC) was developed by Pan (2001) as a modification of Akaike’s information criterion (AIC) to apply to models fit by the GEE approach.

Define the quasi-likelihood under the independent working correlation assumption, evaluated with the parameter estimates under the working correlation of interest as

$$Q(\hat{\boldsymbol{\beta}}(R), \phi) = \sum_{i=1}^{K} \sum_{j=1}^{n_i} Q(\hat{\boldsymbol{\beta}}(R), \phi; (Y_{ij}, \mathbf{X}_{ij}))$$

where the quasi-likelihood contribution of the $j$th observation in the $i$th cluster is defined in the section Quasi-likelihood Functions and $\hat{\boldsymbol{\beta}}(R)$ are the parameter estimates that are obtained by using the GEE approach with the working correlation of interest $R$.

QIC is defined as

$$\mathrm{QIC}(R) = -2 Q(\hat{\boldsymbol{\beta}}(R), \phi) + 2\, \mathrm{trace}(\hat{\Omega}_I \hat{V}_R)$$

where $\hat{V}_R$ is the robust covariance estimate and $\hat{\Omega}_I$ is the inverse of the model-based covariance estimate under the independent working correlation assumption, evaluated at $\hat{\boldsymbol{\beta}}(R)$, which are the parameter estimates that are obtained by using the GEE approach with the working correlation of interest $R$.

PROC GEE also computes an approximation to $\mathrm{QIC}(R)$, which is defined by Pan (2001) as

$$\mathrm{QIC}_u(R) = -2 Q(\hat{\boldsymbol{\beta}}(R), \phi) + 2p$$

where p is the number of regression parameters.

Pan (2001) notes that QIC is appropriate for selecting regression models and working correlations, whereas $\mathrm{QIC}_u$ is appropriate only for selecting regression models.
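Given the quasi-likelihood value and the two covariance-related matrices, both criteria reduce to simple arithmetic. The sketch below assumes these inputs have already been computed elsewhere; the function names and all numeric values are hypothetical.

```python
def qic(quasi_lik, omega_i, v_r):
    """QIC = -2 Q + 2 trace(Omega_I V_R) for p x p nested-list matrices."""
    p = len(v_r)
    # trace(A B) = sum over a, b of A[a][b] * B[b][a]
    trace = sum(omega_i[a][b] * v_r[b][a] for a in range(p) for b in range(p))
    return -2.0 * quasi_lik + 2.0 * trace

def qic_u(quasi_lik, p):
    """QIC_u = -2 Q + 2 p."""
    return -2.0 * quasi_lik + 2.0 * p

# Hypothetical inputs for a model with p = 2 regression parameters
quasi_lik = -10.0
omega_i = [[2.0, 0.0], [0.0, 4.0]]     # inverse model-based covariance
v_r = [[0.75, 0.1], [0.1, 0.25]]       # robust covariance estimate
qic_value = qic(quasi_lik, omega_i, v_r)   # 20 + 2 * 2.5
qic_u_value = qic_u(quasi_lik, p=2)        # 20 + 2 * 2
```

The two criteria differ only in the penalty term, so they coincide when the trace equals $p$.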

Quasi-likelihood Functions

See McCullagh and Nelder (1989) and Hardin and Hilbe (2003) for discussions of quasi-likelihood functions. The contribution of observation $j$ in cluster $i$ to the quasi-likelihood function that is evaluated at the regression parameters $\boldsymbol{\beta}$ is expressed by $Q(\boldsymbol{\beta}, \phi; (Y_{ij}, \mathbf{X}_{ij})) = \frac{Q_{ij}}{\phi}$, where $Q_{ij}$ is defined in the following list. These definitions are used in the computation of the quasi-likelihood information criteria (QIC) for goodness of fit of models that are fit by the GEE approach. The $w_{ij}$ are prior weights, if any, that are specified in the WEIGHT or FREQ statement. Note that the definition of the quasi-likelihood for the negative binomial differs from that given in McCullagh and Nelder (1989). The definition used here allows the negative binomial quasi-likelihood to approach the Poisson as $k \to 0$.

  • Normal:

    $$Q_{ij} = -\frac{1}{2} w_{ij} (y_{ij} - \mu_{ij})^2$$
  • Inverse Gaussian:

    $$Q_{ij} = \frac{w_{ij} (\mu_{ij} - 0.5\, y_{ij})}{\mu_{ij}^2}$$
  • Gamma:

    $$Q_{ij} = -w_{ij} \left[ \frac{y_{ij}}{\mu_{ij}} + \log(\mu_{ij}) \right]$$
  • Negative binomial:

    $$Q_{ij} = w_{ij} \left[ \log\Gamma\left(y_{ij} + \frac{1}{k}\right) - \log\Gamma\left(\frac{1}{k}\right) + y_{ij} \log\left(\frac{k\mu_{ij}}{1 + k\mu_{ij}}\right) + \frac{1}{k} \log\left(\frac{1}{1 + k\mu_{ij}}\right) \right]$$
  • Poisson:

    $$Q_{ij} = w_{ij} \left( y_{ij} \log(\mu_{ij}) - \mu_{ij} \right)$$
  • Binomial:

    $$Q_{ij} = w_{ij} \left[ r_{ij} \log(p_{ij}) + (n_{ij} - r_{ij}) \log(1 - p_{ij}) \right]$$
  • Multinomial (s categories):

    $$Q_{ij} = w_{ij} \sum_{k=1}^{s} y_{ijk} \log(\mu_{ijk})$$
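Several of the definitions in the list above translate directly into code. The following Python sketch implements a few of them and numerically illustrates the statement that the negative binomial quasi-likelihood approaches the Poisson as $k \to 0$. All function names and data values are hypothetical, and the helper that sums contributions assumes unit prior weights.

```python
import math

def q_normal(y, mu, w=1.0):
    return -0.5 * w * (y - mu) ** 2

def q_inverse_gaussian(y, mu, w=1.0):
    return w * (mu - 0.5 * y) / mu ** 2

def q_gamma(y, mu, w=1.0):
    return -w * (y / mu + math.log(mu))

def q_poisson(y, mu, w=1.0):
    return w * (y * math.log(mu) - mu)

def q_negative_binomial(y, mu, k, w=1.0):
    return w * (
        math.lgamma(y + 1.0 / k)
        - math.lgamma(1.0 / k)
        + y * math.log(k * mu / (1.0 + k * mu))
        + (1.0 / k) * math.log(1.0 / (1.0 + k * mu))
    )

def total_quasi_likelihood(q, data, phi=1.0):
    """Sum Q_ij / phi over (y, mu) pairs, with unit weights assumed."""
    return sum(q(y, mu) for y, mu in data) / phi

# Hypothetical check: small k brings the negative binomial near the Poisson
nb_small_k = q_negative_binomial(2.0, 3.0, k=1e-4)
poisson_value = q_poisson(2.0, 3.0)

# Hypothetical data: two observations, each with fitted mean 1
total_q = total_quasi_likelihood(q_poisson, [(2.0, 1.0), (1.0, 1.0)], phi=2.0)
```

The total computed by `total_quasi_likelihood` is the quantity that enters the QIC formula as $Q(\hat{\boldsymbol{\beta}}(R), \phi)$.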
Last updated: December 09, 2022