The GEE Procedure

Alternating Logistic Regression

If the responses are binary (that is, they take only two values), then there is an alternative method to account for the association among the measurements. The alternating logistic regressions (ALR) algorithm of Carey, Zeger, and Diggle (1993) models the association between pairs of responses by using log odds ratios instead of the correlations that ordinary GEEs use. The ALR algorithm of Heagerty and Zeger (1996) extends the method to GEEs that have ordinal multinomial responses (that is, they fall into one of $C$ ordered categories).

ALR for Binary Data

For binary data, the correlation between the $j$th and $k$th responses is, by definition,

$$\mathrm{Corr}(Y_{ij}, Y_{ik}) = \frac{\Pr(Y_{ij}=1, Y_{ik}=1) - \mu_{ij}\mu_{ik}}{\sqrt{\mu_{ij}(1-\mu_{ij})\,\mu_{ik}(1-\mu_{ik})}}$$

The joint probability in the numerator satisfies the following bounds, by elementary properties of probability, because $\mu_{ij} = \Pr(Y_{ij}=1)$:

$$\max(0,\ \mu_{ij}+\mu_{ik}-1) \le \Pr(Y_{ij}=1, Y_{ik}=1) \le \min(\mu_{ij},\ \mu_{ik})$$

Therefore, the correlation is constrained to be within limits that depend in a complicated way on the means of the data.
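To make the constraint concrete, the following minimal Python sketch plugs the two bounds on the joint probability into the correlation formula. The function name `correlation_bounds` and the example means are illustrative assumptions, not part of PROC GEE.

```python
import math

def correlation_bounds(mu_j, mu_k):
    """Return the (lower, upper) limits on Corr(Y_ij, Y_ik) implied by the means."""
    p11_min = max(0.0, mu_j + mu_k - 1.0)   # lower bound on Pr(Y_ij=1, Y_ik=1)
    p11_max = min(mu_j, mu_k)               # upper bound on Pr(Y_ij=1, Y_ik=1)
    denom = math.sqrt(mu_j * (1.0 - mu_j) * mu_k * (1.0 - mu_k))
    lower = (p11_min - mu_j * mu_k) / denom
    upper = (p11_max - mu_j * mu_k) / denom
    return lower, upper

# Hypothetical means: with 0.2 and 0.8 the correlation cannot exceed about 0.25.
lo, hi = correlation_bounds(0.2, 0.8)
print(f"correlation must lie in [{lo:.3f}, {hi:.3f}]")
```

With unequal means the admissible interval is asymmetric and can be much narrower than $[-1, 1]$, which is the practical difficulty with a correlation-based working model for binary data.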

The odds ratio, defined as

$$\mathrm{OR}(Y_{ij}, Y_{ik}) = \frac{\Pr(Y_{ij}=1, Y_{ik}=1)\,\Pr(Y_{ij}=0, Y_{ik}=0)}{\Pr(Y_{ij}=1, Y_{ik}=0)\,\Pr(Y_{ij}=0, Y_{ik}=1)}$$

is not constrained by the means and is preferred, in some cases, to correlations for binary data.
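The claim that the odds ratio is unconstrained by the means can be checked numerically. The sketch below, in Python, computes the odds ratio from a joint probability and, in the other direction, solves the quadratic implied by the definition for the joint probability that yields a requested odds ratio. The function names are hypothetical, and the closed-form root is the standard solution of that quadratic (often associated with the Plackett distribution) rather than anything stated in this documentation.

```python
import math

def odds_ratio(p11, mu_j, mu_k):
    """Odds ratio of two binary responses from Pr(both = 1) and the two means."""
    p10 = mu_j - p11                    # Pr(Y_ij=1, Y_ik=0)
    p01 = mu_k - p11                    # Pr(Y_ij=0, Y_ik=1)
    p00 = 1.0 - mu_j - mu_k + p11       # Pr(Y_ij=0, Y_ik=0)
    return (p11 * p00) / (p10 * p01)

def joint_from_odds_ratio(psi, mu_j, mu_k):
    """Solve for Pr(Y_ij=1, Y_ik=1) given an odds ratio psi > 0 and the means."""
    if abs(psi - 1.0) < 1e-12:
        return mu_j * mu_k              # independence
    a = 1.0 + (mu_j + mu_k) * (psi - 1.0)
    disc = a * a - 4.0 * psi * (psi - 1.0) * mu_j * mu_k
    return (a - math.sqrt(disc)) / (2.0 * (psi - 1.0))

# Hypothetical means; every positive odds ratio yields a valid joint probability.
for psi in (0.01, 1.0, 100.0):
    p11 = joint_from_odds_ratio(psi, 0.2, 0.8)
    print(psi, round(p11, 4), round(odds_ratio(p11, 0.2, 0.8), 4))
```

For any positive odds ratio the recovered joint probability stays strictly inside the bounds given earlier, so the log odds ratio can range over the whole real line regardless of the means.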

The ALR algorithm seeks to model the logarithm of the odds ratio, $\gamma_{ijk} = \log(\mathrm{OR}(Y_{ij}, Y_{ik}))$, as

$$\gamma_{ijk} = \mathbf{z}'_{ijk}\,\boldsymbol{\alpha}$$

where $\boldsymbol{\alpha}$ is a $q \times 1$ vector of regression parameters and $\mathbf{z}_{ijk}$ is a fixed, specified vector of coefficients.

The parameter $\gamma_{ijk}$ can take any value in $(-\infty, \infty)$, with $\gamma_{ijk} = 0$ corresponding to no association.

The log odds ratio, when modeled in this way with a regression model, can take different values in subgroups defined by $\mathbf{z}_{ijk}$. For example, $\mathbf{z}_{ijk}$ can define subgroups within clusters, or it can define "block effects" between clusters.

You specify a GEE model for binary data that uses log odds ratios by specifying a model for the mean, as in ordinary GEEs, and by specifying a model for the log odds ratios. You can use any of the link functions appropriate for binary data in the model for the mean, such as logistic, probit, or complementary log-log.

ALR for Ordinal Multinomial Data

For ordinal multinomial data, let $O_{ij}$, $i = 1, \ldots, K$, $j = 1, \ldots, n_i$, denote the $j$th measurement on the $i$th subject. To apply the ALR algorithm, the responses $O_{ij}$ are represented by a vector $\mathbf{Y}_{ij} = [Y_{ij1}, \ldots, Y_{ij,C-1}]'$ of cumulative indicator variables $Y_{ijc} = I(O_{ij} \le c)$. You model the cumulative probabilities $\mu_{ijc} = E(Y_{ijc})$ by using a cumulative link function,

$$g(\mu_{ijc}) = \beta_c + \mathbf{x}'_{ij}\,\boldsymbol{\beta}, \quad \text{for } c = 1, \ldots, C-1$$

where $\beta_1, \beta_2, \ldots, \beta_{C-1}$ are increasing intercept terms that depend only on the level $c$. Let the binary vector that represents the responses of the $i$th subject be $\mathbf{Y}_i = [\mathbf{Y}_{i1}, \ldots, \mathbf{Y}_{in_i}]'$ with corresponding means $\boldsymbol{\mu}_i = [\mu_{i1}, \ldots, \mu_{in_i}]'$.
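The cumulative-indicator encoding can be sketched in a few lines of Python. The function name `cumulative_indicators` and the assumption that categories are coded as the integers 1 through $C$ are illustrative choices, not requirements of PROC GEE.

```python
def cumulative_indicators(o, num_categories):
    """Encode an ordinal response o in 1..C as [I(o <= 1), ..., I(o <= C-1)]."""
    if not 1 <= o <= num_categories:
        raise ValueError("response must lie in 1..C")
    return [1 if o <= c else 0 for c in range(1, num_categories)]

# With C = 4 categories, each response becomes a vector of C - 1 = 3 indicators.
encoded = [cumulative_indicators(o, 4) for o in (1, 2, 3, 4)]
print(encoded)  # [[1, 1, 1], [0, 1, 1], [0, 0, 1], [0, 0, 0]]
```

Each indicator is binary, which is what allows the binary ALR machinery to be applied to ordinal data.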

The log odds ratio between two indicator variables $Y_{ijc_1}$ and $Y_{ikc_2}$ is modeled as

$$\gamma_{i(jk)(c_1 c_2)} = \log(\mathrm{OR}(Y_{ijc_1}, Y_{ikc_2})) = \mathbf{z}'_{i(jk)(c_1 c_2)}\,\boldsymbol{\alpha}$$

for $q \times 1$ regression parameters $\boldsymbol{\alpha}$ and fixed coefficients $\mathbf{z}_{i(jk)(c_1 c_2)}$. As in Carey, Zeger, and Diggle (1993), $\boldsymbol{\alpha}$ then provides a vector of regression parameters in a logistic model for the conditional expectation $\xi_{i(jk)(c_1 c_2)} = E(Y_{ijc_1} \mid Y_{ikc_2})$. To estimate $\boldsymbol{\alpha}$, the conditional expectation is considered for all pairs $Y_{ijc_1}$ and $Y_{ikc_2}$ with $j < k$. Let

$$\begin{aligned}
\boldsymbol{\xi}_{i(jk)} &= [\xi_{i(jk)(11)}, \xi_{i(jk)(12)}, \ldots, \xi_{i(jk)(21)}, \ldots, \xi_{i(jk)(C-1,C-1)}]' \\
\boldsymbol{\xi}_i &= [\boldsymbol{\xi}_{i(12)}, \boldsymbol{\xi}_{i(13)}, \ldots, \boldsymbol{\xi}_{i(23)}, \ldots, \boldsymbol{\xi}_{i(n_i-1\,n_i)}]' \\
\mathbf{Y}_i^* &= [\overbrace{Y_{i1} \otimes e_{C-1}, \ldots, Y_{i1} \otimes e_{C-1}}^{n_i-1},\ \underbrace{Y_{i2} \otimes e_{C-1}, \ldots, Y_{i2} \otimes e_{C-1}}_{n_i-2},\ \ldots,\ \overbrace{Y_{i,n_i-1} \otimes e_{C-1}}^{1}]'
\end{aligned}$$

where $\otimes$ denotes the Kronecker product and $e_l$ denotes a vector of dimension $l$ composed of ones. The difference $\mathbf{Y}_i^* - \boldsymbol{\xi}_i$ represents the residuals of the model for the conditional expectation.
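The stacking of the response vector can be illustrated with a small Python sketch. It assumes that each measurement is already encoded as a list of $C-1$ cumulative indicators and that the Kronecker product with a vector of ones simply repeats each element; the helper names are hypothetical, and the sketch is meant only to show the block structure and length of the stacked vector.

```python
def kron_with_ones(vec, length):
    """Kronecker product of vec with a vector of `length` ones."""
    return [v for v in vec for _ in range(length)]

def stacked_responses(y_i):
    """Stack the indicator vectors of one subject, one block per pair (j, k), j < k.

    y_i is a list of n_i indicator vectors, each of length C - 1.
    """
    n_i = len(y_i)
    width = len(y_i[0])                      # C - 1
    y_star = []
    for j in range(n_i - 1):
        # Measurement j is paired with every later measurement k > j,
        # so its expanded block appears once per such k.
        for _k in range(j + 1, n_i):
            y_star.extend(kron_with_ones(y_i[j], width))
    return y_star

# Hypothetical subject with n_i = 3 measurements and C = 3 categories.
y_i = [[1, 1], [0, 1], [0, 0]]
y_star = stacked_responses(y_i)
print(len(y_star), y_star)
```

The stacked vector has one block of $(C-1)^2$ entries for each of the $n_i(n_i-1)/2$ pairs, which matches the length of the vector of conditional expectations for the subject.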

For both binary and multinomial data, the ALR estimates for $\boldsymbol{\beta}$ and $\boldsymbol{\alpha}$ are the simultaneous solutions to the estimating equations

$$\begin{aligned}
\mathbf{S}_1(\boldsymbol{\beta}, \boldsymbol{\alpha}) &= \sum_{i=1}^{K} \frac{\partial \boldsymbol{\mu}_i}{\partial \boldsymbol{\beta}}'\, \mathbf{V}_{i11}^{-1} \left(\mathbf{Y}_i - \boldsymbol{\mu}_i(\boldsymbol{\beta})\right) = \mathbf{0} \\
\mathbf{S}_2(\boldsymbol{\beta}, \boldsymbol{\alpha}) &= \sum_{i=1}^{K} \frac{\partial \boldsymbol{\xi}_i}{\partial \boldsymbol{\alpha}}'\, \mathbf{V}_{i33}^{-1} \left(\mathbf{Y}_i^* - \boldsymbol{\xi}_i\right) = \mathbf{0}
\end{aligned}$$

where $\mathbf{V}_{i11} = \mathrm{cov}(\mathbf{Y}_i)$ and $\mathbf{V}_{i33} = \mathrm{diag}[\boldsymbol{\xi}_i(1 - \boldsymbol{\xi}_i)]$. The fitting algorithm alternates between a GEE step to update the model for the mean and a logistic regression step to update the log odds ratio model. Upon convergence, the ALR algorithm provides estimates of the regression parameters for the mean, $\boldsymbol{\beta}$; the regression parameters for the log odds ratios, $\boldsymbol{\alpha}$; their standard errors; and their covariances.

Specifying Log Odds Ratio Models

Specifying a regression model for the log odds ratio requires you to specify the rows of the matrix $\mathbf{z}$. For binary data, there is a row $\mathbf{z}_{ijk}$ for each cluster $i$ and within-cluster pair $(j, k)$. For ordinal multinomial data, there is a row $\mathbf{z}_{i(jk)(c_1 c_2)}$ for each cluster $i$, within-cluster pair $(j, k)$, and choice of levels $(c_1, c_2)$.

For ordinal multinomial data, the GEE procedure supports only the ALR method that uses a fully exchangeable regression structure for the log odds ratio. In a fully exchangeable model, the log odds ratio is constant for all clusters $i$, within-cluster pairs $(j, k)$, and levels $(c_1, c_2)$. You select a fully exchangeable model for the log odds ratio by specifying LOGOR=EXCH.

For binary data, the GEE procedure provides several methods of specifying $\mathbf{z}_{ijk}$. You apply these methods by specifying LOGOR=keyword and associated options in the REPEATED statement. The supported keywords and the resulting log odds ratio models are described as follows:

EXCH

specifies exchangeable log odds ratios. In this model, the log odds ratio is a constant for all clusters $i$ and pairs $(j, k)$. The parameter $\alpha$ is the common log odds ratio.

$$\mathbf{z}_{ijk} = 1 \quad \text{for all } i, j, k$$

FULLCLUST

specifies fully parameterized clusters. Each cluster is parameterized in the same way, and there is a parameter for each unique pair within clusters. If a complete cluster is of size $n$, then there are $\frac{n(n-1)}{2}$ parameters in the vector $\boldsymbol{\alpha}$. For example, if a full cluster is of size 4, then there are $\frac{4 \times 3}{2} = 6$ parameters, and the $\mathbf{z}$ matrix is of the form

$$\mathbf{Z} = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}$$

The elements of $\boldsymbol{\alpha}$ correspond to log odds ratios for cluster pairs in the following order:

Pair     Parameter
(1,2)    Alpha1
(1,3)    Alpha2
(1,4)    Alpha3
(2,3)    Alpha4
(2,4)    Alpha5
(3,4)    Alpha6
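The pair ordering and the identity structure of the matrix can be reproduced with a short Python sketch. The function names are hypothetical; the sketch only enumerates pairs in the order shown in the table and is not how PROC GEE is implemented.

```python
from itertools import combinations

def fullclust_pairs(n):
    """Within-cluster pairs (j, k), j < k, in the order (1,2), (1,3), ..., (n-1,n)."""
    return list(combinations(range(1, n + 1), 2))

def fullclust_z(n):
    """Identity matrix with one row and one column per within-cluster pair."""
    q = len(fullclust_pairs(n))          # q = n(n-1)/2 parameters
    return [[1 if col == row else 0 for col in range(q)] for row in range(q)]

pairs = fullclust_pairs(4)
labels = {pair: f"Alpha{idx}" for idx, pair in enumerate(pairs, start=1)}
for pair in pairs:
    print(pair, labels[pair])
```

For a cluster of size 4 this yields six pairs, and row $r$ of the matrix selects the $r$th element of $\boldsymbol{\alpha}$ for the $r$th pair.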

LOGORVAR(variable)

specifies log odds ratios by cluster. The argument variable is a variable name that defines the "block effects" between clusters. The log odds ratios are constant within clusters, but they take a different value for each different value of the variable. For example, if Center is a variable in the input data set that takes a different value for k treatment centers, then when you specify LOGOR=LOGORVAR(Center), you get a model that has different log odds ratios for each of the k centers, constant within center.

NESTK

specifies k-nested log odds ratios. You must also specify the SUBCLUST=variable option to define subclusters within clusters. Within each cluster, PROC GEE computes a log odds ratio parameter for pairs that have the same value of variable for both members of the pair and one log odds ratio parameter for each unique combination of different values of variable.

NEST1

specifies 1-nested log odds ratios. You must also specify the SUBCLUST=variable option to define subclusters within clusters. There are two log odds ratio parameters for this model. Pairs that have the same value of variable correspond to one parameter; pairs that have different values of variable correspond to the other parameter. For example, if patients are clustered by hospital and subclusters are the wards within those hospitals, then the outcomes of patients within the same ward have one log odds ratio parameter, and the outcomes of patients from different wards have the other parameter.
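As an illustration of the two-parameter structure, the Python sketch below classifies each within-cluster pair by whether both members share a subcluster value. The indicator coding `[1, 0]` for same-subcluster pairs and `[0, 1]` for different-subcluster pairs is an assumption made for this example; the documentation does not state how PROC GEE codes or orders the two parameters internally. The ward labels are hypothetical.

```python
from itertools import combinations

def nest1_rows(subcluster):
    """Map each pair (j, k), j < k, to an assumed two-column indicator row.

    subcluster lists the subcluster label of members 1..n of one cluster.
    """
    rows = {}
    for j, k in combinations(range(len(subcluster)), 2):
        same = subcluster[j] == subcluster[k]
        # Assumed coding: first column = same subcluster, second = different.
        rows[(j + 1, k + 1)] = [1, 0] if same else [0, 1]
    return rows

# Hypothetical hospital (cluster) with four patients in wards A, A, B, B.
for pair, row in nest1_rows(["A", "A", "B", "B"]).items():
    print(pair, row)
```

Pairs (1,2) and (3,4) share a ward and so share one log odds ratio parameter; the remaining four pairs share the other.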

ZFULL

specifies the full $\mathbf{z}$ matrix. You must also specify a SAS data set that contains the $\mathbf{z}$ matrix by using the ZDATA=data-set-name option. Each observation in the data set corresponds to one row of the $\mathbf{z}$ matrix. You must specify the ZDATA data set as if all clusters are complete; that is, as if all clusters are the same size and there are no missing observations. The ZDATA data set has $K[n_{\max}(n_{\max}-1)/2]$ observations, where $K$ is the number of clusters and $n_{\max}$ is the maximum cluster size. If the members of cluster $i$ are ordered as $1, 2, \ldots, n$, then the rows of the $\mathbf{z}$ matrix must be specified for pairs in the order $(1,2), (1,3), \ldots, (1,n), (2,3), \ldots, (2,n), \ldots, (n-1,n)$. The variables that you specify in the REPEATED statement for the SUBJECT effect must also be present in the ZDATA= data set to identify clusters. You must specify variables in the data set that define the columns of the $\mathbf{z}$ matrix by using the ZROW=variable-list option. If there are $q$ columns ($q$ variables in variable-list), then there are $q$ log odds ratio parameters. You can optionally specify variables that indicate the cluster pairs corresponding to each row of the $\mathbf{z}$ matrix by using the YPAIR=(variable1, variable2) option. If you specify this option, the data from the ZDATA data set are sorted within each cluster by variable1 and variable2. See Example 50.4 for an example of specifying a full $\mathbf{z}$ matrix.

ZREP

specifies a replicated $\mathbf{z}$ matrix. You specify $\mathbf{z}$ matrix data exactly as you do for the ZFULL option case, except that you specify only one complete cluster. The $\mathbf{z}$ matrix for the one cluster is replicated for each cluster. The number of observations in the ZDATA data set is $\frac{n_{\max}(n_{\max}-1)}{2}$, where $n_{\max}$ is the size of a complete cluster (a cluster with no missing observations).

ZREP(matrix)

specifies direct input of the replicated $\mathbf{z}$ matrix. You specify the $\mathbf{z}$ matrix for one cluster by using the syntax LOGOR=ZREP( $(y_j\ y_k)\ z_{jk1}\ z_{jk2} \cdots z_{jkq}, \cdots$ ), where $y_j$ and $y_k$ are numbers that represent a pair of observations from the $i$th cluster and the values $z_{jk1}, z_{jk2}, \ldots, z_{jkq}$ make up the corresponding row $\mathbf{z}_{ijk}$ of the $\mathbf{z}$ matrix. The number of specified rows is $\frac{n_{\max}(n_{\max}-1)}{2}$, where $n_{\max}$ is the size of a complete cluster (a cluster with no missing observations). For example,

logor =  zrep((1 2) 1 0,
              (1 3) 1 0,
              (1 4) 1 0,
              (2 3) 1 1,
              (2 4) 1 1,
              (3 4) 1 1)

specifies the $\frac{4 \times 3}{2} = 6$ rows of the $\mathbf{z}$ matrix for a cluster of size 4 with $q = 2$ log odds ratio parameters. The log odds ratio for the pairs (1 2), (1 3), (1 4) is $\alpha_1$, and the log odds ratio for the pairs (2 3), (2 4), (3 4) is $\alpha_1 + \alpha_2$.
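The way each row of the matrix combines with the parameter vector can be checked with a small Python sketch that evaluates the inner product of every row with $\boldsymbol{\alpha}$. The rows are copied from the ZREP example; the numeric values of `alpha` are hypothetical and chosen only for illustration.

```python
import math

# Rows of the replicated z matrix from the LOGOR=ZREP example, keyed by pair.
z_rep = {
    (1, 2): [1, 0],
    (1, 3): [1, 0],
    (1, 4): [1, 0],
    (2, 3): [1, 1],
    (2, 4): [1, 1],
    (3, 4): [1, 1],
}

def log_odds_ratios(z_rows, alpha):
    """Compute the log odds ratio z' alpha for each within-cluster pair."""
    return {pair: sum(z * a for z, a in zip(row, alpha))
            for pair, row in z_rows.items()}

alpha = [0.5, 0.25]                      # hypothetical alpha_1 and alpha_2
gamma = log_odds_ratios(z_rep, alpha)
for pair, g in gamma.items():
    print(pair, g, round(math.exp(g), 3))   # log odds ratio and odds ratio
```

Pairs that involve member 1 receive $\alpha_1$, and the remaining pairs receive $\alpha_1 + \alpha_2$, so $\alpha_2$ measures how much the association among members 2 through 4 differs from the association with member 1.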

Last updated: December 09, 2022