The GEE Procedure

Alternating Logistic Regression

If the responses are binary (that is, they take only two values), then there is an alternative method to account for the association among the measurements. The alternating logistic regressions (ALR) algorithm of Carey, Zeger, and Diggle (1993) models the association between pairs of responses by using log odds ratios instead of the correlations that ordinary GEEs use. The ALR algorithm of Heagerty and Zeger (1996) extends the method to GEEs that have ordinal multinomial responses (that is, they fall into one of $C$ ordered categories).

ALR for Binary Data

For binary data, the correlation between the jth and kth response is, by definition,

$$
\mathrm{Corr}(Y_{ij}, Y_{ik}) = \frac{\Pr(Y_{ij} = 1, Y_{ik} = 1) - \mu_{ij}\mu_{ik}}{\sqrt{\mu_{ij}(1 - \mu_{ij})\,\mu_{ik}(1 - \mu_{ik})}}
$$

The joint probability in the numerator satisfies the following bounds, by elementary properties of probability, because $\mu_{ij} = \Pr(Y_{ij} = 1)$:

$$
\max(0,\ \mu_{ij} + \mu_{ik} - 1) \le \Pr(Y_{ij} = 1, Y_{ik} = 1) \le \min(\mu_{ij},\ \mu_{ik})
$$

Therefore, the correlation is constrained to be within limits that depend in a complicated way on the means of the data.
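To make the constraint concrete, the following minimal Python sketch plugs the two bounds on the joint probability into the correlation formula. The function name `correlation_bounds` and the example means are illustrative assumptions; the code is not part of PROC GEE.

```python
import math

def correlation_bounds(mu_j, mu_k):
    """Return the (lowest, highest) correlation attainable by two binary
    responses whose means are mu_j and mu_k (both strictly between 0 and 1)."""
    # Bounds on Pr(Y_ij = 1, Y_ik = 1) from the inequality above
    p_low = max(0.0, mu_j + mu_k - 1.0)
    p_high = min(mu_j, mu_k)
    # Denominator of the correlation formula
    denom = math.sqrt(mu_j * (1.0 - mu_j) * mu_k * (1.0 - mu_k))
    return (p_low - mu_j * mu_k) / denom, (p_high - mu_j * mu_k) / denom

# Hypothetical means: the attainable range is much narrower than [-1, 1]
low, high = correlation_bounds(0.2, 0.7)
print(round(low, 4), round(high, 4))
```

Only when the two means are equal to 0.5 does the attainable range widen to the full interval from -1 to 1.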

The odds ratio, defined as

$$
\mathrm{OR}(Y_{ij}, Y_{ik}) = \frac{\Pr(Y_{ij} = 1, Y_{ik} = 1)\,\Pr(Y_{ij} = 0, Y_{ik} = 0)}{\Pr(Y_{ij} = 1, Y_{ik} = 0)\,\Pr(Y_{ij} = 0, Y_{ik} = 1)}
$$

is not constrained by the means and is preferred, in some cases, to correlations for binary data.
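One way to see that the odds ratio is free of the means is to solve the odds ratio definition for the joint probability, which gives a quadratic equation with a closed-form root. The sketch below is an illustration under that derivation; the function names and the example values are assumptions, not PROC GEE internals.

```python
import math

def joint_prob_from_odds_ratio(mu_j, mu_k, psi):
    """Pr(Y_ij = 1, Y_ik = 1) implied by means mu_j, mu_k and odds ratio psi > 0."""
    if math.isclose(psi, 1.0):
        return mu_j * mu_k  # odds ratio 1 means independence
    a = 1.0 + (mu_j + mu_k) * (psi - 1.0)
    disc = a * a - 4.0 * psi * (psi - 1.0) * mu_j * mu_k
    # Root of the quadratic that lies inside the probability bounds
    return (a - math.sqrt(disc)) / (2.0 * (psi - 1.0))

def odds_ratio_from_joint(p11, mu_j, mu_k):
    """Odds ratio computed from the four cell probabilities of the 2x2 table."""
    p10 = mu_j - p11
    p01 = mu_k - p11
    p00 = 1.0 - mu_j - mu_k + p11
    return (p11 * p00) / (p10 * p01)

# Hypothetical means; every positive odds ratio yields a valid joint probability
for psi in (0.01, 0.5, 1.0, 3.0, 100.0):
    p11 = joint_prob_from_odds_ratio(0.2, 0.7, psi)
    print(psi, round(p11, 6))
```

For any positive odds ratio, the computed joint probability stays inside the bounds that restrict the correlation, which is why the log odds ratio can be modeled on the whole real line.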

The ALR algorithm seeks to model the logarithm of the odds ratio, $\gamma_{ijk} = \log(\mathrm{OR}(Y_{ij}, Y_{ik}))$, as

$$
\gamma_{ijk} = \mathbf{z}'_{ijk}\boldsymbol{\alpha}
$$

where $\boldsymbol{\alpha}$ is a vector of regression parameters and $\mathbf{z}_{ijk}$ is a fixed, specified vector of coefficients.

The parameter $\gamma_{ijk}$ can take any value in $(-\infty, \infty)$, with $\gamma_{ijk} = 0$ corresponding to no association.

The log odds ratio, when modeled in this way with a regression model, can take different values in subgroups defined by $\mathbf{z}_{ijk}$. For example, $\mathbf{z}_{ijk}$ can define subgroups within clusters, or it can define "block effects" between clusters.

You specify a GEE model for binary data that uses log odds ratios by specifying a model for the mean, as in ordinary GEEs, and by specifying a model for the log odds ratios. You can use any of the link functions appropriate for binary data in the model for the mean, such as logistic, probit, or complementary log-log.

ALR for Ordinal Multinomial Data

For ordinal multinomial data, consider the jth measurement on the ith subject, $i = 1, \ldots, K$, $j = 1, \ldots, n_i$. To apply the ALR algorithm, the responses are represented by a vector of cumulative indicator variables $Y_{ijc}$, $c = 1, \ldots, C - 1$. You model the cumulative probabilities by using a cumulative link function,

$$
g(\mu_{ijc}) = \beta_c + \mathbf{x}'_{ij}\boldsymbol{\beta}, \quad \text{for } c = 1, \ldots, C - 1
$$

where $\beta_1, \ldots, \beta_{C-1}$ are increasing intercept terms that depend only on the level c. Let the binary vector that represents the responses of the ith subject be $\mathbf{Y}_i$ with corresponding means $\boldsymbol{\mu}_i$.
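The cumulative indicator representation can be sketched in a few lines of Python. The sketch assumes the convention that the indicator for level c equals 1 when the observed level is at or below c, which is consistent with intercepts that increase in c; the function name is hypothetical.

```python
def cumulative_indicators(level, num_levels):
    """Encode an ordinal response in 1..num_levels as C-1 cumulative indicators.

    Assumed convention: the indicator for level c is 1 if level <= c, else 0.
    """
    if not 1 <= level <= num_levels:
        raise ValueError("level must lie between 1 and num_levels")
    return [1 if level <= c else 0 for c in range(1, num_levels)]

# A response with C = 4 ordered levels becomes a vector of 3 indicators
for level in range(1, 5):
    print(level, cumulative_indicators(level, 4))
```

The highest level maps to a vector of zeros, so only C - 1 indicators are needed for C levels.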

The log odds ratio between two indicator variables $Y_{ijc_1}$ and $Y_{ikc_2}$ is modeled as

$$
\gamma_{i(jk)(c_1 c_2)} = \log\left(\mathrm{OR}(Y_{ijc_1}, Y_{ikc_2})\right) = \mathbf{z}'_{i(jk)(c_1 c_2)}\boldsymbol{\alpha}
$$

for regression parameters $\boldsymbol{\alpha}$ and fixed coefficients $\mathbf{z}_{i(jk)(c_1 c_2)}$. As in Carey, Zeger, and Diggle (1993), $\boldsymbol{\alpha}$ then provides a vector of regression parameters in a logistic model for the conditional expectation $\xi_{i(jk)(c_1 c_2)} = \mathrm{E}(Y_{ijc_1} \mid Y_{ikc_2})$. To estimate $\boldsymbol{\alpha}$, the conditional expectation is considered for all pairs $Y_{ijc_1}$ and $Y_{ikc_2}$ with $j < k$. Let

$$
\begin{aligned}
\boldsymbol{\xi}_{i(jk)} &= \left[ \xi_{i(jk)(11)}, \xi_{i(jk)(12)}, \ldots, \xi_{i(jk)(21)}, \ldots, \xi_{i(jk)(C-1,C-1)} \right]' \\
\boldsymbol{\xi}_{i} &= \left[ \boldsymbol{\xi}_{i(12)}, \boldsymbol{\xi}_{i(13)}, \ldots, \boldsymbol{\xi}_{i(23)}, \ldots, \boldsymbol{\xi}_{i(n_i - 1\, n_i)} \right]' \\
\mathbf{Y}_i^{*} &= \left[ \overbrace{Y_{i1} \otimes e_{C-1}, \ldots, Y_{i1} \otimes e_{C-1}}^{n_i - 1}, \underbrace{Y_{i2} \otimes e_{C-1}, \ldots, Y_{i2} \otimes e_{C-1}}_{n_i - 2}, \ldots, \overbrace{Y_{i n_i - 1} \otimes e_{C-1}}^{1} \right]'
\end{aligned}
$$

where $\otimes$ denotes the Kronecker product and $e_l$ denotes a vector of dimension l composed of ones. The difference $\mathbf{Y}_i^{*} - \boldsymbol{\xi}_i$ represents the residuals of the model for the conditional expectation.
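The stacked response vector can be assembled mechanically: the indicator vector of measurement j is expanded by a Kronecker product with a vector of C - 1 ones and then repeated once for each later measurement k > j. The following sketch uses plain Python lists; the function names and the example indicator vectors are hypothetical.

```python
def kron_with_ones(vec, length):
    """Kronecker product of vec with a vector of `length` ones."""
    return [v for v in vec for _ in range(length)]

def build_y_star(y_i, num_levels):
    """Stack the responses of one subject for all pairs (j, k) with j < k.

    y_i is a list of n_i vectors, each holding C-1 cumulative indicators.
    """
    n_i = len(y_i)
    y_star = []
    for j in range(n_i - 1):                 # first member of each pair
        block = kron_with_ones(y_i[j], num_levels - 1)
        for _ in range(n_i - 1 - j):         # one copy per partner k > j
            y_star.extend(block)
    return y_star

# Hypothetical subject: n_i = 3 measurements, C = 3 levels
y_i = [[0, 1], [1, 1], [0, 0]]
print(build_y_star(y_i, 3))
```

The result has (C - 1)^2 entries for each of the n_i(n_i - 1)/2 pairs, matching the length of the vector of conditional expectations.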

For both binary and multinomial data, the ALR estimates for $\boldsymbol{\beta}$ and $\boldsymbol{\alpha}$ are the simultaneous solutions to the estimating equations

$$
\begin{aligned}
\mathbf{S}_1(\boldsymbol{\beta}, \boldsymbol{\alpha}) &= \sum_{i=1}^{K} \frac{\partial \boldsymbol{\mu}_i}{\partial \boldsymbol{\beta}}' \, \mathbf{V}_{i11}^{-1} \left( \mathbf{Y}_i - \boldsymbol{\mu}_i(\boldsymbol{\beta}) \right) = \mathbf{0} \\
\mathbf{S}_2(\boldsymbol{\beta}, \boldsymbol{\alpha}) &= \sum_{i=1}^{K} \frac{\partial \boldsymbol{\xi}_i}{\partial \boldsymbol{\alpha}}' \, \mathbf{V}_{i33}^{-1} \left( \mathbf{Y}_i^{*} - \boldsymbol{\xi}_i \right) = \mathbf{0}
\end{aligned}
$$

where $\mathbf{V}_{i11}$ and $\mathbf{V}_{i33}$ are the weight matrices associated with $\mathbf{Y}_i$ and $\mathbf{Y}_i^{*}$, respectively. The fitting algorithm alternates between a GEE step to update the model for the mean and a logistic regression step to update the log odds ratio model. Upon convergence, the ALR algorithm provides estimates of the regression parameters for the mean, $\boldsymbol{\beta}$; the regression parameters for the log odds ratios, $\boldsymbol{\alpha}$; their standard errors; and their covariances.
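The reason a logistic regression step can update the log odds ratio model is easiest to see in the binary case: for a pair of binary responses, the logit of the conditional expectation of one response given the other is the log odds ratio times the conditioning response, plus an offset that depends only on the cell probabilities. The sketch below checks that identity numerically. It is an illustration under hypothetical cell probabilities, not the fitting code used by PROC GEE.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def conditional_logit_parts(p11, mu_j, mu_k):
    """Return (gamma, offset) with logit E(Y_ij | Y_ik = y) = gamma * y + offset."""
    p10 = mu_j - p11                  # Pr(Y_ij = 1, Y_ik = 0)
    p01 = mu_k - p11                  # Pr(Y_ij = 0, Y_ik = 1)
    p00 = 1.0 - mu_j - mu_k + p11     # Pr(Y_ij = 0, Y_ik = 0)
    gamma = math.log((p11 * p00) / (p10 * p01))   # log odds ratio
    offset = math.log(p10 / p00)
    return gamma, offset

# Hypothetical means and joint probability for one pair
mu_j, mu_k, p11 = 0.4, 0.5, 0.3
gamma, offset = conditional_logit_parts(p11, mu_j, mu_k)

cond_given_1 = p11 / mu_k                      # E(Y_ij | Y_ik = 1)
cond_given_0 = (mu_j - p11) / (1.0 - mu_k)     # E(Y_ij | Y_ik = 0)
print(math.isclose(logit(cond_given_1), gamma + offset))
print(math.isclose(logit(cond_given_0), offset))
```

Because the log odds ratio enters as the coefficient of the conditioning response, replacing it with a linear predictor in the regression parameters gives a logistic regression with a known offset.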

Specifying Log Odds Ratio Models

Specifying a regression model for the log odds ratio requires you to specify the rows of the matrix $\mathbf{Z}$. For binary data, there is a row $\mathbf{z}'_{ijk}$ for each cluster i and within-cluster pair $(j, k)$. For ordinal multinomial data, there is a row $\mathbf{z}'_{i(jk)(c_1 c_2)}$ for each cluster i, within-cluster pair $(j, k)$, and choice of levels $(c_1, c_2)$.

For ordinal multinomial data, the GEE procedure supports only the ALR method that uses a fully exchangeable regression structure for the log odds ratio. In a fully exchangeable model, the log odds ratio is constant for all clusters i, within-cluster pairs $(j, k)$, and levels $(c_1, c_2)$. You select a fully exchangeable model for the log odds ratio by specifying LOGOR=EXCH.

For binary data, the GEE procedure provides several methods of specifying $\mathbf{z}_{ijk}$. You apply these methods by specifying LOGOR=keyword and associated options in the REPEATED statement. The supported keywords and the resulting log odds ratio models are described as follows:

EXCH

specifies exchangeable log odds ratios. In this model, the log odds ratio is a constant for all clusters i and pairs $(j, k)$. The parameter $\alpha$ is the common log odds ratio.

$$
\mathbf{z}_{ijk} = 1 \quad \text{for all } i, j, k
$$

FULLCLUST

specifies fully parameterized clusters. Each cluster is parameterized in the same way, and there is a parameter for each unique pair within clusters. If a complete cluster is of size n, then there are $n(n-1)/2$ parameters in the vector $\boldsymbol{\alpha}$. For example, if a full cluster is of size 4, then there are $4 \times 3 / 2 = 6$ parameters, and the $\mathbf{Z}$ matrix is of the form

$$
\mathbf{Z} = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
$$

The elements of $\boldsymbol{\alpha}$ correspond to log odds ratios for cluster pairs in the following order:

Pair	Parameter
(1,2)	Alpha1
(1,3)	Alpha2
(1,4)	Alpha3
(2,3)	Alpha4
(2,4)	Alpha5
(3,4)	Alpha6
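The pair ordering in this table is the lexicographic ordering of all pairs (j, k) with j < k. A short Python sketch that reproduces the ordering and the parameter count for a cluster of any size is shown here; the helper name `cluster_pairs` is hypothetical.

```python
from itertools import combinations

def cluster_pairs(n):
    """All within-cluster pairs (j, k) with j < k, in lexicographic order."""
    return list(combinations(range(1, n + 1), 2))

pairs = cluster_pairs(4)
# Label each pair with the parameter name used in the table above
labels = {pair: f"Alpha{idx}" for idx, pair in enumerate(pairs, start=1)}
for pair, name in labels.items():
    print(pair, name)
```

For a complete cluster of size n, the list has n(n-1)/2 entries, one for each parameter in a fully parameterized cluster.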

LOGORVAR(variable)

specifies log odds ratios by cluster. The argument variable is a variable name that defines the "block effects" between clusters. The log odds ratios are constant within clusters, but they take a different value for each different value of the variable. For example, if Center is a variable in the input data set that takes a different value for k treatment centers, then when you specify LOGOR=LOGORVAR(Center), you get a model that has different log odds ratios for each of the k centers, constant within center.

NESTK

specifies k-nested log odds ratios. You must also specify the SUBCLUST=variable option to define subclusters within clusters. Within each cluster, PROC GEE computes a log odds ratio parameter for pairs that have the same value of variable for both members of the pair and one log odds ratio parameter for each unique combination of different values of variable.

NEST1

specifies 1-nested log odds ratios. You must also specify the SUBCLUST=variable option to define subclusters within clusters. There are two log odds ratio parameters for this model. Pairs that have the same value of variable correspond to one parameter; pairs that have different values of variable correspond to the other parameter. For example, if patients are clustered by hospital and subclusters are the wards within those hospitals, then the outcomes of patients within the same ward have one log odds ratio parameter, and the outcomes of patients from different wards have the other parameter.
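The 1-nested structure amounts to a two-way classification of pairs. The sketch below assigns each within-cluster pair to one of the two parameters according to whether the subcluster values match. The ward labels are hypothetical, and the numbering of the two parameters (1 for same subcluster, 2 for different subclusters) is an assumption made for illustration.

```python
from itertools import combinations

def nest1_parameter(sub_j, sub_k):
    """Return 1 if both members share a subcluster value, otherwise 2.

    The numbering of the two parameters is an illustrative assumption.
    """
    return 1 if sub_j == sub_k else 2

# Hypothetical hospital (cluster) with four patients in two wards (subclusters)
wards = {1: "A", 2: "A", 3: "B", 4: "B"}
assignment = {
    (j, k): nest1_parameter(wards[j], wards[k])
    for j, k in combinations(sorted(wards), 2)
}
for pair, param in assignment.items():
    print(pair, param)
```

In this example, the pairs (1,2) and (3,4) share one parameter, and the four cross-ward pairs share the other.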

ZFULL

specifies the full $\mathbf{Z}$ matrix. You must also specify a SAS data set that contains the $\mathbf{Z}$ matrix by using the ZDATA=data-set-name option. Each observation in the data set corresponds to one row of the $\mathbf{Z}$ matrix. You must specify the ZDATA data set as if all clusters are complete, that is, as if all clusters are the same size and there are no missing observations. The ZDATA data set has $K\,n_{\max}(n_{\max}-1)/2$ observations, where K is the number of clusters and $n_{\max}$ is the maximum cluster size. If the members of cluster i are ordered as $1, 2, \ldots, n$, then the rows of the $\mathbf{Z}$ matrix must be specified for pairs in the order $(1,2), (1,3), \ldots, (1,n), (2,3), \ldots, (2,n), \ldots, (n-1,n)$. The variables that you specify in the REPEATED statement for the SUBJECT effect must also be present in the ZDATA= data set to identify clusters. You must specify variables in the data set that define the columns of the $\mathbf{Z}$ matrix by using the ZROW=variable-list option. If there are q columns (q variables in variable-list), then there are q log odds ratio parameters. You can optionally specify variables that indicate the cluster pairs corresponding to each row of the $\mathbf{Z}$ matrix by using the YPAIR=(variable1, variable2) option. If you specify this option, the data from the ZDATA data set are sorted within each cluster by variable1 and variable2. See Example 50.4 for an example of specifying a full $\mathbf{Z}$ matrix.
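As a rough illustration of the layout that such a data set needs, the following Python sketch generates one record per cluster and pair, with pairs enumerated in lexicographic order as in the FULLCLUST table. The field names `id`, `y1`, `y2`, and `z1` are hypothetical stand-ins for the SUBJECT, YPAIR, and ZROW variables, and the single column of ones simply encodes an exchangeable log odds ratio.

```python
from itertools import combinations

def zdata_skeleton(cluster_ids, max_cluster_size):
    """One record per cluster and pair (y1, y2) with y1 < y2.

    Field names are hypothetical; z1 = 1 everywhere encodes a single
    common log odds ratio (q = 1 column).
    """
    rows = []
    for cid in cluster_ids:
        for y1, y2 in combinations(range(1, max_cluster_size + 1), 2):
            rows.append({"id": cid, "y1": y1, "y2": y2, "z1": 1})
    return rows

rows = zdata_skeleton(cluster_ids=[1, 2, 3], max_cluster_size=4)
print(len(rows))   # number of clusters times the number of pairs per cluster
print(rows[0])
```

Records in this shape could then be written to a SAS data set by whatever means you prefer; the sketch covers only the row count and ordering.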

ZREP

specifies a replicated $\mathbf{Z}$ matrix. You specify $\mathbf{Z}$ matrix data exactly as you do for the ZFULL option case, except that you specify only one complete cluster. The $\mathbf{Z}$ matrix for the one cluster is replicated for each cluster. The number of observations in the ZDATA data set is $n(n-1)/2$, where n is the size of a complete cluster (a cluster with no missing observations).

ZREP(matrix)

specifies direct input of the replicated $\mathbf{Z}$ matrix. You specify the $\mathbf{Z}$ matrix for one cluster by using the syntax LOGOR=ZREP ( (y1 y2) z1 z2 ... zq, ... ), where y1 and y2 are numbers that represent a pair of observations from the ith cluster and the values z1 z2 ... zq make up the corresponding row of the $\mathbf{Z}$ matrix. The number of specified rows is $n(n-1)/2$, where n is the size of a complete cluster (a cluster with no missing observations). For example,

logor =  zrep((1 2) 1 0,
              (1 3) 1 0,
              (1 4) 1 0,
              (2 3) 1 1,
              (2 4) 1 1,
              (3 4) 1 1)

specifies the rows of the $\mathbf{Z}$ matrix for a cluster of size 4 with q = 2 log odds ratio parameters. The log odds ratio for the pairs (1 2), (1 3), (1 4) is $\alpha_1$, and the log odds ratio for the pairs (2 3), (2 4), (3 4) is $\alpha_1 + \alpha_2$.
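The arithmetic behind this example is the product of each row of the matrix with the parameter vector. The following Python sketch evaluates it for the six rows above; the parameter values 0.5 and 0.2 are hypothetical and serve only to show which pairs receive which log odds ratio.

```python
# Rows of the replicated matrix from the LOGOR=ZREP example, keyed by pair
z_rows = {
    (1, 2): (1, 0),
    (1, 3): (1, 0),
    (1, 4): (1, 0),
    (2, 3): (1, 1),
    (2, 4): (1, 1),
    (3, 4): (1, 1),
}

def log_odds_ratios(z_rows, alpha):
    """Compute the log odds ratio for each pair as the row times alpha."""
    return {
        pair: sum(z * a for z, a in zip(row, alpha))
        for pair, row in z_rows.items()
    }

alpha = (0.5, 0.2)  # hypothetical values of the two parameters
for pair, gamma in log_odds_ratios(z_rows, alpha).items():
    print(pair, round(gamma, 3))
```

Pairs that involve the first observation receive the first parameter alone, and the remaining pairs receive the sum of the two parameters.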

Last updated: December 09, 2022