The GENMOD Procedure

Example 51.5 GEE for Binary Data with Logit Link Function

Output 51.5.1 displays a partial listing of a SAS data set of clinical trial data comparing two treatments for a respiratory disorder. See "Gee Model for Binary Data" in the SAS/STAT Sample Program Library for the complete data set. These data are from Stokes, Davis, and Koch (2000).

Patients in each of two centers are randomly assigned to groups receiving the active treatment or a placebo. During treatment, respiratory status, represented by the variable outcome (coded here as 0=poor, 1=good), is determined for each of four visits. The variables center, treatment, sex, and baseline (baseline respiratory status) are classification variables with two levels. The variable age (age at time of entry into the study) is a continuous variable.

Explanatory variables in the model are Intercept ($x_{ij1}$), treatment ($x_{ij2}$), center ($x_{ij3}$), sex ($x_{ij4}$), age ($x_{ij5}$), and baseline ($x_{ij6}$), so that $\mathbf{x}'_{ij} = [x_{ij1}, x_{ij2}, \ldots, x_{ij6}]$ is the vector of explanatory variables. Indicator variables for the classification explanatory variables can be automatically generated by listing them in the CLASS statement in PROC GENMOD. To be consistent with the analysis in Stokes, Davis, and Koch (2000), the four classification explanatory variables are coded as follows via options in the CLASS statement:

\[
x_{ij2} = \begin{cases} 0 & \text{placebo} \\ 1 & \text{active} \end{cases}
\qquad
x_{ij3} = \begin{cases} 0 & \text{center 1} \\ 1 & \text{center 2} \end{cases}
\]
\[
x_{ij4} = \begin{cases} 0 & \text{male} \\ 1 & \text{female} \end{cases}
\qquad
x_{ij6} = \begin{cases} 0 & 0 \\ 1 & 1 \end{cases}
\]

Suppose $y_{ij}$ represents the respiratory status of patient $i$ at the $j$th visit, $j = 1, \ldots, 4$, and $\mu_{ij} = \mathrm{E}(y_{ij})$ represents the mean of the respiratory status. Since the response data are binary, you can use the variance function for the binomial distribution $v(\mu_{ij}) = \mu_{ij}(1 - \mu_{ij})$ and the logit link function $g(\mu_{ij}) = \log(\mu_{ij} / (1 - \mu_{ij}))$. The model for the mean is $g(\mu_{ij}) = \mathbf{x}'_{ij} \boldsymbol{\beta}$, where $\boldsymbol{\beta}$ is a vector of regression parameters to be estimated.
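
Written out for the six explanatory variables listed earlier, the model for the mean can be sketched as follows. The component labels $\beta_1, \ldots, \beta_6$ are an indexing assumed here for illustration (the text does not number the parameters), with $x_{ij1} = 1$ for the intercept. Inverting the logit link gives the mean on the probability scale.

```latex
\[
\log\!\left(\frac{\mu_{ij}}{1 - \mu_{ij}}\right)
  = \beta_1 x_{ij1} + \beta_2 x_{ij2} + \beta_3 x_{ij3}
  + \beta_4 x_{ij4} + \beta_5 x_{ij5} + \beta_6 x_{ij6}
\]
\[
\mu_{ij}
  = \frac{\exp(\mathbf{x}'_{ij}\boldsymbol{\beta})}
         {1 + \exp(\mathbf{x}'_{ij}\boldsymbol{\beta})}
\]
```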

Output 51.5.1: Respiratory Disorder Data

Obs center id treatment sex age baseline visit1 visit2 visit3 visit4 visit outcome
1 1 1 P M 46 0 0 0 0 0 1 0
2 1 1 P M 46 0 0 0 0 0 2 0
3 1 1 P M 46 0 0 0 0 0 3 0
4 1 1 P M 46 0 0 0 0 0 4 0
5 1 2 P M 28 0 0 0 0 0 1 0
6 1 2 P M 28 0 0 0 0 0 2 0
7 1 2 P M 28 0 0 0 0 0 3 0
8 1 2 P M 28 0 0 0 0 0 4 0
.                        
.                        
.                        
214 2 1 P F 39 0 0 0 0 0 1 0
215 2 1 P F 39 0 0 0 0 0 2 0
216 2 1 P F 39 0 0 0 0 0 3 0
217 2 1 P F 39 0 0 0 0 0 4 0
218 2 2 A M 25 0 0 1 1 1 1 0
219 2 2 A M 25 0 0 1 1 1 2 1
220 2 2 A M 25 0 0 1 1 1 3 1
221 2 2 A M 25 0 0 1 1 1 4 1
.                        
.                        
.                        
.                        


The GEE solution is requested with the REPEATED statement in the GENMOD procedure. The option SUBJECT=ID(CENTER) specifies that the observations in any single cluster are uniquely identified by both center and id. An equivalent specification is SUBJECT=ID*CENTER. Since the same id values are used in each center, one of these specifications is needed. If id values were unique across all centers, SUBJECT=ID would be specified.

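The option TYPE=UNSTR, written in the following statements with its alias CORR=UNSTR, specifies the unstructured working correlation structure. The CORRW option displays the estimated working correlation matrix. The MODEL statement specifies the regression model for the mean with the binomial distribution variance function. The following SAS statements perform the GEE model fit: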

proc genmod data=resp;
   class id treatment(ref="P") center(ref="1") sex(ref="M")
      baseline(ref="0");
   model outcome(event='1')=treatment center sex age baseline / dist=bin;
   repeated subject=id(center) / corr=unstr corrw;
run;

These statements first fit the generalized linear model (GLM) specified in the MODEL statement. The parameter estimates from the generalized linear model fit are not shown in the output, but they are used as initial values for the GEE solution. The EVENT='1' option in the MODEL statement models the probability that outcome = 1. If the EVENT='1' option had not been specified, the probability that outcome = 0 would be modeled by default.

Information about the GEE model is displayed in Output 51.5.2. The results of GEE model fitting are displayed in Output 51.5.3. Model goodness-of-fit criteria are displayed in Output 51.5.4. If you specify no other options, the standard errors, confidence intervals, Z scores, and p-values are based on empirical standard error estimates. You can specify the MODELSE option in the REPEATED statement to create a table based on model-based standard error estimates.
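
As a sketch of the MODELSE option just described, the following statements repeat the same fit with MODELSE added to the REPEATED statement. Nothing else differs from the statements shown earlier, and the additional table of model-based standard error estimates is not shown here.

```sas
/* Same model as before; MODELSE additionally requests a parameter   */
/* estimates table based on model-based standard error estimates.    */
proc genmod data=resp;
   class id treatment(ref="P") center(ref="1") sex(ref="M")
      baseline(ref="0");
   model outcome(event='1')=treatment center sex age baseline / dist=bin;
   repeated subject=id(center) / corr=unstr corrw modelse;
run;
```

Comparing the model-based table with the empirical one is a common informal check: large differences between the two sets of standard errors may suggest that the working correlation structure is a poor match for the data.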

Output 51.5.2: Model Fitting Information

The GENMOD Procedure

GEE Model Information
Correlation Structure Unstructured
Subject Effect id(center) (111 levels)
Number of Clusters 111
Correlation Matrix Dimension 4
Maximum Cluster Size 4
Minimum Cluster Size 4


Output 51.5.3: Results of Model Fitting

Working Correlation Matrix
  Col1 Col2 Col3 Col4
Row1 1.0000 0.3351 0.2140 0.2953
Row2 0.3351 1.0000 0.4429 0.3581
Row3 0.2140 0.4429 1.0000 0.3964
Row4 0.2953 0.3581 0.3964 1.0000

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter    Estimate  Standard Error 95% Confidence Limits       Z  Pr > |Z|
Intercept     -0.8882          0.4568    -1.7835     0.0071   -1.94    0.0519
treatment A    1.2442          0.3455     0.5669     1.9214    3.60    0.0003
treatment P    0.0000          0.0000     0.0000     0.0000       .         .
center    2    0.6558          0.3512    -0.0326     1.3442    1.87    0.0619
center    1    0.0000          0.0000     0.0000     0.0000       .         .
sex       F    0.1128          0.4408    -0.7512     0.9768    0.26    0.7981
sex       M    0.0000          0.0000     0.0000     0.0000       .         .
age           -0.0175          0.0129    -0.0427     0.0077   -1.36    0.1728
baseline  1    1.8981          0.3441     1.2237     2.5725    5.52    <.0001
baseline  0    0.0000          0.0000     0.0000     0.0000       .         .
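
To make the columns of this table concrete, the following DATA step is a minimal sketch that recomputes the Wald quantities for the treatment A row from its estimate and empirical standard error, and also exponentiates them to the odds ratio scale implied by the logit link. The data set name `treatment_wald` and the variable names are hypothetical and chosen only for illustration; the two input values are copied from the table.

```sas
/* Inputs copied from the "treatment A" row of Output 51.5.3 */
data treatment_wald;
   estimate   = 1.2442;
   stderr     = 0.3455;
   z          = estimate / stderr;             /* Wald Z statistic            */
   zcrit      = probit(0.975);                 /* normal quantile, about 1.96 */
   lower      = estimate - zcrit * stderr;     /* 95% Wald confidence limits  */
   upper      = estimate + zcrit * stderr;
   p_value    = 2 * (1 - probnorm(abs(z)));    /* two-sided p-value           */
   odds_ratio = exp(estimate);                 /* active versus placebo       */
   or_lower   = exp(lower);
   or_upper   = exp(upper);
run;

proc print data=treatment_wald noobs;
   format z 6.2 p_value pvalue6.4
          estimate stderr lower upper odds_ratio or_lower or_upper 8.4;
run;
```

Because the inputs are rounded to four decimal places, the recomputed limits can differ from the printed ones in the last digit. The exponentiated estimate is about 3.47, which can be read as the estimated odds of a good outcome under the active treatment relative to placebo, with the other variables held fixed.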


Output 51.5.4: Model Fit Criteria

GEE Fit Criteria
QIC 512.3416
QICu 499.6081


Neither age (p = 0.1728) nor sex (p = 0.7981) is statistically significant, which makes both variables candidates for omission from the model.
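
One way to follow up on this observation, sketched here under the assumption that the same data set and working correlation structure are kept, is to refit the model without age and sex. This reduced model is an illustration rather than part of the original analysis, and its results are not shown.

```sas
/* Reduced model: age and sex dropped from the CLASS and MODEL statements */
proc genmod data=resp;
   class id treatment(ref="P") center(ref="1") baseline(ref="0");
   model outcome(event='1')=treatment center baseline / dist=bin;
   repeated subject=id(center) / corr=unstr corrw;
run;
```

The QIC and QICu values from the reduced fit can then be compared with those in Output 51.5.4, where smaller values are generally preferred.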

Last updated: December 09, 2022