The GEE Procedure

REPEATED Statement

REPEATED SUBJECT=subject-effect < / options > ;

The REPEATED statement specifies the correlation structure of the responses for GEE model fitting. In addition, the REPEATED statement controls the iterative fitting algorithm and specifies optional output.

Table 10 summarizes the options available in the REPEATED statement.

Table 10: REPEATED Statement Options

Option        Description
ALPHAINIT=    Specifies initial values for the log odds ratio regression parameters
CONVERGE=     Specifies the convergence criterion for GEE parameter estimation
CORRB         Displays the estimated regression parameter correlation matrix
CORRW         Displays the estimated working correlation matrix
COVB          Displays the estimated regression parameter covariance matrix
ECORRB        Displays the estimated empirical correlation matrix
ECOVB         Displays the estimated empirical covariance matrix
INITIAL=      Specifies initial values of the regression parameters
INTERCEPT=    Specifies an initial value of the intercept
LOGOR=        Specifies the use of alternating logistic regression and a model for the log odds ratio
MAXITER=      Specifies the maximum number of iterations
MCORRB        Displays the estimated model-based correlation matrix
MCOVB         Displays the estimated model-based covariance matrix
MODELSE       Displays a parameter estimates table with the model-based standard errors
SUBCLUSTER=   Specifies a variable that defines subclusters
SUBJECT=      Identifies subjects (clusters)
TYPE=         Specifies the working correlation matrix structure
WITHIN=       Specifies the order of measurements within subjects
ZDATA=        Specifies the data set that contains the Z matrix
ZROW=         Specifies the rows of the Z matrix


You must specify the SUBJECT= option:

SUBJECT=subject-effect

identifies subjects in the input data set. The subject-effect can be a single variable, an interaction effect, a nested effect, or a combination of these. Each distinct value (level) of the effect identifies a different subject (cluster). Responses from different subjects are assumed to be statistically independent, and responses within subjects are assumed to be correlated. You must specify a subject-effect, and you must list the variables that define the subject-effect in the CLASS statement.
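Conceptually, each distinct combination of values of the subject-effect variables defines one cluster. The following Python sketch (not SAS code; the function `group_by_subject` and the variables `center`, `patient`, and `y` are hypothetical) groups input rows by that combination. It shows why an effect such as center*patient keeps patient 1 in center A separate from patient 1 in center B, whereas patient alone would merge them into one cluster.

```python
from collections import defaultdict


def group_by_subject(rows, subject_vars):
    """Group rows into clusters keyed by the values of the subject-effect variables.

    Each distinct tuple of values plays the role of one level of the
    subject-effect, that is, one subject (cluster).
    """
    clusters = defaultdict(list)
    for row in rows:
        key = tuple(row[var] for var in subject_vars)
        clusters[key].append(row)
    return dict(clusters)


# Hypothetical data: patient IDs are reused across centers.
rows = [
    {"center": "A", "patient": 1, "y": 0},
    {"center": "A", "patient": 1, "y": 1},
    {"center": "B", "patient": 1, "y": 1},
]

by_patient = group_by_subject(rows, ["patient"])                   # 1 cluster
by_center_patient = group_by_subject(rows, ["center", "patient"])  # 2 clusters
print(len(by_patient), len(by_center_patient))  # 1 2
```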

You can also specify the following options after a slash (/) to control how the model is fit and what output is produced:

ALPHAINIT=numbers

specifies initial values for the log odds ratio regression parameters when you specify the LOGOR= option for data that have either binary or ordinal multinomial responses. The default initial value is 0.01.

CONVERGE=number

specifies the convergence criterion for GEE parameter estimation. If the maximum absolute difference between the regression parameter estimates from two successive iterations is less than number, convergence is declared. If the absolute value of a regression parameter estimate is greater than 0.08, then the absolute difference divided by the regression parameter value (a relative difference) is used instead of the absolute difference. The default value of number is 0.0001.
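The convergence rule can be sketched in Python as follows. This illustrates the criterion as described above and is not the procedure's source code. In particular, dividing by the magnitude of the estimate from the previous iteration (rather than the current one) is an assumption, and `gee_converged` is a hypothetical name.

```python
def gee_converged(previous, current, number=1e-4, threshold=0.08):
    """Return True if successive GEE estimates satisfy the CONVERGE= criterion.

    For each regression parameter, the change between two successive
    iterations is the absolute difference, except that it is divided by the
    parameter's magnitude when that magnitude exceeds `threshold`.
    Assumption: the magnitude is taken from the previous iteration.
    """
    max_change = 0.0
    for old, new in zip(previous, current):
        change = abs(new - old)
        if abs(old) > threshold:
            change /= abs(old)  # relative change for larger estimates
        max_change = max(max_change, change)
    return max_change < number


# A change of 0.00015 in an estimate near 2.0 is a relative change of 0.000075,
# so it passes the default criterion even though the absolute change does not.
print(gee_converged([2.0], [2.00015]))   # True
print(gee_converged([0.05], [0.05015]))  # False: absolute change 0.00015 is used
```

The relative comparison keeps the criterion meaningful for parameters on very different scales: a large coefficient is not held to the same absolute tolerance as one near zero.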

CORRB

displays the estimated regression parameter correlation matrix. Both model-based and empirical correlations are displayed.

CORRW

displays the estimated working correlation matrix. If you specify TYPE=EXCH for the exchangeable working correlation structure, then the CORRW option is not needed to view the estimated correlation, because a table that contains the single estimated correlation is printed by default.
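For intuition about what that single number represents, the following Python sketch computes a simple moment estimate of the exchangeable correlation: the average cross-product of residuals over all within-subject pairs, divided by the average squared residual. It assumes residuals from a model with constant variance (for example, a normal response with an identity link) and omits any degrees-of-freedom adjustments, so it is not guaranteed to reproduce the value that PROC GEE prints. The function name is hypothetical.

```python
def exchangeable_alpha(residuals_by_cluster):
    """Moment estimate of a common within-cluster correlation.

    residuals_by_cluster: list of lists of residuals, one list per subject;
    at least one subject must have two or more responses.
    Simplification: the scale is the mean squared residual (no df adjustment).
    """
    all_residuals = [r for cluster in residuals_by_cluster for r in cluster]
    scale = sum(r * r for r in all_residuals) / len(all_residuals)

    cross_sum, n_pairs = 0.0, 0
    for cluster in residuals_by_cluster:
        for j in range(len(cluster)):
            for k in range(j + 1, len(cluster)):
                cross_sum += cluster[j] * cluster[k]
                n_pairs += 1
    return cross_sum / (n_pairs * scale)


print(exchangeable_alpha([[1.0, -1.0], [2.0, 2.0]]))  # 0.6
```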

COVB

displays the estimated regression parameter covariance matrix. Both model-based and empirical covariances are displayed.

ECORRB

displays the estimated regression parameter empirical correlation matrix.

ECOVB

displays the estimated regression parameter empirical covariance matrix.

INITIAL=numbers

specifies initial values of the regression parameters, other than the intercept, for GEE estimation. If you do not specify this option, the regression parameter estimates obtained by assuming that all responses are independent are used as the initial values.

INTERCEPT=number

specifies an initial value of the intercept regression parameter in the GEE model.

LOGOR=log-odds-ratio-structure-keyword

specifies the use of the alternating logistic regression (ALR) method and the regression model structure for the log odds ratio. For data that have either a binary or ordinal multinomial response distribution, the ALR method uses the log odds ratio to model the association among responses from the same subject. For more information about the ALR method and examples of specifying log odds ratio models, see the section Alternating Logistic Regression. You can specify the values that are shown in Table 11.

Table 11: Log Odds Ratio Regression Structures

Keyword              Log Odds Ratio Regression Structure
EXCH                 Exchangeable
FULLCLUST            Fully parameterized clusters
LOGORVAR(variable)   Indicator variable for specifying block effects
NESTK                k-nested
NEST1                1-nested
ZFULL                Fully specified Z matrix specified in ZDATA= data set
ZREP                 Single cluster specification for replicated Z matrix specified in ZDATA= data set
ZREP(matrix)         Single cluster specification for replicated Z matrix


For ordinal multinomial data, only the exchangeable regression structure that is specified by LOGOR=EXCH is supported. You should specify the option LOGOR= or TYPE=, but not both.
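To make the association measure concrete: for two binary responses from the same subject, the odds ratio compares the odds that one response is 1 when the other is 1 with those odds when the other is 0, and the log odds ratio is its logarithm. Under LOGOR=EXCH, one common log odds ratio is assumed for every pair of responses within a subject. The Python sketch below pools all within-subject pairs into a 2 × 2 table and returns the empirical log odds ratio. It is a descriptive summary only, not the ALR estimator, which alternates between updating the regression parameters and fitting a logistic regression for the log odds ratio parameters. The function name is hypothetical.

```python
import math
from itertools import combinations


def pooled_log_odds_ratio(clusters):
    """Empirical log odds ratio pooled over all within-cluster response pairs.

    clusters: list of lists of 0/1 responses, one list per subject. Each pair
    (y_j, y_k) with j < k is tallied into a 2 x 2 table.
    """
    n11 = n10 = n01 = n00 = 0
    for responses in clusters:
        for a, b in combinations(responses, 2):
            if a == 1 and b == 1:
                n11 += 1
            elif a == 1:
                n10 += 1
            elif b == 1:
                n01 += 1
            else:
                n00 += 1
    if 0 in (n11, n10, n01, n00):
        raise ValueError("empty cell in 2 x 2 table; log odds ratio is not finite")
    return math.log((n11 * n00) / (n10 * n01))


clusters = [[1, 1], [1, 1], [0, 0], [0, 0], [1, 0], [0, 1]]
print(pooled_log_odds_ratio(clusters))  # log(4), about 1.386
```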

MAXITER=number
MAXIT=number

specifies the maximum number of iterations allowed in the iterative GEE estimation process. By default, MAXITER=50.

MCORRB

displays the estimated regression parameter model-based correlation matrix.

MCOVB

displays the estimated regression parameter model-based covariance matrix.

MODELSE

displays a parameter estimates table that uses model-based standard errors for inference. By default, a "Parameter Estimates" table that is based on empirical standard errors is displayed.
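The difference between the two kinds of standard errors is easiest to see in the simplest case: an intercept-only model for a normal response with an identity link and the independence working correlation, where the GEE estimate is the grand mean. The model-based variance relies on the working correlation being correct, whereas the empirical (sandwich) variance sums residuals within each subject first, so positively correlated responses within subjects tend to inflate it. This Python sketch is illustrative only; the function name is hypothetical, and the N - 1 denominator for the scale parameter is an assumption.

```python
import math


def intercept_only_standard_errors(clusters):
    """Model-based and empirical SEs of the mean under independence GEE.

    clusters: list of lists of continuous responses, one list per subject.
    Returns (model_based_se, empirical_se).
    """
    responses = [y for cluster in clusters for y in cluster]
    n = len(responses)
    mean = sum(responses) / n
    residuals = [[y - mean for y in cluster] for cluster in clusters]

    # Model-based: scale / N, with scale = sum of squared residuals / (N - 1).
    scale = sum(r * r for cluster in residuals for r in cluster) / (n - 1)
    model_var = scale / n

    # Empirical (sandwich): residuals are summed within each subject first.
    empirical_var = sum(sum(cluster) ** 2 for cluster in residuals) / n ** 2
    return math.sqrt(model_var), math.sqrt(empirical_var)


model_se, empirical_se = intercept_only_standard_errors([[1.0, 2.0], [3.0, 4.0]])
print(round(model_se, 4), round(empirical_se, 4))  # 0.6455 0.7071
```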

SUBCLUSTER=variable
SUBCLUST=variable

specifies a variable that defines subclusters for the 1-nested or k-nested log odds ratio association modeling structures for data that have a binary response distribution. A 1-nested or k-nested modeling structure is specified in the option LOGOR=, and variable must be listed in the CLASS statement. For definitions of the 1-nested and k-nested modeling structures, see the section Specifying Log Odds Ratio Models.

TYPE=correlation-structure-keyword
CORR=correlation-structure-keyword

specifies the structure of the working correlation matrix that is used to model the correlation of the responses from subjects for ordinary GEEs. You can specify the values that are shown in Table 12 (for definitions of the correlation matrix types, see Table 13 in the section Details: GEE Procedure).

Table 12: Correlation Structure Types

Keyword                        Correlation Structure Type
AR | AR(1)                     Autoregressive(1)
EXCH | CS                      Exchangeable
IND                            Independent
MDEP(number)                   m-dependent, where m = number
UNSTR | UN                     Unstructured
USER(matrix) | FIXED(matrix)   Fixed, user-specified correlation matrix


For example, the following option specifies a fixed 4 × 4 correlation matrix:

type=user( 1.0  0.9  0.8  0.6
           0.9  1.0  0.9  0.8
           0.8  0.9  1.0  0.9
           0.6  0.8  0.9  1.0 )
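A fixed working correlation matrix should itself be a valid correlation matrix: square, symmetric, with ones on the diagonal, and positive definite so that it can be inverted. The following Python sketch checks those conditions with a Cholesky factorization, which succeeds only for positive definite matrices. Whether and how PROC GEE validates a USER(matrix) specification is not described here, so treat this as a pre-check you might run yourself; the function name is hypothetical.

```python
import math


def is_valid_correlation(matrix, tol=1e-10):
    """Check that `matrix` is a symmetric, unit-diagonal, positive definite matrix."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return False
    for j in range(n):
        if abs(matrix[j][j] - 1.0) > tol:
            return False
        for k in range(j + 1, n):
            if abs(matrix[j][k] - matrix[k][j]) > tol:
                return False

    # Cholesky factorization: a non-positive pivot means not positive definite.
    lower = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = matrix[j][j] - sum(lower[j][p] ** 2 for p in range(j))
        if pivot <= tol:
            return False
        lower[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            s = sum(lower[i][p] * lower[j][p] for p in range(j))
            lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True


# The 4 x 4 matrix from the TYPE=USER example above.
user_matrix = [
    [1.0, 0.9, 0.8, 0.6],
    [0.9, 1.0, 0.9, 0.8],
    [0.8, 0.9, 1.0, 0.9],
    [0.6, 0.8, 0.9, 1.0],
]
print(is_valid_correlation(user_matrix))  # True
print(is_valid_correlation([[1.0, 0.9, 0.1], [0.9, 1.0, 0.9], [0.1, 0.9, 1.0]]))  # False
```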

By default, TYPE=IND. When you specify the alternating logistic regression method by using the LOGOR= option, you should not specify the TYPE= option.
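The keyword structures in Table 12 differ in how the correlation between two responses depends on their positions j and k within a subject. The following Python sketch builds the usual textbook forms: independence has 0 off the diagonal; exchangeable uses one common alpha; AR(1) uses alpha raised to the power |j - k|; and m-dependent uses a separate alpha for each lag up to m and 0 beyond. UNSTR and USER are not covered because every off-diagonal entry is free or supplied. These are standard definitions offered for intuition (Table 13 gives the procedure's own definitions), and the function name is hypothetical.

```python
def working_correlation(kind, n, alpha=0.0):
    """Return an n x n working correlation matrix as a list of lists.

    kind:  "IND", "EXCH" (or "CS"), "AR" (or "AR(1)"), or "MDEP".
    alpha: a single correlation for EXCH and AR; for MDEP, a sequence
           (alpha_1, ..., alpha_m) of correlations at lags 1..m.
    """
    kind = kind.upper()
    if kind not in {"IND", "EXCH", "CS", "AR", "AR(1)", "MDEP"}:
        raise ValueError(f"unsupported structure: {kind}")
    matrix = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            lag = abs(j - k)
            if lag == 0:
                matrix[j][k] = 1.0
            elif kind in ("EXCH", "CS"):
                matrix[j][k] = alpha
            elif kind in ("AR", "AR(1)"):
                matrix[j][k] = alpha ** lag
            elif kind == "MDEP":
                matrix[j][k] = alpha[lag - 1] if lag <= len(alpha) else 0.0
            # IND: off-diagonal entries stay 0.0
    return matrix


print(working_correlation("AR(1)", 3, alpha=0.5))
# [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
print(working_correlation("MDEP", 3, alpha=[0.4]))
# [[1.0, 0.4, 0.0], [0.4, 1.0, 0.4], [0.0, 0.4, 1.0]]
```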

WITHINSUBJECT=within-subject-effect
WITHIN=within-subject-effect

defines an effect that specifies the order of measurements within subjects. Each distinct level of the within-subject-effect defines a different response from the same subject. If the data are in proper order within each subject, you can omit this option.

If some measurements do not appear in the data for some subjects, this option properly orders the existing measurements and treats the omitted measurements as missing values.
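The effect of WITHIN= can be pictured as placing each observed response into a slot indexed by its within-subject level, with unfilled slots treated as missing. In this Python sketch, None marks a missing response, and the caller supplies the ordered list of levels; in PROC GEE that ordering comes from the CLASS statement options, so the list here is an assumption. The function name is hypothetical.

```python
def order_within_subject(observations, levels):
    """Place one subject's responses into slots ordered by within-subject level.

    observations: list of (level, response) pairs for a single subject,
                  in any order.
    levels:       every within-subject level, in the desired order.
    Returns a list aligned with `levels`; None marks an omitted measurement.
    """
    by_level = {}
    for level, response in observations:
        if level in by_level:
            raise ValueError(f"level {level!r} appears twice for this subject")
        by_level[level] = response
    return [by_level.get(level) for level in levels]


# Subject measured at times 3 and 1 (out of order); time 2 is absent.
print(order_within_subject([(3, 5.0), (1, 2.0)], levels=[1, 2, 3]))
# [2.0, None, 5.0]
```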

If you do not specify the WITHIN= option for the standard GEE method, missing values are assumed to be the last values and are not used; the remaining observations are then ordered in the sequence in which they are provided in the input data set. If you do not specify the WITHIN= option for the weighted GEE method, the observations are assumed to be ordered in the sequence in which they are provided in the input data set.

Variables that are used in defining the within-subject-effect must be listed in the CLASS statement, and the within-subject-effect level ordering depends on the options that you specify in the CLASS statement.

ZDATA=SAS-data-set

specifies a SAS data set that contains either the full Z matrix for log odds ratio association modeling for data with binary responses or the Z matrix for a single complete cluster to be replicated for all clusters.

ZROW=variable-list

specifies the variables in the ZDATA= data set that correspond to rows of the Z matrix for log odds ratio association modeling for data with binary responses.

Last updated: December 09, 2022