The GEE Procedure

MODEL Statement

  • MODEL response = <effects> </ options>;

  • MODEL events/trials = <effects> </ options>;

The MODEL statement specifies the response (dependent variable) and the effects (explanatory variables). If you omit the explanatory variables, PROC GEE fits an intercept-only model. An intercept term is included in the model by default. You can remove the intercept by specifying the NOINT option.

You can specify the response in the form of a single variable (response) or in the form of a ratio of two variables (events/trials). The first form is applicable to all responses. The second form is applicable only to summarized binomial response data. When each observation in the input data set contains the number of events (for example, successes) and the number of trials from a set of binomial trials, use the events/trials syntax.

In the events/trials model syntax, you specify two variables: one for the event counts and one for trial counts. These two variables are separated by a slash (/). The value of the events variable must be nonnegative, and the value of the trials variable must be equal to or greater than the value of the events variable for an observation to be valid. The events and trials variables can take non-integer values.
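
As a minimal sketch of the events/trials syntax, the following program simulates summarized binomial data and fits a clustered logistic model. The data set work.clinic_counts and all variable names are hypothetical, the data are simulated for illustration only, and the REPEATED statement with an exchangeable working correlation is an assumption that is not part of this section.

```sas
/* Hypothetical summarized binomial data: 20 clinics, 4 visits each */
data work.clinic_counts;
   call streaminit(2022);
   do clinic = 1 to 20;
      treated = mod(clinic, 2);              /* alternate treatment by clinic */
      do visit = 1 to 4;
         n_trials = 25;                      /* trials per observation */
         n_events = rand('binomial', 0.3 + 0.1*treated, n_trials);
         output;
      end;
   end;
run;

proc gee data=work.clinic_counts;
   class clinic;
   /* events variable, slash, trials variable on the left of the equal sign */
   model n_events/n_trials = treated visit / dist=binomial;
   repeated subject=clinic / type=exch;      /* assumed clustering structure */
run;
```

Every simulated observation satisfies the validity rule above, because n_events is nonnegative and never exceeds n_trials.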

When each observation in the input data set contains a single trial from a binomial experiment, use the response form of the MODEL statement. The response variable can be numeric or character. The ordering of response levels is critical in these models.

Responses for the Poisson distribution must be nonnegative, but they are not required to be integers.

The effects in the MODEL statement consist of an explanatory variable or combination of variables. Explanatory variables can be continuous or classification variables. Classification variables can be character or numeric. Explanatory variables that represent nominal (classification) data must be declared in a CLASS statement. Interactions between variables can also be included as effects. Columns of the design matrix are automatically generated for classification variables and interactions. The syntax for specifying effects is the same as for the GLM procedure. For more information, see the section Specification of Effects in Chapter 53, The GLM Procedure.
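
The following sketch shows one way to combine a classification variable, a continuous variable, and their interaction as effects. The data set work.trial_long, its variables, and the simulated values are hypothetical, and the REPEATED statement is an assumption that is not described in this section.

```sas
/* Hypothetical longitudinal data: 30 patients, 4 weekly measurements */
data work.trial_long;
   length trt $1;
   call streaminit(101);
   do patient = 1 to 30;
      trt = ifc(mod(patient, 2), 'A', 'P');  /* character classification variable */
      b = rand('normal', 0, 1);              /* shared patient effect */
      do week = 1 to 4;
         score = 50 + 2*(trt='A')*week + b + rand('normal', 0, 2);
         output;
      end;
   end;
run;

proc gee data=work.trial_long;
   class patient trt;                        /* nominal variables are declared here */
   model score = trt week trt*week;          /* main effects and an interaction */
   repeated subject=patient / type=exch;
run;
```

Because neither DIST= nor LINK= is specified, this model uses the normal distribution with the identity link.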

Table 7 summarizes the options available in the MODEL statement.

Table 7: MODEL Statement Options

Option     Description
ALPHA=     Sets the confidence coefficient
DIST=      Specifies the probability distribution
LINK=      Specifies the link function
NOINT      Requests no intercept term
NOSCALE    Holds the scale parameter fixed
OFFSET=    Specifies a variable in the input data set to be used as an offset
SCALE=     Specifies the value used for the scale
TYPE3      Computes statistics for Type 3 contrasts
WALD       Requests Wald statistics for Type 3 contrasts


You can specify the following options after a slash (/).

ALPHA=number

sets the confidence coefficient for parameter confidence intervals to 1 − number. The value of number must be between 0 and 1. The default value of number is 0.05.
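
For illustration, the sketch below requests 90% confidence intervals by setting ALPHA=0.1, since 1 − 0.1 = 0.90. The data set work.bp and its variables are hypothetical and simulated, and the REPEATED statement is an assumption that is not part of this section.

```sas
/* Hypothetical data: 25 subjects, 3 visits each */
data work.bp;
   call streaminit(7);
   do id = 1 to 25;
      age = 40 + mod(id, 20);
      do visit = 1 to 3;
         sbp = 110 + 0.5*age + rand('normal', 0, 8);
         output;
      end;
   end;
run;

proc gee data=work.bp;
   class id;
   model sbp = age visit / alpha=0.1;        /* confidence coefficient 0.90 */
   repeated subject=id / type=ind;
run;
```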

DIST=keyword
D=keyword
ERROR=keyword
ERR=keyword

specifies the built-in probability distribution to use in the model. If you specify the DIST= option and you omit the LINK= option, a default link function is chosen as displayed in Table 8. If you specify neither the DIST= option nor the LINK= option, then the GEE procedure defaults to the normal distribution with the identity link function.

Table 8: Distributions and Default Link Functions

DIST=                Distribution        Default Link Function
BINOMIAL | BIN | B   Binomial            Logit
GAMMA | GAM | G      Gamma               Reciprocal
IGAUSSIAN | IG       Inverse Gaussian    Reciprocal square
MULTINOMIAL | MULT   Multinomial         Cumulative logit
NEGBIN | NB          Negative binomial   Log
NORMAL | NOR | N     Normal              Identity
POISSON | POI | P    Poisson             Log
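
A short sketch of how the default link in Table 8 applies: the first PROC GEE step specifies only DIST=POISSON, so the log link is used, and the second step states the same model explicitly by using the POI keyword alias. The data set work.seizures and its variables are hypothetical and simulated, and the REPEATED statement is an assumption that is not part of this section.

```sas
/* Hypothetical count data: 30 patients, 4 periods each */
data work.seizures;
   call streaminit(42);
   do patient = 1 to 30;
      drug = mod(patient, 2);
      do period = 1 to 4;
         count = rand('poisson', exp(1.5 - 0.3*drug));
         output;
      end;
   end;
run;

/* DIST= without LINK=: the default log link for the Poisson distribution */
proc gee data=work.seizures;
   class patient;
   model count = drug period / dist=poisson;
   repeated subject=patient / type=exch;
run;

/* Equivalent specification with the link stated explicitly */
proc gee data=work.seizures;
   class patient;
   model count = drug period / dist=poi link=log;
   repeated subject=patient / type=exch;
run;
```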


LINK=keyword

specifies the link function in the model. You can specify the keywords shown in Table 9.

Table 9: Built-In Link Functions of the GEE Procedure

LINK=                  Link Function                      $g(\mu) = \eta =$
CLOGLOG | CLL          Complementary log-log              $\log(-\log(1 - \mu))$
CUMCLL | CCLL          Cumulative complementary log-log   $\log(-\log(1 - \pi))$
CUMLOGIT | CLOGIT      Cumulative logit                   $\log(\pi / (1 - \pi))$
CUMPROBIT | CPROBIT    Cumulative probit                  $\Phi^{-1}(\pi)$
GLOGIT                 Generalized logit
IDENTITY | ID          Identity                           $\mu$
LOG                    Log                                $\log(\mu)$
LOGIT                  Logit                              $\log(\mu / (1 - \mu))$
PROBIT                 Probit                             $\Phi^{-1}(\mu)$
INVERSE | RECIPROCAL   Reciprocal                         $1 / \mu$
POWERMINUS2            Power with exponent -2             $1 / \mu^2$


For the probit and cumulative probit links, $\Phi^{-1}(\cdot)$ denotes the quantile function of the standard normal distribution. If you specify the DIST= option but omit the LINK= option, the default link function for that distribution is used (see Table 8). If you omit both the DIST= option and the LINK= option, the identity link function is used.

The cumulative link functions are appropriate only for the multinomial distribution with ordinal responses, where $\pi$ denotes a cumulative probability. The GLOGIT link function is appropriate only for the multinomial distribution with nominal responses.
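
The sketch below pairs a cumulative link with an ordinal multinomial response, as described above. The data set work.ratings and its variables are hypothetical and simulated. The REPEATED statement and the independence working correlation are assumptions chosen for simplicity, not something this section prescribes.

```sas
/* Hypothetical ordinal data: 40 subjects, 3 visits, rating levels 1 < 2 < 3 */
data work.ratings;
   call streaminit(11);
   do id = 1 to 40;
      trt = mod(id, 2);
      do visit = 1 to 3;
         u = rand('uniform');
         /* each true comparison adds 1, so rating is 1, 2, or 3 */
         rating = 1 + (u > 0.4 - 0.1*trt) + (u > 0.75 - 0.1*trt);
         output;
      end;
   end;
run;

proc gee data=work.ratings;
   class id;
   model rating = trt visit / dist=multinomial link=cumlogit;
   repeated subject=id / type=ind;           /* assumed working correlation */
run;
```

Specifying LINK=CUMLOGIT here is redundant with the default for DIST=MULTINOMIAL in Table 8; it is written out only to make the pairing visible.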

NOINT

requests that no intercept term be included in the model. An intercept is included unless this option is specified.

NOSCALE

holds the scale parameter fixed. If you do not specify the NOSCALE option, the scale parameter is estimated by maximum likelihood for the normal, inverse Gaussian, and gamma distributions. If you specify the NOSCALE option but omit the SCALE= option, the scale parameter is fixed at the value 1.

OFFSET=variable

specifies a variable in the input data set to be used as an offset variable. This variable cannot be a CLASS variable, the response variable, or any of the explanatory variables.
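
A common use of an offset is a rate model for counts, where the log of the exposure enters the linear predictor with a fixed coefficient of 1. The following sketch assumes that setup. The data set work.cohort and its variables are hypothetical and simulated, and the REPEATED statement is an assumption that is not part of this section.

```sas
/* Hypothetical cohort data: 20 sites, 3 years each */
data work.cohort;
   call streaminit(314);
   do site = 1 to 20;
      exposed = mod(site, 2);
      do year = 1 to 3;
         person_years = 100 + 50*rand('uniform');
         n_cases = rand('poisson', person_years * 0.02 * exp(0.4*exposed));
         log_py = log(person_years);         /* offset on the scale of the log link */
         output;
      end;
   end;
run;

proc gee data=work.cohort;
   class site;
   /* log_py is not a CLASS variable, the response, or an explanatory variable */
   model n_cases = exposed year / dist=poisson link=log offset=log_py;
   repeated subject=site / type=exch;
run;
```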

SCALE=number
SCALE=PEARSON | P
PSCALE
SCALE=DEVIANCE | D
DSCALE

specifies the value used for the scale parameter when the NOSCALE option is used. For the binomial and Poisson distributions, which have no free scale parameter, this can be used to specify an overdispersed model. If the NOSCALE option is not specified, then number is used as an initial estimate of the scale parameter.

Specifying SCALE=PEARSON or SCALE=P is the same as specifying the PSCALE option. This fixes the scale parameter at the value 1 in the estimation procedure. After the parameter estimates are determined, the exponential family dispersion parameter is assumed to be given by Pearson’s chi-square statistic divided by the degrees of freedom, and all statistics such as standard errors are adjusted appropriately.

Specifying SCALE=DEVIANCE or SCALE=D is the same as specifying the DSCALE option. This fixes the scale parameter at a value of 1 in the estimation procedure.
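
As an illustration of an overdispersed Poisson model, the sketch below simulates counts with extra-Poisson variation and specifies SCALE=PEARSON. The data set work.visits and its variables are hypothetical and simulated, and the REPEATED statement is an assumption that is not part of this section.

```sas
/* Hypothetical overdispersed counts: 30 patients, 4 periods each */
data work.visits;
   call streaminit(99);
   do patient = 1 to 30;
      drug = mod(patient, 2);
      do period = 1 to 4;
         /* a gamma multiplier with mean 1 inflates the variance above the mean */
         mu = exp(1.2 - 0.2*drug) * rand('gamma', 2) / 2;
         count = rand('poisson', mu);
         output;
      end;
   end;
run;

proc gee data=work.visits;
   class patient;
   model count = drug period / dist=poisson scale=pearson;
   repeated subject=patient / type=exch;
run;
```

Replacing SCALE=PEARSON with the PSCALE option would request the same adjustment.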

TYPE3

requests that statistics for Type 3 contrasts be computed for each effect specified in the MODEL statement. The default analysis is to compute score statistics for the contrasts. Type 3 analyses using the score statistics are not supported for nominal response data or weighted GEE methods. Wald statistics are computed if the WALD option is also specified.

WALD

requests Wald statistics for Type 3 contrasts. You must also specify the TYPE3 option in order to compute Type 3 Wald statistics.
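
The sketch below requests Type 3 Wald statistics by specifying both options together. The data set work.dose_trial and its variables are hypothetical and simulated, and the REPEATED statement is an assumption that is not part of this section.

```sas
/* Hypothetical data: 45 patients in three dose groups, 3 visits each */
data work.dose_trial;
   length dose $4;
   call streaminit(5);
   do patient = 1 to 45;
      dose = scan('low mid high', 1 + mod(patient, 3));
      b = rand('normal', 0, 1);              /* shared patient effect */
      do visit = 1 to 3;
         y = 10 + 2*(dose='high') + 0.5*visit + b + rand('normal', 0, 1.5);
         output;
      end;
   end;
run;

proc gee data=work.dose_trial;
   class patient dose;
   /* TYPE3 alone gives score statistics; adding WALD requests Wald statistics */
   model y = dose visit dose*visit / type3 wald;
   repeated subject=patient / type=exch;
run;
```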

Last updated: December 09, 2022