The MODEL statement is required for specifying the outcome model. You provide the outcome (the name of the outcome variable) to the left of the equal sign and effects (the treatment and mediator effects) to the right. If your outcome variable is a time-to-event response, you can specify the right-censoring variable in censor, followed by a list of values in parentheses. The values in this list indicate right-censored responses. Optionally, you can specify outcome attributes in outcome-options and modeling attributes in model-options.
Together, the COVAR, MEDIATOR, and MODEL statements specify the relationships of all variables in the mediation analysis. The treatment and mediator variables that you specify in the MODEL statement must be consistent with those that you specify in the MEDIATOR statement. If there are covariates in the analysis, do not specify them or their effects in the MODEL statement, even though the covariate effects on the outcome variable are being modeled. Instead, use the COVAR statement to specify the covariate effects.
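As a minimal sketch of how the three statements fit together, the following step assumes a hypothetical data set named Study with outcome Y, treatment T, mediator M, and covariates C1 and C2. All of these names are assumptions for illustration. The covariates appear only in the COVAR statement, and T and M play the same roles in both the MODEL and MEDIATOR statements.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc causalmed data=Study;
   model Y = T | M;      /* outcome model: effects of T, M, and T*M */
   mediator M = T;       /* mediator model: same treatment T and mediator M */
   covar C1 C2;          /* covariate effects go here, not in the MODEL statement */
run;
```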
Outcome variables can be binary, continuous, count, or time-to-event (or failure/survival time) variables. PROC CAUSALMED does not support outcome variables that are nominal or ordinal and have more than two levels.
Suppose that the outcome variable is Y, the treatment variable is T, and the mediator variable is M. The three possibilities for the syntax of effects are as follows:
model Y = T M;
model Y = T M T*M;
model Y = T | M;
The first statement specifies the effects of T and M but no interaction effect between the two. The second and third statements are equivalent. Both specify the effects of T, M, and their interaction. The order of T and M is not important.
If outcome variable Y is a count response, you can specify either a Poisson (DIST=POISSON) or negative binomial (DIST=NB) distribution in the model-options. For example:
model Y = T | M / dist=poisson;
If outcome variable Y is a time-to-event response, you can specify either an accelerated failure time model (Kalbfleisch and Prentice 1980) by using the AFT option or a Cox proportional hazards model (Cox 1972) by using the COXPH option. For example:
model Y = T | M / AFT;
If outcome variable Y is a time-to-event response and Q is a variable that indicates right-censoring of the response by values 1 or 2, you can use the following statement to specify such a right-censored outcome response:
model Y*Q(1 2) = T | M / AFT;
If outcome variable Y is a binary variable, you can list it in the CLASS statement. Alternatively, you can specify DIST=BIN in model-options. For example, specifying
class Y;
model Y = T | M;
is equivalent to specifying
model Y = T | M / dist=bin;
Both specifications model Y as a binary response with a Bernoulli distribution (that is, a binomial distribution with a single trial).
For binary outcomes, it is important to indicate which level or category represents the events being modeled. You can use the following outcome-options to specify the attributes of the binary levels:
-
DESCENDING
DESC
reverses the sort order of a binary outcome variable. If both the DESCENDING and ORDER= options are specified, PROC CAUSALMED orders the outcome categories according to the ORDER= option and then reverses that order.
-
EVENT='level' | FIRST | LAST
specifies the event category or level for the binary outcome. PROC CAUSALMED models the probability of the event of the outcome. The category that is not specified in the EVENT= option is then automatically treated as the reference level (see the REF= option). You cannot specify both the EVENT= and REF= options for the outcome variable.
You can specify one of the following keywords for the EVENT= option:
- 'level'
specifies the level in quotation marks to be used as the event. Specify the formatted value of the variable if a format is assigned.
- FIRST
designates the first ordered level as the event.
- LAST
designates the last ordered level as the event.
By default, EVENT=FIRST.
One of the most common sets of outcome levels is {'No','Yes'}, where 'Yes' represents the event whose probability is modeled. To specify this event level for the outcome variable Y, use the following MODEL statement:
model Y(event='Yes') = T | M;
-
ORDER=DATA | FORMATTED | FREQ | INTERNAL
specifies the sort order for the levels of the outcome variable. The following table displays the available ORDER= options.
| ORDER= | Levels Sorted By |
|---|---|
| DATA | Order of appearance in the input data set. |
| FORMATTED | External formatted value, except for numeric variables that have no explicit format, which are sorted by their unformatted (internal) value. The sort order is machine-dependent. |
| FREQ | Descending frequency count. Levels that have the most observations come first in the order. |
| INTERNAL | Unformatted value. The sort order is machine-dependent. |
By default, ORDER=FORMATTED.
For more information about sort order, see the chapter on the SORT procedure in the Base SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.
-
REFERENCE='level' | FIRST | LAST
REF='level' | FIRST | LAST
specifies the reference level for the binary outcome variable.
You can specify one of the following keywords for the REF= option:
- 'level'
specifies the level in quotation marks to be used as the reference level. Specify the formatted value of the variable if a format is assigned.
- FIRST
designates the first ordered level as the reference level.
- LAST
designates the last ordered level as the reference level.
By default, REF=LAST.
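The following sketch illustrates how these outcome-options might be used for a binary outcome Y whose levels are 'No' and 'Yes'. It shows two alternative ways of making 'Yes' the event, based on the option descriptions above. The data set name Study and the variable names are hypothetical, and the behavior noted in the comments is inferred from those descriptions rather than from a verified run.

```sas
/* Hypothetical data set; Y is assumed to have the levels 'No' and 'Yes' */
proc causalmed data=Study;
   class Y;
   model Y(ref='No') = T | M;     /* 'No' is the reference level, so 'Yes' is the event */
   mediator M = T;
run;

/* Alternative: reverse the sort order so that 'Yes' is ordered before 'No' */
proc causalmed data=Study;
   class Y;
   model Y(descending) = T | M;   /* first ordered level after reversal should be 'Yes' */
   mediator M = T;
run;
```

Because EVENT= and REF= cannot both be specified for the outcome variable, each MODEL statement uses at most one of them.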
PROC CAUSALMED supports two main classes of outcome models: generalized linear models and failure time (survival) models. You can specify the related modeling attributes by providing the following model-options after the slash (/):
-
AFT
fits an accelerated failure time model to time-to-event outcomes.
-
COXPH
fits a Cox proportional hazards model to time-to-event outcomes.
-
DIST=keyword
DISTRIBUTION=keyword
specifies the built-in probability distribution to use in the model.
For generalized linear models, the valid keywords for this option are described in Table 3. If you specify these distributions and you omit the LINK= option, a default link function is chosen as displayed in Table 3.
Table 3: Distributions and Default Link Functions
| DIST= | Distribution | Default Link Function |
|---|---|---|
| BIN or B | Binary | Logit |
| NEGBIN or NB | Negative binomial | Log |
| NORMAL, NOR, or N | Normal | Identity |
| POISSON, POI, or P | Poisson | Log |
If you specify neither the DIST= option nor the LINK= option for a generalized linear model, then the CAUSALMED procedure defaults to the binary distribution with logit link if the outcome variable is listed in the CLASS statement. If the outcome variable is not listed in the CLASS statement, then the CAUSALMED procedure defaults to the normal distribution with the identity link function.
For the Poisson and negative binomial distributions, responses must be nonnegative, but they can take noninteger values. Observations whose response values are outside of the distribution’s support are not used to estimate the mediation effects.
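As a brief illustration of the DIST= option for a count outcome, the following sketch fits a negative binomial outcome model. Because the LINK= option is omitted, the default log link from Table 3 would apply. The data set Study and the variables Y, T, M, and C1 are hypothetical names.

```sas
/* Hypothetical data set; Y is assumed to be a nonnegative count response */
proc causalmed data=Study;
   model Y = T | M / dist=negbin;   /* NB is listed in Table 3 as an equivalent keyword */
   mediator M = T;
   covar C1;
run;
```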
If you specify an accelerated failure time model by using the AFT option in the model-options, you can specify the following keywords for the distribution of the time-to-event response:
- EXP | EXPONENTIAL
specifies an exponential distribution, which is treated as a restricted Weibull distribution.
- GAMMA
specifies a generalized gamma distribution (Lawless 2003, p. 240). The standard two-parameter gamma distribution is not available in PROC CAUSALMED.
- LOGISTIC
specifies a logistic distribution.
- NORMAL
specifies a normal distribution.
- WEIBULL
specifies a Weibull distribution. If the NOLOG option is also specified, PROC CAUSALMED fits a type 1 extreme-value distribution to the raw, untransformed data.
By default, DIST=WEIBULL for accelerated failure time models.
-
LINK=keyword
specifies the link function in the model. You can specify the keywords shown in Table 4.
By default, the link function is chosen as shown in Table 3. This option does not apply to accelerated failure time and Cox proportional hazards models.
-
NOLOG
requests that no log transformation of the time-to-event outcome response be performed to fit the accelerated failure time model. This option is irrelevant if you do not specify the accelerated failure time model with the AFT option.
By default, PROC CAUSALMED transforms the time-to-event outcome response with the natural logarithm before fitting the accelerated failure time model if you specify DIST=EXP, GAMMA, or WEIBULL. The NOLOG option thus suppresses the log transformation for these three distributions. For DIST=NORMAL or LOGISTIC, no log transformation is applied, regardless of whether you specify the NOLOG option.
If log transformation is applied to an accelerated failure time model, the causal mediation effects are expressed as ratios of the relevant expected event (failure) times. If log transformation is not applied to an accelerated failure time model, the causal mediation effects are expressed as differences between the relevant expected event (failure) times.
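The contrast between the two scales can be sketched with a pair of steps that differ only in the NOLOG option. The data set Study is hypothetical, and Y, Q, T, and M follow the naming used in the earlier examples, with the values 1 and 2 of Q indicating right-censoring.

```sas
/* Default: the response is log-transformed before the Weibull AFT model is fit, */
/* so the mediation effects are expressed as ratios of expected event times      */
proc causalmed data=Study;
   model Y*Q(1 2) = T | M / aft dist=weibull;
   mediator M = T;
run;

/* NOLOG: no log transformation is applied, so the mediation effects are */
/* expressed as differences between expected event times                 */
proc causalmed data=Study;
   model Y*Q(1 2) = T | M / aft dist=weibull nolog;
   mediator M = T;
run;
```

Specifying DIST=WEIBULL is redundant in the first step, because it is the default for accelerated failure time models; it is written out here only to make the comparison explicit.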