The CAUSALMED Procedure

Causal Mediation Effects: Assumptions, Identification, and Estimation

This section continues the discussion of the theoretical foundations of the CAUSALMED procedure. The section Causal Mediation Effects: Theory, Definitions, and Effect Decompositions defines the causal mediation effects and various total effect decompositions that the procedure estimates. This section describes the assumptions that underlie the identification and estimation of causal mediation effects. It also discusses the implications of these assumptions for valid applications of the procedure.

Identification of Causal Mediation Effects

This section lays out the identification conditions of causal mediation effects and their implications for applying statistical methods that aim to obtain unbiased estimation of the effects.

First, it is useful to distinguish the following three types of confounding covariates:

  • $C_1$ represents a generic covariate that confounds the relationship between T and Y. This is a treatment-outcome confounder.

  • $C_2$ represents a generic covariate that confounds the relationship between M and Y. This is a mediator-outcome confounder.

  • $C_3$ represents a generic covariate that confounds the relationship between T and M. This is a treatment-mediator confounder.

As in preceding sections, let C denote all of the covariates $C_1$, $C_2$, and $C_3$. Thus, controlling for C in a regression analysis means controlling for all three types of confounding covariates.

According to Valeri and VanderWeele (2013), the following four assumptions are required for the identification of causal mediation effects:

  • no unmeasured treatment-outcome confounders given C

  • no unmeasured mediator-outcome confounders given (C, T)

  • no unmeasured treatment-mediator confounders given C

  • no mediator-outcome confounder is affected by T (directly or indirectly) given C

The identification of the controlled direct effect (CDE) assumes the first two conditions, and the identification of the natural direct effect (NDE) and the natural indirect effect (NIE) assumes all four conditions. These four assumptions are collectively called the "no unmeasured confounding assumption." Formal statements for these identification conditions can be found in the appendix of Valeri and VanderWeele (2013) and VanderWeele (2015).

In short, the regression adjustment method that is discussed in the next section assumes that these identification conditions hold. This assumption is what allows the method to produce unbiased estimates of causal mediation and related effects.

In practice, this means that valid causal interpretation of the mediation effects requires you to measure all relevant confounding covariates C and include them in the causal mediation analysis. For example, the first identification condition states that there are no unmeasured treatment-outcome confounders given C. A simple practical reading of this condition is that if treatment-outcome confounders $C_1$ are present in the observational study, they must be measured and included in C in order to obtain unbiased estimates of the causal effects. Similarly, the other identification conditions require that $C_2$ and $C_3$, if present, be measured and included in the analysis.
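To make the second condition concrete, the following Python sketch simulates hypothetical data in which a randomized binary treatment T affects a mediator M, and a covariate C2 confounds the M-Y relationship. Everything here is an illustrative assumption: the linear data-generating model, its coefficients, the sample size, and the small `ols` helper (ordinary least squares via the normal equations, written only for this example). Because the simulated outcome model has no T-by-M interaction, the natural indirect effect is the product theta2 * beta1, with a true value of 2 * 1 = 2. Omitting C2 from the outcome model biases theta2, and therefore the indirect effect.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for k in range(p):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

rng = random.Random(2024)
n = 2000
T, M, Y, C2 = [], [], [], []
for _ in range(n):
    t = float(rng.random() < 0.5)            # randomized binary treatment
    c2 = rng.gauss(0, 1)                     # mediator-outcome confounder
    m = 2.0 * t + c2 + rng.gauss(0, 1)       # true beta1 = 2
    y = 1.0 * t + 1.0 * m + 2.0 * c2 + rng.gauss(0, 1)  # true theta2 = 1
    T.append(t); M.append(m); Y.append(y); C2.append(c2)

# Mediator model M ~ 1 + T (T is randomized, so no adjustment is needed here).
beta1 = ols([[1.0, t] for t in T], M)[1]
# Outcome model with C2 omitted (violates the second condition) and with C2 included.
naive = ols([[1.0, t, m] for t, m in zip(T, M)], Y)
adjusted = ols([[1.0, t, m, c] for t, m, c in zip(T, M, C2)], Y)

print("naive theta2:   ", round(naive[2], 2), " NIE:", round(naive[2] * beta1, 2))
print("adjusted theta2:", round(adjusted[2], 2), " NIE:", round(adjusted[2] * beta1, 2))
```

Under this setup the naive theta2 converges to about 2 rather than 1, so the naive indirect effect is roughly double the true value. Measuring and adjusting for C2 removes the bias.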

Regression Methods for Causal Mediation Analysis

The CAUSALMED procedure implements regression methods for estimating causal mediation effects that assume the identification conditions of the preceding section along with correct specification of the following two models:

  • the outcome model for Y given T, M, and C

  • the mediator model for M given T and C

For a class of generalized linear models, VanderWeele and Vansteelandt (2009), VanderWeele and Vansteelandt (2010), VanderWeele (2011), and Valeri and VanderWeele (2013) derived analytic formulas for computing various causal mediation effects for different variable types, including combinations of the following cases:

  • outcome variable Y, which can be binary, continuous (including time-to-event outcomes), or count

  • treatment variable T, which can be binary or continuous

  • mediator variable M, which can be binary or continuous

  • covariates C, which can be categorical or continuous

PROC CAUSALMED implements these analytic formulas.
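For a sense of what such analytic formulas look like, the Python sketch below codes the closed-form expressions that Valeri and VanderWeele (2013) give for the simplest case: a continuous outcome and a continuous mediator, both fit by linear regression, with a treatment-by-mediator interaction in the outcome model. The model forms in the docstrings, the class, function, and field names, and the example coefficients are assumptions made for illustration. They are not a description of how PROC CAUSALMED is implemented internally.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class OutcomeModel:
    """Assumed model: E[Y | T=t, M=m, C=c] = th0 + th1*t + th2*m + th3*t*m + th4'c"""
    th0: float
    th1: float
    th2: float
    th3: float
    th4: Sequence[float]

@dataclass
class MediatorModel:
    """Assumed model: E[M | T=t, C=c] = b0 + b1*t + b2'c"""
    b0: float
    b1: float
    b2: Sequence[float]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mediation_effects(out, med, c, t=1.0, t_star=0.0, m=0.0):
    """Effects of changing T from t_star to t at covariate values c;
    m is the mediator level at which the controlled direct effect is evaluated."""
    d = t - t_star
    m_at_tstar = med.b0 + med.b1 * t_star + dot(med.b2, c)  # E[M | T=t*, C=c]
    cde = (out.th1 + out.th3 * m) * d
    nde = (out.th1 + out.th3 * m_at_tstar) * d
    nie = (out.th2 * med.b1 + out.th3 * med.b1 * t) * d
    return {"CDE": cde, "NDE": nde, "NIE": nie, "TE": nde + nie}

# Hypothetical coefficients and a single covariate evaluated at c = 1.
out = OutcomeModel(th0=0.5, th1=1.0, th2=1.5, th3=0.4, th4=[0.2])
med = MediatorModel(b0=1.0, b1=2.0, b2=[0.3])
print(mediation_effects(out, med, c=[1.0]))
```

In this case the NDE depends on c through E[M | T=t*, C=c], while the NIE does not. This is one reason why the covariate values at which the effects are evaluated matter.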

In addition to the causal effect identification assumptions that are described in the section Identification of Causal Mediation Effects, the formulas also require the outcome to be rare (VanderWeele 2011, 2014) in the following two cases:

  • when the outcome Y is binary

  • when the outcome Y is a time-to-event variable and is modeled by the proportional hazards model

In the first case, if Y is not rare, the formulas remain valid when Y is modeled by using the log link.
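Part of the intuition behind the rare-outcome requirement is that an odds ratio approximates a risk ratio only when the outcome probability is small. The short Python sketch below uses two hypothetical pairs of outcome probabilities, each of which doubles the risk, to show how closely the two measures agree in each case.

```python
def odds(p):
    return p / (1.0 - p)

def risk_ratio(p1, p0):
    return p1 / p0

def odds_ratio(p1, p0):
    return odds(p1) / odds(p0)

# Both hypothetical scenarios double the risk (risk ratio = 2).
for p0, p1 in [(0.01, 0.02), (0.30, 0.60)]:
    print(f"p0={p0:.2f} p1={p1:.2f}  RR={risk_ratio(p1, p0):.3f}  OR={odds_ratio(p1, p0):.3f}")
```

With a rare outcome (1% versus 2%), the odds ratio is about 2.02, which is close to the risk ratio of 2. With a common outcome (30% versus 60%), the odds ratio is 3.5.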

Let $\theta$ represent the vector that collects all parameters in the outcome and mediator models. Under correct specification of the regression models and the identification assumptions, the causal effects in a mediation analysis are functions of $\theta$ conditional on the covariate values. That is, a causal effect indexed by ef can be expressed as a function of $\theta$ given $C = c$,

$$g_{ef}(\theta \mid C = c),$$

where $c$ represents some fixed values of the covariates $C$. By default, the causal estimands that PROC CAUSALMED targets are of the form $g_{ef}(\theta \mid \mu_c)$, where $\mu_c$ is the expected value of $C$. Therefore, $g_{ef}(\theta \mid \mu_c)$ is in general interpreted as a conditional effect or as an "overall" effect, but not as a marginal effect. It coincides with the marginal effect only when $g_{ef}(\theta \mid C)$ is linear in $C$.
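The distinction between evaluating an effect at the covariate mean and averaging it over the covariate distribution matters only when $g_{ef}$ is nonlinear in $C$. As an assumed example, the Python sketch below uses the natural indirect effect for a continuous outcome and a binary mediator fit by logistic regression (Valeri and VanderWeele 2013), $(\theta_2 + \theta_3 t)\,[\mathrm{expit}(\beta_0 + \beta_1 t + \beta_2 c) - \mathrm{expit}(\beta_0 + \beta_1 t^* + \beta_2 c)]$, which is nonlinear in $c$. It contrasts this effect with the natural direct effect for a continuous mediator, which is linear in $c$. Averaging over the sample covariate values serves here as a simple plug-in version of a marginal effect. All coefficient and covariate values are hypothetical.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def nie_binary_mediator(c, th2=1.5, th3=0.0, b0=0.0, b1=1.0, b2=2.0, t=1.0, t_star=0.0):
    # Nonlinear in c because the mediator model is logistic.
    return (th2 + th3 * t) * (expit(b0 + b1 * t + b2 * c) - expit(b0 + b1 * t_star + b2 * c))

def nde_continuous_mediator(c, th1=1.0, th3=0.4, b0=1.0, b1=2.0, b2=0.3, t=1.0, t_star=0.0):
    # Linear in c: (th1 + th3 * E[M | T=t*, C=c]) * (t - t*).
    return (th1 + th3 * (b0 + b1 * t_star + b2 * c)) * (t - t_star)

cs = [-2.0, 0.0, 2.0]          # hypothetical sample values of a single covariate
c_bar = sum(cs) / len(cs)      # plug-in estimate of mu_c

for name, g in [("NIE, binary M", nie_binary_mediator),
                ("NDE, continuous M", nde_continuous_mediator)]:
    at_mean = g(c_bar)                          # g(theta | mu_c)
    averaged = sum(g(c) for c in cs) / len(cs)  # average of g(theta | c) over the sample
    print(f"{name}: at mean = {at_mean:.4f}, averaged = {averaged:.4f}")
```

For the linear NDE the two numbers agree exactly. For the nonlinear NIE, the effect evaluated at the covariate mean is more than twice the averaged effect in this example.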

The default estimand, $g_{ef}(\theta \mid \mu_c)$, is consistent with the SAS macros that are implemented by Valeri and VanderWeele (2013) and Valeri and VanderWeele (2015). For more information about the definitions of various causal mediation effects, see the sections Estimands on Difference Scale and Estimands on Ratio Scale.

For categorical covariates, $\mu_c$ consists of the expected values of the 0/1 dummy variables for the categorical levels. However, this does not mean that you need to dummy-code categorical covariates yourself; PROC CAUSALMED does the dummy coding internally.
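The mean of a 0/1 dummy variable is simply the sample proportion of that level, as the small Python sketch below shows. The covariate values, the function name `dummy_means`, and the choice of "West" as the omitted reference level are hypothetical. This section does not say which reference level PROC CAUSALMED uses internally.

```python
from collections import Counter

def dummy_means(levels, reference):
    """Sample means of the 0/1 dummy variables for every non-reference level."""
    counts = Counter(levels)
    n = len(levels)
    return {lev: counts[lev] / n for lev in sorted(counts) if lev != reference}

region = ["North", "South", "West", "North"]   # hypothetical categorical covariate
print(dummy_means(region, reference="West"))
```

The resulting covariate profile (half "North", one quarter "South") generally does not correspond to any single observed subject. This is one reason why effects evaluated at $\mu_c$ are read as "overall" rather than subject-specific effects.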

In addition to $g_{ef}(\theta \mid \mu_c)$, PROC CAUSALMED also supports the estimation of $g_{ef}(\theta \mid C = c_1)$, where $c_1$ represents specific fixed covariate levels at which you want to evaluate the causal mediation effects, for example, to obtain the effects for a particular subgroup of interest that is defined by those covariate values. For more information about specifying these covariate values and evaluating conditional causal mediation effects, see the EVALUATE statement and the section Evaluating Causal Mediation Effects.

Maximum Likelihood Estimation

For random samples, PROC CAUSALMED estimates causal mediation effects by the maximum likelihood method. First, the maximum likelihood estimate $\hat{\theta}$ of $\theta$ is obtained for the outcome and mediator models. Then the maximum likelihood estimates of the causal mediation effects are computed as

$$g_{ef}(\hat{\theta} \mid C = c_0),$$

where ef is the index for effects and $c_0$ is the vector of covariate averages that are computed from the sample. For categorical covariates, this definition of $c_0$ assumes that the levels are dummy-coded as 0 and 1, which PROC CAUSALMED does internally. Therefore, in the default output of PROC CAUSALMED, $g_{ef}(\hat{\theta} \mid C = c_0)$ estimates $g_{ef}(\theta \mid \mu_c)$.

Given the estimated covariance matrix of $\hat{\theta}$, the delta method is used to estimate the standard errors of the causal effects $g_{ef}(\hat{\theta} \mid C = c_0)$. In these computations, the covariate values $c_0$ are treated as fixed. For more information about the delta method for computing standard errors in this context, see VanderWeele and Vansteelandt (2009) or VanderWeele (2015).
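The delta method approximates the variance of $g_{ef}(\hat{\theta})$ by $\nabla g^{\top} \Sigma \nabla g$, where $\nabla g$ is the gradient of $g_{ef}$ with respect to $\theta$ and $\Sigma$ is the covariance matrix of $\hat{\theta}$. The Python sketch below applies this to an assumed example: the natural indirect effect $(\theta_2 + \theta_3 t)\,\beta_1 (t - t^*)$ for a continuous outcome and a continuous mediator, using analytic derivatives. The function names and the covariance values are hypothetical. The zero covariances between $\beta_1$ and $(\theta_2, \theta_3)$ are an assumption of this sketch, reflecting separately fitted outcome and mediator models.

```python
import math

def quad_form(g, S):
    """g' S g for a gradient vector g and a covariance matrix S."""
    k = len(g)
    return sum(g[i] * S[i][j] * g[j] for i in range(k) for j in range(k))

def nie_delta_se(th2, th3, b1, cov, t=1.0, t_star=0.0):
    """NIE = (th2 + th3*t) * b1 * (t - t_star) and its delta-method standard error.
    cov is the covariance matrix of the estimates (th2, th3, b1)."""
    d = t - t_star
    nie = (th2 + th3 * t) * b1 * d
    grad = [b1 * d,               # dNIE/dth2
            b1 * t * d,           # dNIE/dth3
            (th2 + th3 * t) * d]  # dNIE/db1
    return nie, math.sqrt(quad_form(grad, cov))

# Hypothetical covariance: (th2, th3) block from the outcome model,
# b1 from the mediator model, zero cross-model covariance (assumed).
cov = [[0.010, -0.002, 0.0],
       [-0.002, 0.005, 0.0],
       [0.0,    0.0,   0.040]]
print(nie_delta_se(th2=1.5, th3=0.4, b1=2.0, cov=cov))
```

Without an interaction ($\theta_3 = 0$ and no variance for it), the formula reduces to the familiar product-of-coefficients standard error $\sqrt{\beta_1^2 \operatorname{Var}(\hat\theta_2) + \theta_2^2 \operatorname{Var}(\hat\beta_1)}$.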

Alternatively, you can use bootstrap techniques to compute standard error estimates and confidence intervals. For more information about the bootstrap method, see the BOOTSTRAP statement and the section Bootstrap Methods.
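As a rough illustration of the bootstrap idea, the Python sketch below resamples observations with replacement, refits both models on each resample, and takes empirical quantiles of the re-estimated indirect effect. This is a percentile interval, which is only one common bootstrap variant. For the options that PROC CAUSALMED actually supports, see the BOOTSTRAP statement. The simulated data, the no-interaction linear models (for which NIE = theta2 * beta1), the helper names `ols`, `nie_estimate`, and `percentile_ci`, and the numbers of observations and resamples are all assumptions.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations, via Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for k in range(p):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def nie_estimate(rows):
    """Product-of-coefficients NIE: beta1 from M ~ 1 + T, theta2 from Y ~ 1 + T + M."""
    beta1 = ols([[1.0, t] for t, m, y in rows], [m for t, m, y in rows])[1]
    theta2 = ols([[1.0, t, m] for t, m, y in rows], [y for t, m, y in rows])[2]
    return theta2 * beta1

def percentile_ci(rows, n_boot=200, alpha=0.05, seed=1):
    """Case-resampling bootstrap with a percentile confidence interval."""
    rng = random.Random(seed)
    stats = sorted(nie_estimate([rng.choice(rows) for _ in range(len(rows))])
                   for _ in range(n_boot))
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Hypothetical data: randomized binary T, true beta1 = 2 and theta2 = 1.5, so NIE = 3.
gen = random.Random(7)
rows = []
for _ in range(400):
    t = float(gen.random() < 0.5)
    m = 1.0 + 2.0 * t + gen.gauss(0, 1)
    y = 0.5 + 1.0 * t + 1.5 * m + gen.gauss(0, 1)
    rows.append((t, m, y))

est = nie_estimate(rows)
print("NIE estimate:", round(est, 3), " 95% percentile CI:", percentile_ci(rows))
```

Unlike the delta method, this approach needs no derivatives of $g_{ef}$, at the cost of refitting the models once per resample.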

As explained in the preceding sections, the causal mediation effects depend on the covariate levels at which they are evaluated. In addition to the overall causal mediation effects that are evaluated at $C = c_0$, you can use the EVALUATE statement to specify covariate levels, say $C = c_1$, that are particularly meaningful to your research. The maximum likelihood estimate is then $g_{ef}(\hat{\theta} \mid C = c_1)$, and its standard error is computed similarly by the delta method. For more information about evaluating causal mediation effects, see the section Evaluating Causal Mediation Effects. For an illustration, see Example 38.2.

Estimation of Various Total Effect Decompositions

Formulas for estimating the components of the four-way decomposition of the total effect (VanderWeele 2014) follow essentially the same logic that is described in the preceding sections.

For time-to-event outcomes, the components of the four-way decomposition are computed on the mean time ratio scale when the outcome is modeled by the accelerated failure time model, and they are computed on the hazard ratio scale when the outcome is modeled by the Cox proportional hazards model.

For other continuous outcomes, the components of the four-way decomposition are computed on the original scales. For binary outcomes, the components of the four-way decomposition are computed on the odds ratio scale and the excess relative risk scale. These formulas are quite involved and are not presented here. For more information, see VanderWeele (2014).
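For orientation only, the Python sketch below codes the four components for the simplest case in VanderWeele (2014): a continuous outcome and a continuous mediator fit by linear regressions, where the outcome model includes a T-by-M interaction. In this case all components are on the original (difference) scale. The function and argument names, the reference mediator level m_star = 0, and the coefficients are illustrative assumptions. The sketch also computes each component's share of the total effect as a simple form of percentage contribution, and it regroups the components into the two-way split TE = NDE + NIE.

```python
def four_way(th1, th2, th3, b0, b1, b2c, t=1.0, t_star=0.0, m_star=0.0):
    """Four-way decomposition for the assumed linear models
    E[Y|t,m,c] = th0 + th1*t + th2*m + th3*t*m + th4'c and
    E[M|t,c]   = b0 + b1*t + b2'c, where b2c is the scalar b2'c."""
    d = t - t_star
    m_at_tstar = b0 + b1 * t_star + b2c            # E[M | T=t*, C=c]
    parts = {
        "CDE":     (th1 + th3 * m_star) * d,          # controlled direct effect
        "INT_ref": th3 * (m_at_tstar - m_star) * d,   # reference interaction
        "INT_med": th3 * b1 * d * d,                  # mediated interaction
        "PIE":     (th2 * b1 + th3 * b1 * t_star) * d,  # pure indirect effect
    }
    te = sum(parts.values())
    shares = {k: v / te for k, v in parts.items()}    # proportion of the total effect
    return parts, te, shares

parts, te, shares = four_way(th1=1.0, th2=1.5, th3=0.4, b0=1.0, b1=2.0, b2c=0.3)
print(parts, "TE =", te)
print({k: round(v, 3) for k, v in shares.items()})
# Regrouping into the two-way decomposition: NDE = CDE + INT_ref, NIE = INT_med + PIE.
print("NDE:", parts["CDE"] + parts["INT_ref"], " NIE:", parts["INT_med"] + parts["PIE"])
```

Regrouping the components this way recovers the natural direct and indirect effects of the same linear models. This is the sense in which the two-way and three-way decompositions reuse the four-way machinery.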

In addition to the four-way decomposition, PROC CAUSALMED estimates the component effects of several other two-way and three-way decompositions by using the same analytic technique as that of the four-way decomposition. You can use the DECOMP= option in the EVALUATE or PROC CAUSALMED statement to request these decompositions.

To compute standard error estimates for these component effects and their percentage contribution, PROC CAUSALMED uses the delta method with analytic derivatives. Bootstrap methods are also available for computing standard errors and confidence intervals. For more information about bootstrap estimation, see the BOOTSTRAP statement and the section Bootstrap Methods.

Last updated: December 09, 2022