The CAUSALTRT Procedure

Estimating the Average Treatment Effect (ATE)

The CAUSALTRT procedure can estimate the average treatment effect (ATE) by using inverse probability weighting, regression adjustment, and doubly robust methods, which combine the first two approaches. The following sections describe each of these estimation methods.

Inverse Probability Weighting

The CAUSALTRT procedure estimates the potential outcome means and ATE by using the three inverse probability weighting methods that are described in Lunceford and Davidian (2004). All three methods compute weights using estimates for the conditional probability of receiving treatment,

$$e(\mathbf{x}) = \Pr(T = 1 \mid \mathbf{X} = \mathbf{x})$$

where $\mathbf{x}$ is a vector of the observed covariates. The conditional probability $e(\mathbf{x})$ is called the propensity score, and the model that is used to estimate $e(\mathbf{x})$ is called the propensity score model. The inverse probability weight for an observation is equal to

$$\frac{1}{\Pr(T = t \mid \mathbf{x})} = \frac{t}{e(\mathbf{x})} + \frac{1 - t}{1 - e(\mathbf{x})}$$
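As a minimal sketch of the weight formula, the following Python function evaluates it for a single observation. The function name and the example values are hypothetical, and the code is an illustration rather than part of PROC CAUSALTRT.

```python
def inverse_probability_weight(t, e):
    """Weight 1 / P(T = t | x) = t/e + (1 - t)/(1 - e) for a binary treatment t."""
    if t not in (0, 1):
        raise ValueError("t must be 0 or 1")
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    return t / e + (1 - t) / (1 - e)


# Hypothetical values: a treated and a control unit that share e(x) = 0.25.
w_treated = inverse_probability_weight(1, 0.25)  # inverse of 0.25
w_control = inverse_probability_weight(0, 0.25)  # inverse of 0.75
```

A treated observation is weighted by the inverse of its propensity score, and a control observation is weighted by the inverse of one minus its propensity score.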

In addition to the stable unit treatment value assumption (SUTVA), two more assumptions on the propensity scores are required for the use of inverse probability weighting methods. The positivity assumption, that $0 < e(\mathbf{x}) < 1$, ensures that each observation has a nonzero probability of receiving each treatment condition. The assumption of no unmeasured confounders, also known as strong ignorability, requires that the potential outcomes $Y(0)$ and $Y(1)$ be independent of the treatment variable $T$ conditional on the covariates $\mathbf{x}$. In practice, this assumption means that the set of covariates in the propensity score model should include all the important confounders. If these conditions are satisfied and the propensity score model is correctly specified, then it follows that

$$E\left[\frac{TY}{e(\mathbf{x})}\right] - E\left[\frac{(1 - T)Y}{1 - e(\mathbf{x})}\right] = E\left[\frac{Y(1)}{e(\mathbf{x})} E[T \mid \mathbf{x}]\right] - E\left[\frac{Y(0)}{1 - e(\mathbf{x})} E[(1 - T) \mid \mathbf{x}]\right] = E[Y(1)] - E[Y(0)]$$
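The identity can be checked by exact enumeration on a small population. The sketch below assumes a hypothetical set of four units with known potential outcomes and known propensity scores that depend only on the covariate. It uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction

# Hypothetical population: (covariate x, Y(0), Y(1)) for each unit.
units = [(0, 1, 3), (0, 2, 5), (1, 4, 4), (1, 6, 9)]
# Assumed true propensity scores e(x) = P(T = 1 | X = x).
propensity = {0: Fraction(1, 4), 1: Fraction(2, 3)}


def expected_weighted_difference(units, propensity):
    """E[T*Y/e(x)] - E[(1-T)*Y/(1-e(x))], averaging over units and over T given x."""
    total = Fraction(0)
    for x, y0, y1 in units:
        e = propensity[x]
        # T = 1 occurs with probability e, and then the observed Y is Y(1).
        treated_term = e * (Fraction(y1) / e)
        # T = 0 occurs with probability 1 - e, and then the observed Y is Y(0).
        control_term = (1 - e) * (Fraction(y0) / (1 - e))
        total += treated_term - control_term
    return total / len(units)


true_ate = Fraction(sum(y1 - y0 for _, y0, y1 in units), len(units))
ipw_expectation = expected_weighted_difference(units, propensity)
```

Because treatment assignment in this toy population depends only on the covariate, the weighted expectation equals the true ATE exactly.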

The CAUSALTRT procedure uses the effects that are specified in the PSMODEL statement to fit a logistic regression model for the propensity score. The inverse probability weights are then computed from the estimated propensity scores, as the inverse of the estimated probability of the treatment condition that each observation received.

Suppose you have observations $(y_i, t_i, \mathbf{x}_i)$, for $i = 1, \ldots, n$. The parameter estimates for the propensity score model are given by $\hat{\boldsymbol{\beta}}_{\mathrm{ps}}$, and the predicted values for the propensity score are given by

$$\hat{e}_i = \hat{e}(\mathbf{x}_i) = \frac{\exp(\mathbf{x}_i' \hat{\boldsymbol{\beta}}_{\mathrm{ps}})}{1 + \exp(\mathbf{x}_i' \hat{\boldsymbol{\beta}}_{\mathrm{ps}})}$$
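The following Python sketch illustrates the two steps for the simplest case of an intercept and one covariate: a maximum likelihood fit of the logistic model, followed by prediction of the propensity scores. The Newton-Raphson routine, the function names, and the toy data are assumptions made for illustration; the section does not describe the optimizer that the procedure actually uses.

```python
import math


def fit_logistic(x, t, iterations=25):
    """Newton-Raphson fit of P(T = 1 | x) = expit(b0 + b1 * x); returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g) and negative Hessian (h) of the Bernoulli log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ti in zip(x, t):
            e = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = e * (1.0 - e)
            g0 += ti - e
            g1 += (ti - e) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step with the 2x2 inverse written out explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def propensity_scores(x, beta):
    """Predicted e_hat_i = exp(x_i' beta) / (1 + exp(x_i' beta))."""
    b0, b1 = beta
    return [math.exp(b0 + b1 * xi) / (1.0 + math.exp(b0 + b1 * xi)) for xi in x]


# Hypothetical toy data: one covariate, binary treatment, groups that overlap.
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
t = [0, 0, 1, 0, 1, 0, 1, 1]
beta_ps = fit_logistic(x, t)
e_hat = propensity_scores(x, beta_ps)
```

At the maximum likelihood estimate, the logistic score equations hold, so the residuals `t[i] - e_hat[i]` sum to zero and are orthogonal to the covariate.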

The three weighting methods that PROC CAUSALTRT implements obtain unbiased estimates of the potential outcome means by solving the equations

$$\mathbf{S}_{\mathrm{ipw}}(\boldsymbol{\mu}) = \sum_{i=1}^{n} \mathbf{S}_{\mathrm{ipw},i} = \mathbf{0}$$

for $\boldsymbol{\mu} = (\mu_0, \mu_1)$, where

$$\mathbf{S}_{\mathrm{ipw},i} = \begin{bmatrix} \dfrac{(1 - t_i)(y_i - \mu_0)}{1 - \hat{e}_i} - \eta_0 \left( \dfrac{t_i - \hat{e}_i}{1 - \hat{e}_i} \right) \\[2ex] \dfrac{t_i (y_i - \mu_1)}{\hat{e}_i} + \eta_1 \left( \dfrac{t_i - \hat{e}_i}{\hat{e}_i} \right) \end{bmatrix}$$

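These estimation methods differ in the values of $\boldsymbol{\eta} = (\eta_0, \eta_1)$. The choices of $\boldsymbol{\eta}$ and the corresponding potential outcome mean estimates for each method are as follows:

IPW: $\boldsymbol{\eta} = (\mu_0, \mu_1)$, which gives $\hat{\mu}_0^{\mathrm{ipw}} = n^{-1} \sum_{i=1}^{n} \dfrac{(1 - t_i) y_i}{1 - \hat{e}_i}$ and $\hat{\mu}_1^{\mathrm{ipw}} = n^{-1} \sum_{i=1}^{n} \dfrac{t_i y_i}{\hat{e}_i}$
IPWR: $\boldsymbol{\eta} = (0, 0)$, which gives $\hat{\mu}_0^{\mathrm{ipwr}} = \left[ \sum_{i=1}^{n} \dfrac{1 - t_i}{1 - \hat{e}_i} \right]^{-1} \sum_{i=1}^{n} \dfrac{(1 - t_i) y_i}{1 - \hat{e}_i}$ and $\hat{\mu}_1^{\mathrm{ipwr}} = \left[ \sum_{i=1}^{n} \dfrac{t_i}{\hat{e}_i} \right]^{-1} \sum_{i=1}^{n} \dfrac{t_i y_i}{\hat{e}_i}$
IPWS: $\boldsymbol{\eta}$ is estimated from the data, which gives $\hat{\mu}_0^{\mathrm{ipws}} = \left[ \sum_{i=1}^{n} \dfrac{1 - t_i}{1 - \hat{e}_i} \left( 1 - \dfrac{C_0}{1 - \hat{e}_i} \right) \right]^{-1} \sum_{i=1}^{n} \dfrac{(1 - t_i) y_i}{1 - \hat{e}_i} \left( 1 - \dfrac{C_0}{1 - \hat{e}_i} \right)$ and $\hat{\mu}_1^{\mathrm{ipws}} = \left[ \sum_{i=1}^{n} \dfrac{t_i}{\hat{e}_i} \left( 1 - \dfrac{C_1}{\hat{e}_i} \right) \right]^{-1} \sum_{i=1}^{n} \dfrac{t_i y_i}{\hat{e}_i} \left( 1 - \dfrac{C_1}{\hat{e}_i} \right)$

where

$$C_0 = - \frac{\sum_{i=1}^{n} (t_i - \hat{e}_i) / (1 - \hat{e}_i)}{\sum_{i=1}^{n} \left[ (t_i - \hat{e}_i) / (1 - \hat{e}_i) \right]^2} \qquad C_1 = \frac{\sum_{i=1}^{n} (t_i - \hat{e}_i) / \hat{e}_i}{\sum_{i=1}^{n} \left[ (t_i - \hat{e}_i) / \hat{e}_i \right]^2}$$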

If the values of $\boldsymbol{\eta}$ that motivate the IPWS method were known constants, they would produce estimates for the potential outcome means that have the smallest asymptotic variance among the class of estimators that is defined by $\mathbf{S}_{\mathrm{ipw}}$. The IPWS method estimates $\boldsymbol{\eta}$ by solving the estimating equations

$$\mathbf{S}(\boldsymbol{\eta}) = \sum_{i=1}^{n} \mathbf{S}_{\boldsymbol{\eta},i} = \mathbf{0}$$

for $\boldsymbol{\eta} = (\eta_0, \eta_1)$, where

$$\mathbf{S}_{\boldsymbol{\eta},i} = \begin{bmatrix} \dfrac{(1 - t_i)(y_i - \mu_0)}{(1 - \hat{e}_i)^2} + \eta_0 \left( \dfrac{t_i - \hat{e}_i}{1 - \hat{e}_i} \right)^2 \\[2ex] \dfrac{t_i (y_i - \mu_1)}{\hat{e}_i^2} + \eta_1 \left( \dfrac{t_i - \hat{e}_i}{\hat{e}_i} \right)^2 \end{bmatrix}$$
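The three weighting estimators can be sketched in Python directly from the two sets of estimating equations. The closed forms used here are derived by solving those equations under the assumption that the IPW method sets each component of eta equal to the corresponding mean and that the IPWR method sets eta to zero. The function names and the data are hypothetical, and the propensity scores are taken as given rather than fitted.

```python
def ipw_estimates(y, t, e):
    """IPW (eta = mu): inverse-probability-weighted sums divided by n."""
    n = len(y)
    mu0 = sum((1 - ti) * yi / (1 - ei) for yi, ti, ei in zip(y, t, e)) / n
    mu1 = sum(ti * yi / ei for yi, ti, ei in zip(y, t, e)) / n
    return mu0, mu1


def ipwr_estimates(y, t, e):
    """IPWR (eta = 0): ratio estimators that divide by the sum of the weights."""
    w0 = [(1 - ti) / (1 - ei) for ti, ei in zip(t, e)]
    w1 = [ti / ei for ti, ei in zip(t, e)]
    mu0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    mu1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    return mu0, mu1


def ipws_estimates(y, t, e):
    """IPWS: solve S_ipw = 0 and S_eta = 0 jointly; returns (mu0, mu1, eta0, eta1)."""
    b0 = [(ti - ei) / (1 - ei) for ti, ei in zip(t, e)]
    b1 = [(ti - ei) / ei for ti, ei in zip(t, e)]
    ss0 = sum(b * b for b in b0)
    ss1 = sum(b * b for b in b1)
    # Constants that appear after eta is eliminated from the stacked equations.
    c0 = -sum(b0) / ss0
    c1 = sum(b1) / ss1
    w0 = [(1 - ti) / (1 - ei) * (1 - c0 / (1 - ei)) for ti, ei in zip(t, e)]
    w1 = [ti / ei * (1 - c1 / ei) for ti, ei in zip(t, e)]
    mu0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    mu1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    # Back-substitute the means to recover eta from S_eta = 0.
    eta0 = -sum((1 - ti) * (yi - mu0) / (1 - ei) ** 2
                for yi, ti, ei in zip(y, t, e)) / ss0
    eta1 = -sum(ti * (yi - mu1) / ei ** 2
                for yi, ti, ei in zip(y, t, e)) / ss1
    return mu0, mu1, eta0, eta1


# Hypothetical outcomes, treatments, and (already estimated) propensity scores.
y = [2.0, 3.5, 1.0, 4.0, 5.5, 3.0]
t = [0, 1, 0, 1, 1, 0]
e_hat = [0.2, 0.6, 0.3, 0.7, 0.8, 0.5]

ipw = ipw_estimates(y, t, e_hat)
ipwr = ipwr_estimates(y, t, e_hat)
ipws = ipws_estimates(y, t, e_hat)
```

Substituting each result back into the estimating equations and confirming that the sums vanish is a useful sanity check on any reimplementation.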

Although you do not need to specify an outcome model when you use the inverse probability weighting methods, you still need to specify the outcome variable in the MODEL statement. The IPW, IPWR, or IPWS estimation method can be requested using the METHOD= option in the PROC CAUSALTRT statement. For more information about which model specifications are required for different estimation methods and how default estimation methods are determined, see the section Outline of Estimation Method Requirements.

The robust sandwich covariance estimate for $\hat{\boldsymbol{\mu}}$ is computed by stacking the equations $\mathbf{S}_{\mathrm{ipw}}$ and the score function for the logistic regression model that is used to obtain $\hat{\boldsymbol{\beta}}_{\mathrm{ps}}$. By stacking the equations, PROC CAUSALTRT adjusts the covariance matrix for $\hat{\boldsymbol{\mu}}$ to account for the prediction of $\boldsymbol{\eta}$ and the propensity scores $\hat{e}_i$. For more information about using the robust sandwich covariance estimate to compute standard errors, see the section Asymptotic Formulas.

The finite sample properties of $\hat{\boldsymbol{\mu}}$ and its covariance matrix can be affected by observations that have large weights. By default, a note is added to the SAS log if any observations have weights greater than 50. You can specify the value that is used to flag large weights by using the WGTFLAG= option in the PSMODEL statement. You can also inspect the values of the predicted weights by requesting graphics in the PLOTS= option in the PROC CAUSALTRT or PSMODEL statements.
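A comparable check is easy to run outside the procedure. The sketch below is not the procedure's own code; it flags observations whose inverse probability weight exceeds a threshold, with a default of 50 that mirrors the default described above. The function name and the data are hypothetical.

```python
def flag_large_weights(t, e_hat, threshold=50.0):
    """Return (index, weight) pairs whose inverse probability weight exceeds threshold."""
    flagged = []
    for i, (ti, ei) in enumerate(zip(t, e_hat)):
        weight = ti / ei + (1 - ti) / (1 - ei)
        if weight > threshold:
            flagged.append((i, weight))
    return flagged


# Hypothetical data: the first and last observations have extreme propensity scores.
t = [1, 0, 1, 0]
e_hat = [0.01, 0.5, 0.4, 0.995]
large = flag_large_weights(t, e_hat)
```

A treated observation with a propensity score near 0, or a control observation with a propensity score near 1, produces a large weight and is worth inspecting.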

As described in Rosenbaum and Rubin (1983), the propensity score is also a balancing score. To assess the balance that is produced by a propensity score model, you can request the display of weighted and unweighted standardized mean differences and variance ratios for propensity score model effects in the COVDIFFPS option in the PROC CAUSALTRT statement. You can also request weighted and unweighted densities for propensity score model effects by specifying PSCOVDEN in the PLOTS= option in the PROC CAUSALTRT or PSMODEL statements. To inspect the balance of a propensity score model without showing the causal effects, specify the NOEFFECT option in the PROC CAUSALTRT statement.
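To make the balance diagnostics concrete, the following Python sketch computes a standardized mean difference and a variance ratio for one covariate, with and without inverse probability weights. The definitions used here are assumptions, because this section does not state the exact formulas that the procedure applies: variances are weighted with the weight sum as the divisor, and the difference in means is standardized by the square root of the average of the two group variances.

```python
import math


def weighted_mean_var(values, weights):
    """Weighted mean and weighted variance (divisor is the sum of the weights)."""
    sw = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / sw
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / sw
    return mean, var


def balance_stats(x, t, w):
    """Return (standardized mean difference, variance ratio), treated versus control."""
    xt = [xi for xi, ti in zip(x, t) if ti == 1]
    wt = [wi for wi, ti in zip(w, t) if ti == 1]
    xc = [xi for xi, ti in zip(x, t) if ti == 0]
    wc = [wi for wi, ti in zip(w, t) if ti == 0]
    m1, v1 = weighted_mean_var(xt, wt)
    m0, v0 = weighted_mean_var(xc, wc)
    smd = (m1 - m0) / math.sqrt((v1 + v0) / 2.0)
    return smd, v1 / v0


# Hypothetical covariate values, treatment indicators, and propensity scores.
x = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
t = [1, 1, 1, 0, 0, 0]
e_hat = [0.8, 0.6, 0.5, 0.6, 0.3, 0.1]
ipw_weights = [ti / ei + (1 - ti) / (1 - ei) for ti, ei in zip(t, e_hat)]

unweighted = balance_stats(x, t, [1.0] * len(x))
weighted = balance_stats(x, t, ipw_weights)
```

In this toy example the weighted standardized mean difference is closer to zero than the unweighted one, which is the pattern you look for when the propensity score model balances a covariate.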

Based on the balancing score property, you can use the propensity scores to create matched samples or strata for the data. For more information about applying the propensity score in matching and stratification, see Chapter 101, The PSMATCH Procedure.

Regression Adjustment

Regression adjustment estimates the ATE by using predicted values for the potential outcomes. The CAUSALTRT procedure predicts the potential outcomes from generalized linear models that are fit to the data by maximum likelihood estimation. For more information about generalized linear models, see the section Generalized Linear Models Theory in Chapter 51, The GENMOD Procedure.

The effects that you specify in the MODEL statement are used to fit models for the outcome within each treatment condition. The predicted potential outcomes are then given by

$$\hat{y}_{i0} = g^{-1}(\mathbf{x}_i' \hat{\boldsymbol{\beta}}_c)$$

$$\hat{y}_{i1} = g^{-1}(\mathbf{x}_i' \hat{\boldsymbol{\beta}}_t)$$

where $g$ is the link function being used and $\hat{\boldsymbol{\beta}}_c$ and $\hat{\boldsymbol{\beta}}_t$ are the parameter estimates for the control and treatment outcome models, respectively. The potential outcome mean $\mu_j$ is then estimated by averaging the predicted $\hat{y}_{ij}$ for all observations $i = 1, \ldots, n$, regardless of the observed treatment assignment. Therefore, the regression adjustment estimates for the potential outcome means solve the estimating equations

$$\mathbf{S}_{\mathrm{reg}}(\boldsymbol{\mu}) = \sum_{i=1}^{n} \mathbf{S}_{\mathrm{reg},i} = \mathbf{0}$$

where

$$\mathbf{S}_{\mathrm{reg},i} = \begin{bmatrix} \hat{y}_{i0} - \mu_0 \\ \hat{y}_{i1} - \mu_1 \end{bmatrix}$$

The regression adjustment estimates for the potential outcome means are equal to

$$\hat{\mu}_0^{\mathrm{reg}} = n^{-1} \sum_{i=1}^{n} \hat{y}_{i0}$$

$$\hat{\mu}_1^{\mathrm{reg}} = n^{-1} \sum_{i=1}^{n} \hat{y}_{i1}$$
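The following Python sketch illustrates regression adjustment under simplifying assumptions: a normal outcome with the identity link (so that maximum likelihood coincides with ordinary least squares) and a single covariate. The helper names and the data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (identity link); returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def regression_adjustment(x, y, t):
    """Fit one outcome model per treatment condition, then average predictions over all units."""
    control = [(xi, yi) for xi, yi, ti in zip(x, y, t) if ti == 0]
    treated = [(xi, yi) for xi, yi, ti in zip(x, y, t) if ti == 1]
    a_c, b_c = fit_line(*zip(*control))
    a_t, b_t = fit_line(*zip(*treated))
    n = len(x)
    # Both potential outcomes are predicted for every observation,
    # regardless of the treatment that the observation received.
    mu0 = sum(a_c + b_c * xi for xi in x) / n
    mu1 = sum(a_t + b_t * xi for xi in x) / n
    return mu0, mu1


# Hypothetical data: controls follow y = 1 + 2x and treated units follow y = 4 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
t = [0, 1, 0, 1, 0, 1]
y = [1.0, 6.0, 5.0, 10.0, 9.0, 14.0]

mu0_reg, mu1_reg = regression_adjustment(x, y, t)
ate_reg = mu1_reg - mu0_reg
```

Because the toy outcomes are exactly linear within each treatment condition, the estimated ATE recovers the constant shift of 3 between the two lines.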

If the outcome model is correctly specified and the assumption of no unmeasured confounders is satisfied, then regression adjustment produces unbiased estimates for the potential outcome means. The outcome model effects should also have similar distributions between treatment conditions to avoid extrapolation when the potential outcomes are predicted.

Although you do not need to specify a propensity score model in order to estimate the potential outcome means by the regression adjustment method, you still need to specify the treatment variable in the PSMODEL statement. You can specify METHOD=REGADJ in the PROC CAUSALTRT statement to request the regression adjustment method. For more information about which model specifications are required for different estimation methods and how default estimation methods are determined, see the section Outline of Estimation Method Requirements.

The robust sandwich covariance estimate for $\hat{\boldsymbol{\mu}}$ is computed by stacking the equations $\mathbf{S}_{\mathrm{reg}}$ and the score function for the outcome models that are used to estimate $\hat{\boldsymbol{\beta}}_c$ and $\hat{\boldsymbol{\beta}}_t$. By stacking the equations, PROC CAUSALTRT adjusts the covariance matrix for $\hat{\boldsymbol{\mu}}$ to account for the prediction of the potential outcomes $\hat{y}_{i0}$ and $\hat{y}_{i1}$. For more information about using the robust sandwich covariance estimate to compute standard errors, see the section Asymptotic Formulas.

Doubly Robust Estimation

Doubly robust estimation methods fit models for both the outcome and treatment variables. They combine inverse probability weighting and regression adjustment to estimate the potential outcome means. The methods are said to be doubly robust because they provide unbiased estimates for $\boldsymbol{\mu}$ even if one of the models is misspecified (Bang and Robins 2005). In this sense, you have two opportunities to obtain unbiased estimation of causal effects by specifying either a correct treatment or a correct outcome model. The CAUSALTRT procedure implements two doubly robust estimation methods: the augmented inverse probability weighting (AIPW) method described in Lunceford and Davidian (2004) and the inverse probability weighted regression adjustment (IPWREG) method described in Wooldridge (2010).

The AIPW estimation method estimates propensity scores and the associated parameter estimates by fitting a logistic regression model for treatment assignment, as described in the section Inverse Probability Weighting. It also estimates the potential outcomes by using the maximum likelihood fitting of the generalized linear models, as described in the section Regression Adjustment. Note that the treatment assignment and outcome models do not need to be fit using the same set of model effects.

The AIPW estimates for potential outcome means solve the estimating equations

$$\mathbf{S}_{\mathrm{aipw}}(\boldsymbol{\mu}) = \sum_{i=1}^{n} \mathbf{S}_{\mathrm{aipw},i}(\boldsymbol{\mu}) = \mathbf{0}$$

for $\boldsymbol{\mu} = (\mu_0, \mu_1)$, where

$$\mathbf{S}_{\mathrm{aipw},i}(\boldsymbol{\mu}) = \begin{bmatrix} \dfrac{(1 - t_i) y_i}{1 - \hat{e}_i} + \hat{y}_{i0} \left( \dfrac{t_i - \hat{e}_i}{1 - \hat{e}_i} \right) - \mu_0 \\[2ex] \dfrac{t_i y_i}{\hat{e}_i} - \hat{y}_{i1} \left( \dfrac{t_i - \hat{e}_i}{\hat{e}_i} \right) - \mu_1 \end{bmatrix} = \begin{bmatrix} \hat{y}_{i0} - \left( \dfrac{1 - t_i}{1 - \hat{e}_i} \right) (\hat{y}_{i0} - y_i) - \mu_0 \\[2ex] \hat{y}_{i1} - \left( \dfrac{t_i}{\hat{e}_i} \right) (\hat{y}_{i1} - y_i) - \mu_1 \end{bmatrix}$$

The AIPW estimates for the potential outcome means are given by

$$\hat{\mu}_0^{\mathrm{aipw}} = n^{-1} \sum_{i=1}^{n} \left[ \frac{(1 - t_i) y_i}{1 - \hat{e}_i} + \hat{y}_{i0} \left( \frac{t_i - \hat{e}_i}{1 - \hat{e}_i} \right) \right]$$

$$\hat{\mu}_1^{\mathrm{aipw}} = n^{-1} \sum_{i=1}^{n} \left[ \frac{t_i y_i}{\hat{e}_i} - \hat{y}_{i1} \left( \frac{t_i - \hat{e}_i}{\hat{e}_i} \right) \right]$$
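As a sketch, the AIPW estimates can be computed in Python from the observed outcomes, the estimated propensity scores, and the predicted potential outcomes. The two functions below correspond to the two equivalent forms of the estimating equations. The function names and all input values are hypothetical, and the predictions are taken as given rather than fitted.

```python
def aipw_estimates(y, t, e_hat, yhat0, yhat1):
    """AIPW means in the 'weighted outcome plus augmentation' form."""
    n = len(y)
    mu0 = sum((1 - ti) * yi / (1 - ei) + p0 * (ti - ei) / (1 - ei)
              for yi, ti, ei, p0 in zip(y, t, e_hat, yhat0)) / n
    mu1 = sum(ti * yi / ei - p1 * (ti - ei) / ei
              for yi, ti, ei, p1 in zip(y, t, e_hat, yhat1)) / n
    return mu0, mu1


def aipw_estimates_residual_form(y, t, e_hat, yhat0, yhat1):
    """Same estimates written as a prediction minus a weighted residual."""
    n = len(y)
    mu0 = sum(p0 - (1 - ti) / (1 - ei) * (p0 - yi)
              for yi, ti, ei, p0 in zip(y, t, e_hat, yhat0)) / n
    mu1 = sum(p1 - ti / ei * (p1 - yi)
              for yi, ti, ei, p1 in zip(y, t, e_hat, yhat1)) / n
    return mu0, mu1


# Hypothetical observed data, propensity scores, and outcome-model predictions.
y = [2.0, 3.5, 1.0, 4.0]
t = [0, 1, 0, 1]
e_hat = [0.2, 0.6, 0.3, 0.7]
yhat0 = [1.8, 2.0, 1.2, 2.5]
yhat1 = [3.0, 3.4, 2.9, 4.2]

mu_aipw = aipw_estimates(y, t, e_hat, yhat0, yhat1)
mu_aipw_alt = aipw_estimates_residual_form(y, t, e_hat, yhat0, yhat1)
```

The residual form makes the correction explicit: each estimate is the regression adjustment average, corrected by the inverse probability weighted residuals of the observations that received the corresponding treatment condition.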

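If one of the models is misspecified, $\hat{\boldsymbol{\mu}}^{\mathrm{aipw}}$ remains an unbiased estimate of $\boldsymbol{\mu}$. However, the efficiency of $\hat{\boldsymbol{\mu}}^{\mathrm{aipw}}$ and the estimation of its covariance matrix are affected.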

The IPWREG estimation method uses the same logistic model for predicting $e(\mathbf{x})$, but it fits weighted models for the outcome variable separately for each treatment condition. To fit the outcome models, each observation is weighted by the inverse probability weights as follows:

$$\frac{1}{\Pr(T = t_i \mid \mathbf{x}_i)} = \frac{t_i}{e(\mathbf{x}_i)} + \frac{1 - t_i}{1 - e(\mathbf{x}_i)}$$

Let $\hat{\boldsymbol{\beta}}_c^{w}$ and $\hat{\boldsymbol{\beta}}_t^{w}$ denote the parameter estimates from the weighted regression outcome models for the control and treatment conditions, respectively. The predicted potential outcomes are then given by

$$\hat{y}_{i0}^{w} = g^{-1}(\mathbf{x}_i' \hat{\boldsymbol{\beta}}_c^{w})$$

$$\hat{y}_{i1}^{w} = g^{-1}(\mathbf{x}_i' \hat{\boldsymbol{\beta}}_t^{w})$$

where g is the link function being used. For the IPWREG method, PROC CAUSALTRT enforces the use of the canonical link for the distribution of the outcome variable. You cannot override the canonical link by the LINK= option. For information about the canonical link for each distribution, see Table 5.

The IPWREG estimates for the potential outcome means are given by

$$\hat{\mu}_0^{\mathrm{wreg}} = n^{-1} \sum_{i=1}^{n} \hat{y}_{i0}^{w}$$

$$\hat{\mu}_1^{\mathrm{wreg}} = n^{-1} \sum_{i=1}^{n} \hat{y}_{i1}^{w}$$

These estimates solve the following estimating equations:

$$\mathbf{S}_{\mathrm{reg}}^{w}(\boldsymbol{\mu}) = \sum_{i=1}^{n} \mathbf{S}_{\mathrm{reg},i}^{w} = \sum_{i=1}^{n} \begin{bmatrix} \hat{y}_{i0}^{w} - \mu_0 \\ \hat{y}_{i1}^{w} - \mu_1 \end{bmatrix} = \mathbf{0}$$
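A minimal Python sketch of the IPWREG steps follows. It assumes a normal outcome, whose canonical link is the identity, so that each weighted outcome model reduces to weighted least squares with one covariate. The propensity scores are taken as given, and the helper names and data are hypothetical.

```python
def fit_weighted_line(x, y, w):
    """Weighted least squares for y = a + b * x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def ipwreg_estimates(x, y, t, e_hat):
    """Fit IPW-weighted outcome models per treatment condition, then average predictions."""
    weights = [ti / ei + (1 - ti) / (1 - ei) for ti, ei in zip(t, e_hat)]
    fits = {}
    for group in (0, 1):
        xs = [xi for xi, ti in zip(x, t) if ti == group]
        ys = [yi for yi, ti in zip(y, t) if ti == group]
        ws = [wi for wi, ti in zip(weights, t) if ti == group]
        fits[group] = fit_weighted_line(xs, ys, ws)
    n = len(x)
    # Predictions are averaged over all observations, not only over each group.
    mu0 = sum(fits[0][0] + fits[0][1] * xi for xi in x) / n
    mu1 = sum(fits[1][0] + fits[1][1] * xi for xi in x) / n
    return mu0, mu1


# Hypothetical data: controls follow y = 1 + 2x and treated units follow y = 4 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
t = [0, 1, 0, 1, 0, 1]
y = [1.0, 6.0, 5.0, 10.0, 9.0, 14.0]
e_hat = [0.3, 0.4, 0.5, 0.5, 0.6, 0.7]

mu0_wreg, mu1_wreg = ipwreg_estimates(x, y, t, e_hat)
```

With exactly linear outcomes the weights do not change the fitted lines, so the sketch returns the same means as unweighted regression adjustment; the weights matter when the outcome model is misspecified.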

Last updated: December 09, 2022