The PSMATCH Procedure

Observational Studies Contrasted with Randomized Trials

In a randomized study, such as a randomized controlled trial, the subjects are randomly assigned to a treated (exposure) group or a control (nonexposure) group. Random assignment ensures that, apart from chance imbalances, the distribution of the covariates (both measured and unmeasured) is the same in both groups, so the treatment effect can be estimated from a direct comparison of the outcomes for the subjects in the two groups.

In contrast, the subjects in an observational study are not randomly assigned to the treated and control groups. Confounding can occur if some covariates are related to both the treatment assignment and the outcome. Consequently, there can be systematic differences between the treated subjects and the control subjects. The presence of confounding requires statistical approaches that remove the effects of confounding when estimating the effect of treatment.
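The contrast between the two kinds of assignment can be illustrated with a small simulation. This is a minimal sketch, not part of PROC PSMATCH: the covariate ("age"), its distribution, and the rule that makes older subjects more likely to be treated are all hypothetical. Under a coin-flip assignment the treated and control groups have nearly the same mean age; under the age-driven rule they differ systematically.

```python
import math
import random
import statistics

def group_means(n, assign, seed=1):
    """Simulate a hypothetical covariate ("age") and return its mean in the
    treated and control groups under the given assignment rule."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        age = rng.gauss(50.0, 10.0)
        (treated if assign(age, rng) else control).append(age)
    return statistics.fmean(treated), statistics.fmean(control)

def randomized(age, rng):
    # Coin flip: assignment ignores age entirely.
    return rng.random() < 0.5

def age_driven(age, rng):
    # Hypothetical observational rule: older subjects are more likely to be treated.
    return rng.random() < 1.0 / (1.0 + math.exp(-(age - 50.0) / 5.0))

rand_t, rand_c = group_means(100_000, randomized)
obs_t, obs_c = group_means(100_000, age_driven)
print(f"randomized:    treated {rand_t:.1f}, control {rand_c:.1f}")
print(f"observational: treated {obs_t:.1f}, control {obs_c:.1f}")
```

If age also affects the outcome, the age-driven rule makes age a confounder, and a direct comparison of outcomes mixes the treatment effect with the age difference between the groups.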

Observational studies are carried out when it is impractical or unethical to perform a randomized experiment. One example of an observational study is a retrospective cohort study that examines the relationship between a specific disease and a risk factor that occurred in the past; another example is a nonrandomized clinical trial that uses existing data, such as control units extracted from a registry database.

The approach that the PSMATCH procedure uses and the following terminology are based on the potential outcomes framework for causal inference, which was introduced by Rubin (1974) and Rosenbaum and Rubin (1983). Under this framework, each individual typically has two potential outcomes in an observational study whose goal is to estimate the effect of a treatment:

  • $Y(1)$, the outcome that would be observed if the individual receives the treatment.

  • $Y(0)$, the outcome that would be observed if the individual does not receive the treatment under identical circumstances to those under which the subject would have received the treatment.

However, only one of the two potential outcomes can be observed for any individual.

The treatment effect is defined as $Y(1) - Y(0)$, and the average treatment effect is defined as:

$$\mathrm{ATE} = E(Y(1) - Y(0))$$

The average treatment effect for the treated (individuals who actually receive treatment) is defined as:

$$\mathrm{ATT} = E(Y(1) - Y(0) \mid T = 1)$$

where $T$ denotes the treatment assignment ($T = 1$ for treated individuals and $T = 0$ for control individuals).
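The definitions above can be made concrete with a toy table. In this sketch the four subjects and their potential outcomes are invented for illustration; in a real study only the outcome matching each subject's assignment would be known, so the ATE and ATT could not be computed this way.

```python
# Hypothetical potential outcomes for four subjects: (Y(1), Y(0), T).
population = [
    (7.0, 5.0, 1),
    (6.0, 6.0, 0),
    (9.0, 4.0, 1),
    (3.0, 2.0, 0),
]

def observed(rows):
    """What an analyst sees: (T, Y) with Y = Y(1) if T = 1, else Y(0)."""
    return [(t, y1 if t == 1 else y0) for y1, y0, t in rows]

def ate(rows):
    """E(Y(1) - Y(0)) over all subjects."""
    effects = [y1 - y0 for y1, y0, _ in rows]
    return sum(effects) / len(effects)

def att(rows):
    """E(Y(1) - Y(0) | T = 1): the average effect among treated subjects."""
    effects = [y1 - y0 for y1, y0, t in rows if t == 1]
    return sum(effects) / len(effects)

print(observed(population))  # [(1, 7.0), (0, 6.0), (1, 9.0), (0, 2.0)]
print(ate(population))       # 2.0
print(att(population))       # 3.5
```

The individual effects are 2, 0, 5, and 1, so the ATE averages all four while the ATT averages only the two treated subjects.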

In a randomized trial, the potential outcomes $(Y(0), Y(1))$ and the treatment assignment $T$ are independent:

$$(Y(0), Y(1)) \perp\!\!\!\perp T$$

Consequently, $E(Y(t) \mid T = 1) = E(Y(t) \mid T = 0) = E(Y(t))$ for $t = 0, 1$. Thus, the average treatment effect (ATE) is identical to the average treatment effect for the treated (ATT), and both can be expressed as follows and estimated from the observed data:

$$E(Y(1) \mid T = 1) - E(Y(0) \mid T = 0)$$
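A quick simulation shows the randomized case numerically. The data-generating process here is a hypothetical assumption chosen for illustration: a covariate $X \sim N(0, 1)$, $Y(0) = 10 + 2X + \varepsilon$, and a heterogeneous effect $Y(1) - Y(0) = 1 + X$, so the population ATE is 1. Because a coin flip assigns treatment independently of the potential outcomes, the observed difference in group means lands close to both the ATE and the ATT.

```python
import random
import statistics

def simulate_trial(n, seed=3):
    """Hypothetical randomized trial with a heterogeneous effect 1 + X."""
    rng = random.Random(seed)
    rows = []  # (T, Y(1), Y(0))
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y0 = 10.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        y1 = y0 + 1.0 + x
        t = 1 if rng.random() < 0.5 else 0  # coin flip, independent of (Y(0), Y(1))
        rows.append((t, y1, y0))
    return rows

rows = simulate_trial(100_000)
treated = [r for r in rows if r[0] == 1]
control = [r for r in rows if r[0] == 0]

# Quantities that need both potential outcomes (available only in a simulation).
ate = statistics.fmean(y1 - y0 for _, y1, y0 in rows)
att = statistics.fmean(y1 - y0 for _, y1, y0 in treated)
# Estimate that uses only observed data: E(Y(1) | T = 1) - E(Y(0) | T = 0).
naive = (statistics.fmean(y1 for _, y1, _ in treated)
         - statistics.fmean(y0 for _, _, y0 in control))
print(f"ATE {ate:.3f}  ATT {att:.3f}  observed difference {naive:.3f}")
```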

In an observational study, the potential outcomes $(Y(0), Y(1))$ and the treatment assignment $T$ might not be independent. In this case, the ATE and ATT are not necessarily the same. Furthermore, the observed outcomes of the two groups cannot be compared directly to estimate the treatment effect. In particular,

$$
\begin{aligned}
\mathrm{ATT} &= E(Y(1) - Y(0) \mid T = 1) \\
&= E(Y(1) \mid T = 1) - E(Y(0) \mid T = 0) + E(Y(0) \mid T = 0) - E(Y(0) \mid T = 1)
\end{aligned}
$$

The second line uses the linearity of expectation and adds and subtracts $E(Y(0) \mid T = 0)$. Its first difference can be estimated from the observed data:

$$E(Y(1) \mid T = 1) - E(Y(0) \mid T = 0)$$

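However, the second difference, which is the selection bias, cannot be estimated from the observed data, because $Y(0)$ is never observed for individuals who receive treatment: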

$$E(Y(0) \mid T = 0) - E(Y(0) \mid T = 1)$$

The selection bias is the difference between the average response of individuals in the control group, who do not receive treatment, and the average response that individuals in the treated group would have had if they had not received treatment. Unless the selection bias is zero, the observed difference between the treated and control groups is a biased estimate of the treatment effect. For subjects who are not randomly assigned to the treated and control groups, the baseline variables could be related to both the treatment assignment and the outcome, and consequently a direct comparison of outcomes could result in biased estimates.
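The decomposition can be checked numerically. This sketch reuses a hypothetical data-generating process ($X \sim N(0, 1)$, $Y(0) = 10 + 2X + \varepsilon$, $Y(1) - Y(0) = 1 + X$) but lets $X$ also raise the probability of treatment through an assumed logistic rule, so $X$ is a confounder. Because the simulation knows $Y(0)$ for treated subjects, it can compute the selection bias and confirm that the observed difference equals the ATT minus the selection bias.

```python
import math
import random
import statistics

def simulate_observational(n, seed=5):
    """Hypothetical study in which covariate X raises both the chance of
    treatment and the untreated outcome Y(0), so X is a confounder."""
    rng = random.Random(seed)
    rows = []  # (T, Y(1), Y(0))
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y0 = 10.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        y1 = y0 + 1.0 + x
        p_treat = 1.0 / (1.0 + math.exp(-1.5 * x))  # assumed assignment rule
        t = 1 if rng.random() < p_treat else 0
        rows.append((t, y1, y0))
    return rows

rows = simulate_observational(100_000)
y1_t = [y1 for t, y1, _ in rows if t == 1]
y0_t = [y0 for t, _, y0 in rows if t == 1]  # unobservable in a real study
y0_c = [y0 for t, _, y0 in rows if t == 0]

ate = statistics.fmean(y1 - y0 for _, y1, y0 in rows)
att = statistics.fmean(y1_t) - statistics.fmean(y0_t)
naive = statistics.fmean(y1_t) - statistics.fmean(y0_c)
selection_bias = statistics.fmean(y0_c) - statistics.fmean(y0_t)
print(f"ATE {ate:.2f}  ATT {att:.2f}  observed {naive:.2f}  bias {selection_bias:.2f}")
```

Here treated subjects tend to have larger $X$, so their untreated outcomes would also have been larger; the selection bias is negative and the observed difference overstates the ATT. The ATT also exceeds the ATE, because the effect $1 + X$ is larger for the subjects who are more likely to be treated.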

One strategy for estimating the treatment effect without this bias is based on the propensity score, which is the conditional probability of receiving treatment given the observed covariates. You use propensity scores to account for confounding by weighting observations, by creating strata of subjects that have similar propensity scores, or by matching control subjects to treated subjects. These adjustments are made before the outcome analysis and without knowledge of the outcome variable (Rosenbaum and Rubin 1984; Stuart 2010, p. 5). The following section describes the propensity score approach.
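To make the matching idea concrete, the following sketch estimates propensity scores with a one-covariate logistic regression (fitted by Newton-Raphson) and then matches each treated subject to the control subject with the nearest score, with replacement. It is a plain illustration under stated assumptions, not the algorithm that PROC PSMATCH uses: the data-generating process is the same hypothetical one as above, and the functions `simulate`, `fit_propensity`, and `match_att` are invented names. The matched estimate should land near the true ATT, while the unadjusted difference does not.

```python
import bisect
import math
import random
import statistics

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def simulate(n, seed=7):
    """Hypothetical confounded study: X drives both T and Y(0)."""
    rng = random.Random(seed)
    rows = []  # (X, T, observed Y, Y(1) - Y(0))
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y0 = 10.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        y1 = y0 + 1.0 + x
        t = 1 if rng.random() < sigmoid(1.5 * x) else 0
        rows.append((x, t, y1 if t == 1 else y0, y1 - y0))
    return rows

def fit_propensity(xs, ts, iters=50):
    """Logistic regression of T on X by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t in zip(xs, ts):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += t - p            # score (gradient of the log likelihood)
            g1 += (t - p) * x
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < 1e-10:
            break
    return b0, b1

def match_att(rows, scores):
    """1:1 nearest-neighbor matching with replacement on the propensity score."""
    controls = sorted((s, y) for (x, t, y, _), s in zip(rows, scores) if t == 0)
    c_scores = [s for s, _ in controls]
    diffs = []
    for (x, t, y, _), s in zip(rows, scores):
        if t != 1:
            continue
        i = bisect.bisect_left(c_scores, s)
        # The nearest control score is on one side of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(controls)]
        j = min(candidates, key=lambda k: abs(c_scores[k] - s))
        diffs.append(y - controls[j][1])
    return statistics.fmean(diffs)

rows = simulate(20_000)
xs = [r[0] for r in rows]
ts = [r[1] for r in rows]
b0, b1 = fit_propensity(xs, ts)
scores = [sigmoid(b0 + b1 * x) for x in xs]

true_att = statistics.fmean(r[3] for r in rows if r[1] == 1)  # known only in a simulation
naive = (statistics.fmean(r[2] for r in rows if r[1] == 1)
         - statistics.fmean(r[2] for r in rows if r[1] == 0))
matched = match_att(rows, scores)
print(f"true ATT {true_att:.2f}  unadjusted {naive:.2f}  matched {matched:.2f}")
```

Matching with replacement keeps the sketch short; real analyses typically also consider calipers, matching without replacement, and covariate balance diagnostics after matching.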

Last updated: December 09, 2022