This example illustrates how you can analyze sensitivity to the assumption of no unobserved confounders after performing one-to-one matching with the PSMATCH procedure. For a detailed description of this analysis, see the section Sensitivity Analysis.
A pharmaceutical company conducts a nonrandomized clinical trial to demonstrate the efficacy of a new treatment (Drug_X) to decrease the low-density lipoprotein (LDL) by comparing it to an existing treatment (Drug_A). The data set Drugs, which is described in Getting Started: PSMATCH Procedure, contains baseline variable measurements for individuals from the treated and control groups.
Output 101.9.1 lists the first eight observations.
Output 101.9.1: Input Drugs Data Set
| Obs | PatientID | Drug | Gender | Age | BMI |
|---|---|---|---|---|---|
| 1 | 1 | Drug_X | Male | 29 | 22.02 |
| 2 | 2 | Drug_A | Male | 45 | 26.68 |
| 3 | 3 | Drug_A | Male | 42 | 21.84 |
| 4 | 4 | Drug_X | Male | 38 | 22.71 |
| 5 | 5 | Drug_A | Male | 31 | 22.76 |
| 6 | 6 | Drug_A | Male | 43 | 26.86 |
| 7 | 7 | Drug_A | Female | 45 | 25.47 |
| 8 | 8 | Drug_A | Female | 49 | 24.28 |
The possibility of treatment selection bias is a concern in the analysis of the results. Patients in the trial can choose the treatment that they prefer; otherwise, physicians assign each patient to a treatment. This could lead to systematic differences in the distributions of the baseline variables in the two groups, resulting in a biased estimate of the treatment effect. Propensity score analysis that is based on matching offers an alternative that addresses this problem by balancing the distributions of the variables.
The following statements request optimal matching of observations for patients in the treatment group with observations for patients in the control group:
```sas
proc psmatch data=drugs region=cs;
   class Drug Gender;
   psmodel Drug(Treated='Drug_X')= Gender Age BMI;
   match method=optimal(k=1) exact=Gender distance=lps caliper=0.25
         weight=none;
   output out(obs=match)=Outgs lps=_Lps matchid=_MatchID;
run;
```
The statements are identical to those in Getting Started: PSMATCH Procedure, except that the ASSESS statement is not used here. The MATCH statement requests optimal matching of one control unit to each unit in the treated group in order to minimize the total within-pair difference.
The OUT(OBS=MATCH)=Outgs option in the OUTPUT statement creates an output data set, Outgs, that contains the matched observations.
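As a rough illustration of what "minimize the total within-pair difference" means, the following Python sketch finds an optimal one-to-one assignment by brute force. It is not how PROC PSMATCH is implemented. The function name `optimal_one_to_one`, the scores, and the caliper value are hypothetical, the caliper is taken in raw score units, and the EXACT=Gender constraint is ignored to keep the example short.

```python
from itertools import permutations

def optimal_one_to_one(treated, controls, caliper):
    """Brute-force optimal 1:1 matching on a scalar score.

    treated, controls: lists of scores (for example, logit propensity scores).
    caliper: largest allowed within-pair absolute difference, in score units.
    Returns (total_distance, pairs) for the assignment with the smallest total
    within-pair difference, or None if no assignment satisfies the caliper.
    """
    best = None
    # Try every way of assigning a distinct control to each treated unit.
    for perm in permutations(range(len(controls)), len(treated)):
        dists = [abs(treated[i] - controls[j]) for i, j in enumerate(perm)]
        if any(d > caliper for d in dists):
            continue  # at least one pair violates the caliper
        total = sum(dists)
        if best is None or total < best[0]:
            best = (total, list(enumerate(perm)))
    return best

# Hypothetical scores, not taken from the Drugs data set.
total, pairs = optimal_one_to_one([0.5, 1.0], [0.9, 0.0], caliper=1.0)
print(round(total, 2), pairs)
```

In this toy input, matching each treated unit in turn to its nearest free control gives a total difference of 1.4, whereas the optimal assignment pairs the first treated unit with the second control and reaches 0.6. Brute force is practical only for a handful of units.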
After the trial, the data set Cholesterol contains the LDL information for the matched observations. PatientID is the patient identification number, and the response variable LDL is the decrease in LDL, measured in milligrams per deciliter of blood (mg/dl).
The following statements merge the two data sets by PatientID and list the first eight observations in the combined data set, OutEx9a, which are shown in Output 101.9.2:
```sas
proc sort data=Outgs out=Outgs1;
   by PatientID;
run;

proc sort data=Cholesterol out=Cholesterol1;
   by PatientID;
run;

data OutEx9a;
   merge Outgs1 Cholesterol1;
   by PatientID;
run;

proc print data=OutEx9a(obs=8);
   var PatientID Drug Gender Age BMI LDL _MatchID;
run;
```
Output 101.9.2: Output Data Set with LDL Decreases
| Obs | PatientID | Drug | Gender | Age | BMI | LDL | _MatchID |
|---|---|---|---|---|---|---|---|
| 1 | 1 | Drug_X | Male | 29 | 22.02 | 6.54 | 74 |
| 2 | 3 | Drug_A | Male | 42 | 21.84 | -5.66 | 7 |
| 3 | 4 | Drug_X | Male | 38 | 22.71 | 5.52 | 24 |
| 4 | 5 | Drug_A | Male | 31 | 22.76 | 7.26 | 76 |
| 5 | 9 | Drug_A | Male | 31 | 23.31 | 2.64 | 82 |
| 6 | 10 | Drug_X | Female | 39 | 25.34 | 4.77 | 43 |
| 7 | 13 | Drug_X | Female | 32 | 24.78 | 4.25 | 84 |
| 8 | 18 | Drug_X | Male | 34 | 26.30 | 0.68 | 99 |
The following statements compute the differences in LDL between the treated and control units in each matched set:
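```sas
/* Within each matched set, Drug_A (control) sorts before Drug_X (treated) */
proc sort data=OutEx9a out=OutEx9b;
   by _MatchID Drug;
run;

/* One row per matched set: Col1 = control LDL, Col2 = treated LDL */
proc transpose data=OutEx9b out=OutEx9c;
   by _MatchID;
   var LDL;
run;

data OutEx9c;
   set OutEx9c;
   Diff = Col2 - Col1;   /* treated minus control */
   drop Col1 Col2;
run;
```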
Output 101.9.3 lists the differences in LDL decrease in the first four matched sets.
Output 101.9.3: LDL Differences in Matched Sets
| Obs | _MatchID | _NAME_ | Diff |
|---|---|---|---|
| 1 | 1 | LDL | 3.25 |
| 2 | 2 | LDL | 2.44 |
| 3 | 3 | LDL | 6.34 |
| 4 | 4 | LDL | -1.51 |
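The pairing logic can also be sketched outside SAS. The following Python fragment is a minimal illustration, assuming rows of the form `(match_id, drug, ldl)`. The function name `pair_differences` and the sample values are hypothetical and are not taken from the example data. Unlike the transpose step, it identifies the treated unit by its label rather than by sort order.

```python
from collections import defaultdict

def pair_differences(rows, treated_label="Drug_X"):
    """Return {match_id: treated LDL - control LDL} for complete 1:1 pairs."""
    by_match = defaultdict(dict)
    for match_id, drug, ldl in rows:
        role = "treated" if drug == treated_label else "control"
        by_match[match_id][role] = ldl
    # Keep only matched sets that contain both a treated and a control unit.
    return {
        match_id: pair["treated"] - pair["control"]
        for match_id, pair in sorted(by_match.items())
        if len(pair) == 2
    }

# Hypothetical rows, for illustration only.
rows = [
    (1, "Drug_X", 6.50), (1, "Drug_A", 3.25),
    (2, "Drug_A", 1.00), (2, "Drug_X", 3.44),
    (3, "Drug_X", 4.10),               # incomplete set, skipped
]
diffs = pair_differences(rows)
print(diffs)
```

A positive difference means that the treated patient in the pair had the larger decrease in LDL.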
The following statements perform a signed rank test, and the results are shown in Output 101.9.4.
```sas
ods select TestsForLocation;
proc univariate data=OutEx9c;
   var Diff;
   ods output TestsForLocation=LocTest;
run;
```
Output 101.9.4: Tests for Location
Tests for Location: Mu0=0

| Test | Statistic | Value | Probability | p Value |
|---|---|---|---|---|
| Student's t | t | 2.690243 | Pr > \|t\| | 0.0082 |
| Sign | M | 11.5 | Pr >= \|M\| | 0.0380 |
| Signed Rank | S | 885.5 | Pr >= \|S\| | 0.0106 |
The "Tests for Location" table shows that the decrease in LDL is significantly larger for patients in the treated group at the 0.025 level (the signed rank test has a p-value of 0.0106).
Propensity score analysis assumes that all confounders (variables that affect both the outcome and the treatment assignment) have been measured. However, this assumption cannot be verified. When there are unobserved covariates, individuals that have the same observed covariates might not have the same probability of being assigned to the treated group. If you assume that all confounders have been measured, you should examine the sensitivity of inferences to departures from the assumption.
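Based on the approach described in the section Sensitivity Analysis on Matched Observations, the signed rank statistic is

$$T = \sum_{i:\, d_i > 0} r_i^{+}$$

where $d_i$ is the difference in LDL decrease (treated minus control) in the $i$th matched pair and $r_i^{+}$ is the rank of $|d_i|$ among the absolute differences.

Note that this statistic is not centered, unlike the signed rank statistic that is computed by PROC UNIVARIATE and is shown in Output 101.9.4:

$$S = T - \frac{n(n+1)}{4}$$

where $n$ is the number of matched pairs.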
The following statements compute the signed rank statistic:
```sas
data SgnRank;
   set LocTest;
   nPairs=113;                                /* number of matched pairs */
   if (Test='Signed Rank');
   /* Undo the centering applied by PROC UNIVARIATE */
   SgnRank= Stat + nPairs*(nPairs+1)/4;
   keep nPairs SgnRank;
run;
```
Output 101.9.5 displays the signed rank statistic.
Output 101.9.5: Signed Rank Statistic
| Obs | nPairs | SgnRank |
|---|---|---|
| 1 | 113 | 4106 |
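To make the relation between the centered and uncentered statistics concrete, the following Python sketch computes both from a list of paired differences. It assumes the usual conventions for the signed rank test: zero differences are dropped and tied absolute values receive average ranks. The function name `signed_rank` is hypothetical. The four differences from Output 101.9.3 serve only as a small input, and the last line repeats the arithmetic of the SgnRank data step.

```python
def signed_rank(diffs):
    """Return (T, S): the uncentered and centered signed rank statistics."""
    d = [x for x in diffs if x != 0]          # drop zero differences
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied absolute values starting at position i.
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1                 # average of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    t_stat = sum(r for r, x in zip(ranks, d) if x > 0)   # ranks of positive differences
    s_stat = t_stat - n * (n + 1) / 4                    # centered version
    return t_stat, s_stat

# First four differences from Output 101.9.3, used only as a small illustration.
t_small, s_small = signed_rank([3.25, 2.44, 6.34, -1.51])

# Arithmetic of the SgnRank data step: S = 885.5 and n = 113 give T.
t_full = 885.5 + 113 * 114 / 4
print(t_small, s_small, t_full)
```

For the four sample differences, the absolute values rank as 1.51, 2.44, 3.25, 6.34, so the positive differences contribute ranks 2, 3, and 4.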
Using this statistic, the following statements compute and display p-values for signed rank tests that correspond to values of $\Gamma$ that range from 1 to 1.5.
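```sas
data Test1;
   set SgnRank;
   mean0 = nPairs*(nPairs+1)/2;          /* sum of ranks 1..nPairs */
   variance0 = mean0*(2*nPairs+1)/3;     /* sum of squared ranks */
   do Gamma=1 to 1.5 by 0.05;
      /* Moments of the signed rank statistic at the bound Gamma/(1+Gamma) */
      mean = Gamma/(1+Gamma) * mean0;
      variance = Gamma/(1+Gamma)**2 * variance0;
      tTest = (SgnRank - mean) / sqrt(variance);
      pValue = 1 - probt(tTest, nPairs-1);   /* upper-tail p-value */
      output;
   end;
run;

proc print data=Test1;
run;
```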
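The quantities in the Test1 data step can be written out as follows. This is a sketch read off from the code, with $T$ denoting the uncentered signed rank statistic, $n$ the number of matched pairs, and $p^{+}$ a symbol introduced here for the bounding probability $\Gamma/(1+\Gamma)$.

```latex
\begin{align*}
p^{+} &= \frac{\Gamma}{1+\Gamma}, \\
\mathrm{E}(T) &= p^{+}\,\frac{n(n+1)}{2}, \\
\mathrm{Var}(T) &= p^{+}\,(1-p^{+})\,\frac{n(n+1)(2n+1)}{6}
               = \frac{\Gamma}{(1+\Gamma)^{2}}\,\frac{n(n+1)(2n+1)}{6}, \\
t &= \frac{T-\mathrm{E}(T)}{\sqrt{\mathrm{Var}(T)}}.
\end{align*}
```

With $n = 113$, the two rank sums are $n(n+1)/2 = 6441$ and $n(n+1)(2n+1)/6 = 487369$. At $\Gamma = 1$ these formulas give $\mathrm{E}(T) = 3220.5$ and $\mathrm{Var}(T) = 121842.25$, the usual null moments of the signed rank statistic.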
Output 101.9.6: p-Values for $\Gamma$ Values from 1 to 1.5
| Obs | nPairs | SgnRank | mean0 | variance0 | Gamma | mean | variance | tTest | pValue |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 113 | 4106 | 6441 | 487369 | 1.00 | 3220.50 | 121842.25 | 2.53682 | 0.00628 |
| 2 | 113 | 4106 | 6441 | 487369 | 1.05 | 3299.05 | 121769.77 | 2.31248 | 0.01129 |
| 3 | 113 | 4106 | 6441 | 487369 | 1.10 | 3373.86 | 121565.96 | 2.09986 | 0.01899 |
| 4 | 113 | 4106 | 6441 | 487369 | 1.15 | 3445.19 | 121249.18 | 1.89775 | 0.03015 |
| 5 | 113 | 4106 | 6441 | 487369 | 1.20 | 3513.27 | 120835.29 | 1.70513 | 0.04547 |
| 6 | 113 | 4106 | 6441 | 487369 | 1.25 | 3578.33 | 120338.02 | 1.52110 | 0.06553 |
| 7 | 113 | 4106 | 6441 | 487369 | 1.30 | 3640.57 | 119769.32 | 1.34489 | 0.09069 |
| 8 | 113 | 4106 | 6441 | 487369 | 1.35 | 3700.15 | 119139.55 | 1.17581 | 0.12108 |
| 9 | 113 | 4106 | 6441 | 487369 | 1.40 | 3757.25 | 118457.74 | 1.01329 | 0.15655 |
| 10 | 113 | 4106 | 6441 | 487369 | 1.45 | 3812.02 | 117731.79 | 0.85678 | 0.19670 |
| 11 | 113 | 4106 | 6441 | 487369 | 1.50 | 3864.60 | 116968.56 | 0.70583 | 0.24088 |
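As an independent check of the table, the following Python sketch repeats the same arithmetic with the standard library only. Because the standard library has no Student's t distribution, the helper `t_cdf` integrates the t density numerically with Simpson's rule. That helper, the function name `sensitivity_rows`, and the step count are assumptions made for this illustration and are not part of the SAS example.

```python
import math

def t_cdf(t, df, steps=2000):
    """Student's t CDF by Simpson's rule on the density (steps must be even)."""
    if t == 0:
        return 0.5
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                       # integral of the density over [0, |t|]
    return 0.5 + math.copysign(area, t)

def sensitivity_rows(sgn_rank, n_pairs, gammas):
    """Return (gamma, mean, variance, t, upper-tail p) for each gamma."""
    mean0 = n_pairs * (n_pairs + 1) / 2
    variance0 = mean0 * (2 * n_pairs + 1) / 3
    rows = []
    for g in gammas:
        mean = g / (1 + g) * mean0
        variance = g / (1 + g) ** 2 * variance0
        t_test = (sgn_rank - mean) / math.sqrt(variance)
        p_value = 1 - t_cdf(t_test, n_pairs - 1)
        rows.append((g, mean, variance, t_test, p_value))
    return rows

gammas = [round(1 + 0.05 * i, 2) for i in range(11)]
rows = sensitivity_rows(4106, 113, gammas)

# Smallest gamma on the grid whose p-value exceeds the 0.025 level.
tipping = next(r[0] for r in rows if r[4] > 0.025)
for g, mean, variance, t_test, p_value in rows:
    print(f"{g:.2f} {mean:10.2f} {variance:12.2f} {t_test:8.5f} {p_value:8.5f}")
```

The grid spacing of 0.05 determines how finely the tipping point is located. A finer grid between 1.10 and 1.15 would narrow it further.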
Output 101.9.6 shows that at the tipping point $\Gamma = 1.15$, the p-value is 0.0302, which is larger than the Type I error level of 0.025. Thus the study conclusion is reversed if for two individuals k and l in the same matched set, the probability that individual k is in the treated group and l is in the control group is $\Gamma/(1+\Gamma) = 1.15/2.15 \approx 0.535$.
If $\Gamma = 1.15$ represents only a small departure from random treatment assignment ($\Gamma = 1$), the study conclusion is not robust to hidden bias from an unobserved confounder.
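The link between $\Gamma$ and the within-pair assignment probability can be sketched as follows, using the standard odds-ratio form of this kind of sensitivity model. The symbols $\pi_k$ and $\pi_l$, for the treatment probabilities of individuals k and l, are notation introduced here for illustration.

```latex
\begin{align*}
\frac{1}{\Gamma} \;\le\; \frac{\pi_k\,(1-\pi_l)}{\pi_l\,(1-\pi_k)} \;\le\; \Gamma
\quad\Longrightarrow\quad
\frac{1}{1+\Gamma} \;\le\;
\frac{\pi_k\,(1-\pi_l)}{\pi_k\,(1-\pi_l)+\pi_l\,(1-\pi_k)}
\;\le\; \frac{\Gamma}{1+\Gamma}.
\end{align*}
```

The middle term is the probability that k is the treated unit and l is the control, given that exactly one of the two is treated. At $\Gamma = 1$ both bounds equal $0.5$, which corresponds to random assignment within the pair. At $\Gamma = 1.15$ the bounds are $1/2.15 \approx 0.465$ and $1.15/2.15 \approx 0.535$.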