The PSMATCH Procedure

Example 101.9 Sensitivity Analysis after One-to-One Matching


This example illustrates how you can analyze sensitivity to the assumption of no unobserved confounders after performing one-to-one matching with the PSMATCH procedure. For a detailed description of this analysis, see the section Sensitivity Analysis.

A pharmaceutical company conducts a nonrandomized clinical trial to demonstrate the efficacy of a new treatment (Drug_X) in decreasing low-density lipoprotein (LDL) by comparing it to an existing treatment (Drug_A). The data set Drugs, which is described in Getting Started: PSMATCH Procedure, contains baseline variable measurements for individuals from the treated and control groups.

Output 101.9.1 lists the first eight observations.

Output 101.9.1: Input Drugs Data Set

Obs  PatientID  Drug    Gender  Age    BMI
  1          1  Drug_X  Male     29  22.02
  2          2  Drug_A  Male     45  26.68
  3          3  Drug_A  Male     42  21.84
  4          4  Drug_X  Male     38  22.71
  5          5  Drug_A  Male     31  22.76
  6          6  Drug_A  Male     43  26.86
  7          7  Drug_A  Female   45  25.47
  8          8  Drug_A  Female   49  24.28


The possibility of treatment selection bias is a concern in the analysis of the results. Patients in the trial can choose the treatment that they prefer; otherwise, physicians assign each patient to a treatment. This could lead to systematic differences in the distributions of the baseline variables in the two groups, resulting in a biased estimate of the treatment effect. Propensity score analysis that is based on matching offers an alternative that addresses this problem by balancing the distributions of the variables.

The following statements request optimal matching of observations for patients in the treatment group with observations for patients in the control group:

proc psmatch data=drugs region=cs;
   class Drug Gender;
   psmodel Drug(Treated='Drug_X')= Gender Age BMI;
   match method=optimal(k=1) exact=Gender distance=lps caliper=0.25
         weight=none;
   output out(obs=match)=Outgs lps=_Lps matchid=_MatchID;
run;

The statements are identical to those in Getting Started: PSMATCH Procedure, except that the ASSESS statement is not used here. The MATCH statement requests optimal matching of one control unit to each unit in the treated group in order to minimize the total within-pair difference.

The OUT(OBS=MATCH)=Outgs option in the OUTPUT statement creates an output data set, Outgs, that contains the matched observations.
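
As a quick check on the matched data set, the following sketch tabulates the matched observations by treatment and gender. It is not part of the original example; it assumes only the data set Outgs and the variables Drug and Gender that are created above. With one-to-one matching and EXACT=Gender, each Gender column would be expected to show the same count for Drug_X and Drug_A.

```sas
/* Sketch: count matched observations by treatment group and gender. */
/* With one-to-one exact matching on Gender, the Drug_X and Drug_A   */
/* counts within each Gender column are expected to be equal.        */
proc freq data=Outgs;
   tables Drug*Gender / nopercent norow nocol;
run;
```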

After the trial, the data set Cholesterol contains the LDL information for the matched observations. PatientID is the patient identification number, and the response variable LDL is the decrease in LDL, measured in milligrams per deciliter of blood (mg/dl).

The following statements combine the two data sets and list the first eight observations in the combined data set OutEx9a, which are shown in Output 101.9.2:

proc sort data=Outgs out=Outgs1;
   by PatientID;
run;

proc sort data=Cholesterol out=Cholesterol1;
   by PatientID;
run;

data OutEx9a;
   merge Outgs1 Cholesterol1;
   by PatientID;
run;

proc print data=OutEx9a(obs=8);
   var PatientID Drug Gender Age BMI LDL _MatchID;
run;

Output 101.9.2: Output Data Set with LDL Decreases

Obs  PatientID  Drug    Gender  Age    BMI    LDL  _MatchID
  1          1  Drug_X  Male     29  22.02   6.54        74
  2          3  Drug_A  Male     42  21.84  -5.66         7
  3          4  Drug_X  Male     38  22.71   5.52        24
  4          5  Drug_A  Male     31  22.76   7.26        76
  5          9  Drug_A  Male     31  23.31   2.64        82
  6         10  Drug_X  Female   39  25.34   4.77        43
  7         13  Drug_X  Female   32  24.78   4.25        84
  8         18  Drug_X  Male     34  26.30   0.68        99
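
The merge above keeps every PatientID that appears in either input data set. If you want to confirm that each matched patient has an LDL record, a variant that uses the IN= data set option can separate complete records from the rest. This is a minimal sketch; the data set names OutEx9a_both and Unmatched and the flag variables inMatch and inLDL are hypothetical names introduced here for illustration.

```sas
/* Sketch: split the merge result by whether a PatientID is present  */
/* in both the matched data set and the LDL data set.                */
/* OutEx9a_both, Unmatched, inMatch, and inLDL are hypothetical.     */
data OutEx9a_both Unmatched;
   merge Outgs1(in=inMatch) Cholesterol1(in=inLDL);
   by PatientID;
   if inMatch and inLDL then output OutEx9a_both;
   else output Unmatched;
run;
```

If Cholesterol contains exactly the matched patients, as the text states, Unmatched would be expected to have no observations.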


The following statements compute the differences in LDL between the treated and control units in each matched set:

proc sort data=OutEx9a out=OutEx9b;
   by _MatchID Drug;
run;

proc transpose data=OutEx9b out=OutEx9c;
   by _MatchID;
   var LDL;
run;

data OutEx9c;
   set OutEx9c;
   /* After sorting by Drug, Col1 is Drug_A (control) and Col2 is Drug_X (treated) */
   Diff = Col2 - Col1;
   drop Col1 Col2;
run;

Output 101.9.3 lists the differences in LDL decrease in the first four matched sets.

Output 101.9.3: LDL Differences in Matched Sets

Obs  _MatchID  _NAME_   Diff
  1         1  LDL      3.25
  2         2  LDL      2.44
  3         3  LDL      6.34
  4         4  LDL     -1.51
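
The difference computed above relies on the sort order of Drug to decide which transposed column holds the treated value. An alternative sketch names the transposed columns explicitly by adding an ID statement to PROC TRANSPOSE, so that the subtraction does not depend on column position. The data set name OutEx9w is hypothetical; the sketch assumes that each matched set in OutEx9b contains exactly one Drug_X observation and one Drug_A observation.

```sas
/* Sketch: name the transposed columns after the Drug values so that */
/* the treated-minus-control difference is explicit.                 */
/* OutEx9w is a hypothetical data set name.                          */
proc transpose data=OutEx9b out=OutEx9w(drop=_NAME_);
   by _MatchID;
   id Drug;          /* creates the variables Drug_A and Drug_X */
   var LDL;
run;

data OutEx9w;
   set OutEx9w;
   Diff = Drug_X - Drug_A;   /* treated minus control */
run;
```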


The following statements perform a signed rank test, and the results are shown in Output 101.9.4.

ods select TestsForLocation;
proc univariate data=OutEx9c;
   var Diff;
   ods output TestsForLocation=LocTest;
run;

Output 101.9.4: Tests for Location

The UNIVARIATE Procedure
Variable: Diff

Tests for Location: Mu0=0

Test           Statistic       p Value
Student's t    t   2.690243    Pr > |t|     0.0082
Sign           M       11.5    Pr >= |M|    0.0380
Signed Rank    S      885.5    Pr >= |S|    0.0106


The "Tests for Location" table shows that there is a significant decrease in LDL at the 0.025 level for patients in the treated group.

Propensity score analysis assumes that all confounders (variables that affect both the outcome and the treatment assignment) have been measured. However, this assumption cannot be verified. When there are unobserved covariates, individuals that have the same observed covariates might not have the same probability of being assigned to the treated group. If you assume that all confounders have been measured, you should examine the sensitivity of inferences to departures from the assumption.

Based on the approach described in the section Sensitivity Analysis on Matched Observations, the signed rank statistic is

$$ S = \sum_{j:\, d_j > 0} d_j^{+} $$

Note that this statistic is not centered, unlike the signed rank statistic that is computed by PROC UNIVARIATE and is shown in Output 101.9.4:

$$ \sum_{j:\, d_j > 0} d_j^{+} - \frac{n_t (n_t + 1)}{4} $$

The following statements compute the signed rank statistic:

data SgnRank;
   set LocTest;
   nPairs=113;
   if (Test='Signed Rank');
   SgnRank= Stat + nPairs*(nPairs+1)/4;
   keep nPairs SgnRank;
run;

Output 101.9.5 displays the signed rank statistic.

Output 101.9.5: Signed Rank Statistic

Obs  nPairs  SgnRank
  1     113     4106
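
As a check on Output 101.9.5, the arithmetic in the DATA step can be written out by hand. It uses the centered signed rank value 885.5 from Output 101.9.4 and the 113 matched pairs that are hard-coded as nPairs:

$$ S = 885.5 + \frac{113 \times 114}{4} = 885.5 + 3220.5 = 4106 $$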


Using this statistic, the following statements compute and display p-values for signed rank tests that correspond to Γ values that range from 1 to 1.5:

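data Test1;
   set SgnRank;
   /* mean0 is the sum of the ranks 1..nPairs;        */
   /* variance0 is the sum of the squared ranks       */
   mean0     = nPairs*(nPairs+1)/2;
   variance0 = mean0*(2*nPairs+1)/3;

   do Gamma=1 to 1.5 by 0.05;
      /* Mean and variance used for the test at this value of Gamma */
      mean     = Gamma/(1+Gamma) * mean0;
      variance = Gamma/(1+Gamma)**2 * variance0;
      tTest    = (SgnRank - mean) / sqrt(variance);
      /* One-sided (upper-tail) p-value */
      pValue   = 1 - probt(tTest, nPairs-1);
      output;
   end;
run;

proc print data=Test1;
run;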

Output 101.9.6: p-Values for Γ Values from 1 to 1.5

Obs  nPairs  SgnRank  mean0  variance0  Gamma     mean   variance    tTest   pValue
  1     113     4106   6441     487369   1.00  3220.50  121842.25  2.53682  0.00628
  2     113     4106   6441     487369   1.05  3299.05  121769.77  2.31248  0.01129
  3     113     4106   6441     487369   1.10  3373.86  121565.96  2.09986  0.01899
  4     113     4106   6441     487369   1.15  3445.19  121249.18  1.89775  0.03015
  5     113     4106   6441     487369   1.20  3513.27  120835.29  1.70513  0.04547
  6     113     4106   6441     487369   1.25  3578.33  120338.02  1.52110  0.06553
  7     113     4106   6441     487369   1.30  3640.57  119769.32  1.34489  0.09069
  8     113     4106   6441     487369   1.35  3700.15  119139.55  1.17581  0.12108
  9     113     4106   6441     487369   1.40  3757.25  118457.74  1.01329  0.15655
 10     113     4106   6441     487369   1.45  3812.02  117731.79  0.85678  0.19670
 11     113     4106   6441     487369   1.50  3864.60  116968.56  0.70583  0.24088
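
Read off the DATA step that creates Test1, the quantities in each row of the table can be summarized as follows. Here n is nPairs, S is SgnRank, and the symbols \mu_\Gamma, \sigma_\Gamma^2, and t_\Gamma are labels introduced here for the variables mean, variance, and tTest; they are not notation from the original example.

$$
\mu_\Gamma = \frac{\Gamma}{1+\Gamma} \cdot \frac{n(n+1)}{2}, \qquad
\sigma_\Gamma^2 = \frac{\Gamma}{(1+\Gamma)^2} \cdot \frac{n(n+1)(2n+1)}{6}, \qquad
t_\Gamma = \frac{S - \mu_\Gamma}{\sigma_\Gamma}
$$

For example, with n = 113, S = 4106, and \Gamma = 1.15, the row for Obs 4 works out as:

$$
\mu_{1.15} = \frac{1.15}{2.15} \times 6441 \approx 3445.19, \qquad
\sigma_{1.15}^2 = \frac{1.15}{2.15^2} \times 487369 \approx 121249.18, \qquad
t_{1.15} \approx \frac{4106 - 3445.19}{\sqrt{121249.18}} \approx 1.898
$$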


Output 101.9.6 shows that at the tipping point Γ=1.15, the p-value is 0.0302, which is larger than the Type I error level of 0.025. Thus the study conclusion is reversed if, for two individuals k and l in the same matched set, the probability that individual k is in the treated group and individual l is in the control group is

$$ \frac{\pi_k}{\pi_k + \pi_l} = \frac{\Gamma}{1 + \Gamma} = \frac{1.15}{2.15} \approx 0.535 $$

If Γ=1.15 represents only a small departure from random treatment assignment (Γ=1), the study conclusion is not robust to hidden bias from an unobserved confounder.
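
Rather than reading the tipping point from the printed table, you could select it programmatically. The following is a minimal sketch that assumes the data set Test1 created above, which is ordered by increasing Gamma, and the Type I error level of 0.025 used in this example. The data set name Tipping is hypothetical.

```sas
/* Sketch: keep the rows whose p-value exceeds the 0.025 level and   */
/* print the first one, which has the smallest such Gamma because    */
/* Test1 is ordered by increasing Gamma.                             */
/* Tipping is a hypothetical data set name.                          */
data Tipping;
   set Test1;
   where pValue > 0.025;
run;

proc print data=Tipping(obs=1);
   var Gamma pValue;
run;
```

A finer grid in the DO loop that creates Test1 (for example, a step of 0.01) would locate the tipping point more precisely, at the cost of a longer table.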

Last updated: December 09, 2022