The PSMATCH Procedure

Example 101.9 Sensitivity Analysis after One-to-One Matching


This example illustrates how you can analyze sensitivity to the assumption of no unobserved confounders after performing one-to-one matching with the PSMATCH procedure. For a detailed description of this analysis, see the section Sensitivity Analysis.

A pharmaceutical company conducts a nonrandomized clinical trial to demonstrate the efficacy of a new treatment (Drug_X) to decrease the low-density lipoprotein (LDL) by comparing it to an existing treatment (Drug_A). The data set Drugs, which is described in Getting Started: PSMATCH Procedure, contains baseline variable measurements for individuals from the treated and control groups.

Output 101.9.1 lists the first eight observations.

Output 101.9.1: Input Drugs Data Set

Obs	PatientID	Drug	Gender	Age	BMI
1	1	Drug_X	Male	29	22.02
2	2	Drug_A	Male	45	26.68
3	3	Drug_A	Male	42	21.84
4	4	Drug_X	Male	38	22.71
5	5	Drug_A	Male	31	22.76
6	6	Drug_A	Male	43	26.86
7	7	Drug_A	Female	45	25.47
8	8	Drug_A	Female	49	24.28

The possibility of treatment selection bias is a concern in the analysis of the results. Patients in the trial can choose the treatment that they prefer; otherwise, physicians assign each patient to a treatment. This could lead to systematic differences in the distributions of the baseline variables in the two groups, resulting in a biased estimate of the treatment effect. Propensity score analysis that is based on matching offers an alternative that addresses this problem by balancing the distributions of the variables.

The following statements request optimal matching of observations for patients in the treatment group with observations for patients in the control group:

proc psmatch data=drugs region=cs;
   class Drug Gender;
   psmodel Drug(Treated='Drug_X')= Gender Age BMI;
   match method=optimal(k=1) exact=Gender distance=lps caliper=0.25
         weight=none;
   output out(obs=match)=Outgs lps=_Lps matchid=_MatchID;
run;

The statements are identical to those in Getting Started: PSMATCH Procedure, except that the ASSESS statement is not used here. The MATCH statement requests optimal matching of one control unit to each unit in the treated group in order to minimize the total within-pair difference.

The OUT(OBS=MATCH)=Outgs option in the OUTPUT statement creates an output data set, Outgs, that contains the matched observations.

After the trial, the data set Cholesterol contains the LDL information for the matched observations. PatientID is the patient identification number, and the response variable LDL is the decrease in LDL, measured in milligrams per deciliter of blood (mg/dl).

The following statements combine the two data sets and list the first eight observations of the combined data set, which are shown in Output 101.9.2:

proc sort data=Outgs out=Outgs1;
   by PatientID;
run;

proc sort data=Cholesterol out=Cholesterol1;
   by PatientID;
run;

data OutEx9a;
   merge Outgs1 Cholesterol1;
   by PatientID;
run;

proc print data=OutEx9a(obs=8);
   var PatientID Drug Gender Age BMI LDL _MatchID;
run;

Output 101.9.2: Output Data Set with LDL Decreases

Obs	PatientID	Drug	Gender	Age	BMI	LDL	_MatchID
1	1	Drug_X	Male	29	22.02	6.54	74
2	3	Drug_A	Male	42	21.84	-5.66	7
3	4	Drug_X	Male	38	22.71	5.52	24
4	5	Drug_A	Male	31	22.76	7.26	76
5	9	Drug_A	Male	31	23.31	2.64	82
6	10	Drug_X	Female	39	25.34	4.77	43
7	13	Drug_X	Female	32	24.78	4.25	84
8	18	Drug_X	Male	34	26.30	0.68	99

The following statements compute the differences in LDL between the treated and control units in each matched set:

proc sort data=OutEx9a out=OutEx9b;
   by _MatchID Drug;
run;

proc transpose data=OutEx9b out=OutEx9c;
   by _MatchID;
   var LDL;
run;

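data OutEx9c;
   set OutEx9c;
   /* Within each _MatchID the rows are sorted by Drug, so Col1 is the */
   /* Drug_A (control) value and Col2 is the Drug_X (treated) value.   */
   Diff= Col2 - Col1;
   drop Col1 Col2;
run;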

Output 101.9.3 lists the differences in LDL decrease in the first four matched sets.

Output 101.9.3: LDL Differences in Matched Sets

Obs	_MatchID	_NAME_	Diff
1	1	LDL	3.25
2	2	LDL	2.44
3	3	LDL	6.34
4	4	LDL	-1.51
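The subtraction Col2 - Col1 gives a treated-minus-control difference only if every matched set holds exactly one Drug_A and one Drug_X observation. The following sketch is one way to check that before relying on Diff. The column names nSets, nTreated, nControl, and nBadSets are illustrative and are not part of the original example.

```sas
/* Sketch: count matched sets in OutEx9b and flag any set that does not */
/* contain exactly one treated (Drug_X) and one control (Drug_A) row.   */
proc sql;
   select count(*)                            as nSets,
          sum(nTreated ne 1 or nControl ne 1) as nBadSets
   from (select _MatchID,
                sum(Drug = 'Drug_X') as nTreated,
                sum(Drug = 'Drug_A') as nControl
         from OutEx9b
         group by _MatchID) as s;
quit;
```

If nBadSets is 0, each row of OutEx9c corresponds to one complete treated-control pair.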

The following statements perform a signed rank test, and the results are shown in Output 101.9.4.

ods select TestsForLocation;
proc univariate data=OutEx9c;
   var Diff;
   ods output TestsForLocation=LocTest;
run;

Output 101.9.4: Tests for Location

The UNIVARIATE Procedure

Variable: Diff

Tests for Location: Mu0=0
Test	Statistic		p Value
Student's t	t	2.690243	Pr > |t|	0.0082
Sign	M	11.5	Pr >= |M|	0.0380
Signed Rank	S	885.5	Pr >= |S|	0.0106

The "Tests for Location" table shows that the decrease in LDL is significantly greater for patients in the treated group than for their matched controls at the 0.025 level (the signed rank p-value is 0.0106).

Propensity score analysis assumes that all confounders (variables that affect both the outcome and the treatment assignment) have been measured. However, this assumption cannot be verified. When there are unobserved covariates, individuals that have the same observed covariates might not have the same probability of being assigned to the treated group. If you assume that all confounders have been measured, you should examine the sensitivity of inferences to departures from the assumption.

Based on the approach described in the section Sensitivity Analysis on Matched Observations, the signed rank statistic is

\[ S = \sum_{j:\, d_j > 0} d_j^{+} \]

Note that this statistic is not centered, unlike the signed rank statistic that is computed by PROC UNIVARIATE and is shown in Output 101.9.4:

\[ \sum_{j:\, d_j > 0} d_j^{+} - \frac{n_t (n_t + 1)}{4} \]

The following statements compute the signed rank statistic:

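data SgnRank;
   set LocTest;
   nPairs=113;                    /* number of matched pairs */
   if (Test='Signed Rank');       /* keep only the signed rank row */
   /* Stat is the centered statistic; add n(n+1)/4 to recover S */
   SgnRank= Stat + nPairs*(nPairs+1)/4;
   keep nPairs SgnRank;
run;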
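The number of pairs is hard-coded as 113 in the preceding step. As a hedged alternative, the following sketch counts the nonmissing differences in OutEx9c and stores the count in a macro variable. The macro variable name nPairs is an illustrative choice, and the sketch assumes that no difference is exactly zero, because PROC UNIVARIATE excludes values equal to Mu0 from the signed rank computation.

```sas
/* Sketch: derive the number of matched pairs from the data instead of */
/* hard-coding it. COUNT(Diff) counts only nonmissing differences.     */
proc sql noprint;
   select count(Diff) into :nPairs trimmed
   from OutEx9c;
quit;

%put NOTE: Number of matched pairs = &nPairs;
```

With this in place, the assignment in the DATA step could be written as nPairs=&nPairs; so that the centering term follows the data.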

Output 101.9.5 displays the signed rank statistic.

Output 101.9.5: Signed Rank Statistic

Obs	nPairs	SgnRank
1	113	4106
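As a check on the arithmetic, the value in Output 101.9.5 can be reproduced by hand from the centered statistic 885.5 in Output 101.9.4 and the 113 matched pairs:

\[ S = 885.5 + \frac{113 \times 114}{4} = 885.5 + 3220.5 = 4106 \]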

Using this statistic, the following statements compute and display p-values for signed rank tests that correspond to values of Γ that range from 1 to 1.5.

data Test1;
   set SgnRank;
   /* mean0 = sum of the ranks 1..nPairs; variance0 = sum of their squares */
   mean0     = nPairs*(nPairs+1)/2;
   variance0 = mean0*(2*nPairs+1)/3;

   /* For each Gamma, compute the mean, variance, and upper-tail p-value */
   do Gamma=1 to 1.5 by 0.05;
      mean     = Gamma/(1+Gamma) * mean0;
      variance = Gamma/(1+Gamma)**2 * variance0;
      tTest    = (SgnRank - mean) / sqrt(variance);
      pValue   = 1 - probt(tTest, nPairs-1);
      output;
   end;
run;

proc print data=Test1;
run;

Output 101.9.6: p-Values for Γ Values from 1 to 1.5

Obs	nPairs	SgnRank	mean0	variance0	Gamma	mean	variance	tTest	pValue
1	113	4106	6441	487369	1.00	3220.50	121842.25	2.53682	0.00628
2	113	4106	6441	487369	1.05	3299.05	121769.77	2.31248	0.01129
3	113	4106	6441	487369	1.10	3373.86	121565.96	2.09986	0.01899
4	113	4106	6441	487369	1.15	3445.19	121249.18	1.89775	0.03015
5	113	4106	6441	487369	1.20	3513.27	120835.29	1.70513	0.04547
6	113	4106	6441	487369	1.25	3578.33	120338.02	1.52110	0.06553
7	113	4106	6441	487369	1.30	3640.57	119769.32	1.34489	0.09069
8	113	4106	6441	487369	1.35	3700.15	119139.55	1.17581	0.12108
9	113	4106	6441	487369	1.40	3757.25	118457.74	1.01329	0.15655
10	113	4106	6441	487369	1.45	3812.02	117731.79	0.85678	0.19670
11	113	4106	6441	487369	1.50	3864.60	116968.56	0.70583	0.24088
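Read directly from the DATA step, the mean and variance used for each value of Γ can be written as follows. The notation \(\mathrm{E}_\Gamma\) and \(\mathrm{Var}_\Gamma\) is introduced here only for illustration and does not appear in the original example.

\[ \mathrm{E}_\Gamma(S) = \frac{\Gamma}{1+\Gamma} \cdot \frac{n_t (n_t + 1)}{2}, \qquad \mathrm{Var}_\Gamma(S) = \frac{\Gamma}{(1+\Gamma)^2} \cdot \frac{n_t (n_t + 1)(2 n_t + 1)}{6} \]

For Γ = 1 and \(n_t = 113\), this gives the first row of Output 101.9.6:

\[ \mathrm{E}_1(S) = \tfrac{1}{2} \times 6441 = 3220.5, \qquad \mathrm{Var}_1(S) = \tfrac{1}{4} \times 487369 = 121842.25, \qquad \frac{4106 - 3220.5}{\sqrt{121842.25}} \approx 2.537 \]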

Output 101.9.6 shows that at the tipping point Γ = 1.15, the p-value is 0.0302, which is larger than the Type I error level of 0.025. Thus the study conclusion is reversed if, for two individuals k and l in the same matched set, the probability that individual k is in the treated group and l is in the control group is

\[ \frac{\pi_k}{\pi_k + \pi_l} = \frac{\Gamma}{1 + \Gamma} = 0.535 \]

If Γ = 1.15 represents only a small departure from random treatment assignment (Γ = 1), the study conclusion is not robust to hidden bias from an unobserved confounder.
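The tipping point can also be picked out of Test1 programmatically rather than read from the listing. The following sketch assumes the one-sided level of 0.025 used in this example. The data set name TippingPoint and the variable ProbTreated are illustrative and are not part of the original example.

```sas
/* Sketch: keep the first (smallest) Gamma in Test1 whose p-value       */
/* exceeds 0.025. Test1 is already ordered by increasing Gamma.         */
data TippingPoint;
   set Test1;
   if pValue > 0.025 then do;
      ProbTreated = Gamma / (1 + Gamma);   /* Gamma/(1+Gamma) */
      output;
      stop;                                /* only the first such row */
   end;
   keep Gamma pValue ProbTreated;
run;

proc print data=TippingPoint;
run;
```

For the values in Output 101.9.6, this should select the row for Γ = 1.15. A finer grid in the DO loop that creates Test1 would locate the tipping point more precisely.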

Last updated: December 09, 2022