The PSMATCH Procedure

Example 101.8 Matching with Precomputed Propensity Scores

The PSMATCH procedure can fit a binary logistic regression model to compute the propensity scores that are used for matching. However, you might already have computed the propensity scores, for example by using another procedure in SAS/STAT software that performs logistic regression. This example illustrates optimal matching with precomputed propensity scores that are supplied in the input data set for PROC PSMATCH.

The data for this example are observations on patients in a nonrandomized clinical trial. The trial and the Drugs data set that contains the patient information are described in the section Getting Started: PSMATCH Procedure.

The following statements use the LOGISTIC procedure to derive propensity scores:

ods select none;
proc logistic data=drugs;
   class Drug Gender;
   model Drug(Event='Drug_X')= Gender Age BMI / link=cloglog;
   output out=drug1 p=pscore;
run;
ods select all;

The LINK=CLOGLOG option fits a complementary log-log model, and the predicted probabilities from that model serve as the propensity scores that are passed to the PSMATCH procedure. The option is used here only to demonstrate that you are not limited to the logit link that the PSMATCH procedure provides: you can derive propensity scores from a different model and then supply them to PROC PSMATCH.
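
For reference, the two links differ only in how the linear predictor is mapped to a probability. The notation below is an assumption introduced here and is not taken from the example: x denotes the covariate vector (with an intercept), beta the fitted coefficients, and e(x) the resulting propensity score.

```latex
\begin{align*}
\text{logit link:} \quad & e(x) = \frac{1}{1 + \exp(-x^{\top}\beta)} \\
\text{complementary log-log link:} \quad & e(x) = 1 - \exp\bigl(-\exp(x^{\top}\beta)\bigr)
\end{align*}
```

In both cases the score lies strictly between 0 and 1, so its logit is defined and can be used as the matching distance.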

The output data set Drug1 is constructed from the data set Drugs and contains the PScore variable for propensity scores.
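
A listing of the first few observations of Drug1 can be produced with a PROC PRINT step. The following is a minimal sketch; the source does not show the statements that were actually used, so the variable list is an assumption based on the columns that appear in the listing.

```sas
/* Sketch (assumed step): print the first 10 observations of Drug1 */
proc print data=drug1(obs=10);
   var PatientID Drug Gender Age BMI pscore;
run;
```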

Output 101.8.1 lists the first 10 observations.

Output 101.8.1: Data Set with Propensity Scores

Obs  PatientID  Drug    Gender  Age  BMI    pscore
1    284        Drug_X  Male    29   22.02  0.35498
2    201        Drug_A  Male    45   26.68  0.21794
3    147        Drug_A  Male    42   21.84  0.12261
4    307        Drug_X  Male    38   22.71  0.19821
5    433        Drug_A  Male    31   22.76  0.34298
6    435        Drug_A  Male    43   26.86  0.26261
7    159        Drug_A  Female  45   25.47  0.15077
8    368        Drug_A  Female  49   24.28  0.08713
9    286        Drug_A  Male    31   23.31  0.37211
10   163        Drug_X  Female  39   25.34  0.24005


The following statements request optimal matching to match patients in the treatment group to patients in the control group:

ods graphics on;
proc psmatch data=Drug1 region=cs;
   class Drug Gender;
   psdata treatvar=Drug(Treated='Drug_X') ps=pscore;
   match method=optimal(k=1) exact=Gender distance=lps caliper=0.5
         weight=none;
   assess lps var=(Gender Age BMI);
   output out(obs=match)=OutEx8 lps=_Lps matchid=_MatchID;
run;

The PSMODEL statement is not used in this example because the propensity scores are provided in Drug1. Instead, the PSDATA statement is used to identify the binary treatment variable and the propensity score variable in Drug1. The CLASS statement specifies the classification variables. The PS= option specifies pscore as the propensity score variable. The TREATVAR=DRUG option specifies Drug as the binary treatment indicator variable, and TREATED='Drug_X' identifies Drug_X as the treated group.

The PSMATCH procedure matches only those observations whose propensity scores lie in the support region that you specify with the REGION= option. Here the REGION=CS option requests that only those observations whose propensity scores (or equivalently, logits of propensity scores) lie in the common support region be used for matching. The common support region is the largest interval that contains propensity scores (or equivalently, logits of propensity scores) for both treated and control observations. By default, the region is extended by 0.25 times the pooled estimate of the common standard deviation of the logits of the propensity scores.
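
As a rough check of how the extended region arises, the limits reported in Output 101.8.2 can be approximately reproduced from the rounded summary statistics in Output 101.8.3. This sketch assumes that the extension is applied on the logit scale and that the pooled standard deviation of the logits is 0.712102, the value implied by the reported caliper of 0.356051. The symbols s and l are introduced here for illustration only.

```latex
\begin{align*}
\text{common support (PS scale)} &= [\max(0.0715,\,0.0295),\ \min(0.6594,\,0.7135)] = [0.0715,\ 0.6594] \\
\text{extension (logit scale)} &= 0.25 \times s = 0.25 \times 0.712102 \approx 0.1780 \\
l_{\text{lower}} &= \log\frac{0.0715}{1 - 0.0715} - 0.1780 \approx -2.5639 - 0.1780 = -2.7419 \\
l_{\text{upper}} &= \log\frac{0.6594}{1 - 0.6594} + 0.1780 \approx 0.6606 + 0.1780 = 0.8386 \\
\text{PS limits} &= \frac{1}{1 + e^{-l}} \approx 0.0605 \ \text{and}\ 0.6982
\end{align*}
```

These values are close to the reported limits of 0.060563 and 0.698199; the small discrepancies are consistent with the four-decimal rounding of the inputs.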

The MATCH statement specifies the criteria for matching. The DISTANCE=LPS option (which is the default) requests that the logit of the propensity score be used in computing differences between pairs of observations. The METHOD=OPTIMAL(K=1) option (which is the default) requests optimal matching of one control unit to each unit in the treated group in order to minimize the total within-pair difference. The EXACT=GENDER option requests that the treated unit and its matched control unit have the same value of the Gender variable. The CALIPER=0.5 option requests that a match be made only if the difference in the logits of the propensity scores for pairs of individuals is less than or equal to 0.5 times the pooled estimate of the common standard deviation of the logits of the propensity scores.
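
The caliper arithmetic can be checked against the output. Assuming that the pooled standard deviation of the logits equals the value 0.712102 that is reported for the logit of the propensity score in Output 101.8.5, the caliper on the logit scale is:

```latex
\[
\text{caliper} = 0.5 \times 0.712102 = 0.356051
\]
```

This agrees with the Caliper (Logit PS) value that is shown in Output 101.8.4.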

The "Data Information" table in Output 101.8.2 displays the numbers of observations in the treated and control groups, the lower and upper limits for the propensity scores of observations in the support region, and the numbers of observations in the treated and control groups that fall within the support region. Of the 373 observations in the control group, 352 fall within the support region.

Output 101.8.2: Data Information

The PSMATCH Procedure

Data Information

Data Set                       WORK.DRUG1
Output Data Set                WORK.OUTEX8
Treatment Variable             Drug
Treated Group                  Drug_X
All Obs (Treated)              113
All Obs (Control)              373
Support Region                 Extended Common Support
Lower PS Support               0.060563
Upper PS Support               0.698199
Support Region Obs (Treated)   113
Support Region Obs (Control)   352


The "Propensity Score Information" table in Output 101.8.3 displays summary statistics by treatment group for all observations, for observations in the support region, and for matched observations.

Output 101.8.3: Propensity Score Information

Propensity Score Information

              Treated (Drug = Drug_X)                 Control (Drug = Drug_A)                 Treated - Control
Observations  N    Mean    Std Dev  Minimum  Maximum  N    Mean    Std Dev  Minimum  Maximum  Mean Difference
All           113  0.3040  0.1287   0.0715   0.6594   373  0.2089  0.1255   0.0295   0.7135   0.0952
Region        113  0.3040  0.1287   0.0715   0.6594   352  0.2146  0.1177   0.0606   0.6519   0.0894
Matched       113  0.3040  0.1287   0.0715   0.6594   113  0.2984  0.1215   0.0723   0.6519   0.0056


The "Matching Information" table in Output 101.8.4 displays the matching criteria, the number of matched sets, the numbers of matched observations in the treated and control groups, and the total absolute difference in the logits of the propensity scores for all matches.

Output 101.8.4: Matching Information

Matching Information

Distance Metric             Logit of Propensity Score
Method                      Optimal Fixed Ratio Matching
Control/Treated Ratio       1
Caliper (Logit PS)          0.356051
Matched Sets                113
Matched Obs (Treated)       113
Matched Obs (Control)       113
Total Absolute Difference   3.616259


The ASSESS statement produces tables and plots that summarize differences in the distributions of the specified variables between treated and control groups for all observations, for observations in the support region, and for matched observations. As requested by the LPS and VAR= options, the variables that are summarized are the logit of the propensity score and the variables Gender, Age, and BMI. The WEIGHT=NONE option in the MATCH statement suppresses the display of differences for the weighted matched observations. When one control unit is matched to each treated unit, the weights are all 1 for matched treated and control units, so the results for weighted matched observations and matched observations are identical.

The "Standardized Mean Differences" table displays standardized mean differences in the variables between the treated and control groups. For a binary classification variable (Gender), the difference is in the proportion of the first ordered level (Female).

Output 101.8.5: Standardized Mean Differences

The PSMATCH Procedure

Standardized Mean Differences (Treated - Control)

Variable          Observations  Mean Difference  Standard Deviation  Standardized Difference  Percent Reduction  Variance Ratio
Logit Prop Score  All           0.58239          0.712102            0.81785                                     0.7177
                  Region        0.51613                              0.72480                  11.38              0.9052
                  Matched       0.02268                              0.03184                  96.11              1.0929
Age               All           -4.09509         6.079104            -0.67363                                    0.7076
                  Region        -3.58515                             -0.58975                 12.45              0.7928
                  Matched       0.11504                              0.01892                  97.19              1.0143
BMI               All           0.73930          1.923178            0.38441                                     0.8854
                  Region        0.65089                              0.33845                  11.96              0.9394
                  Matched       0.14619                              0.07602                  80.23              1.3509
Gender            All           -0.02482         0.496925            -0.04994                                    0.9892
                  Region        -0.01808                             -0.03638                 27.16              0.9916
                  Matched       0.00000                              0.00000                  100.00             1.0000

Standard deviation of All observations used to compute standardized differences


The standardized mean differences are substantially reduced in the matched observations. The largest of these differences is 0.076 in absolute value (for BMI), which is less than the recommended upper limit of 0.25. The treated-to-control variance ratios are between 1 and 1.3509 for all variables in the matched observations, which is within the recommended range of 0.5 to 2. Because EXACT=GENDER is specified and METHOD=OPTIMAL(K=1) matches exactly one control unit to each treated unit, the matched groups have identical Gender distributions, so the standardized mean difference for Gender is 0 in the matched observations.
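
The entries in the "Standardized Mean Differences" table can be verified by hand. Writing d for the standardized difference (a symbol introduced here), each value is the mean difference divided by the standard deviation from the All row. The percent-reduction formula below is inferred from the table values and is not stated in the example. For Age:

```latex
\begin{align*}
d_{\text{All}} &= \frac{-4.09509}{6.079104} \approx -0.67363 \\
d_{\text{Matched}} &= \frac{0.11504}{6.079104} \approx 0.01892 \\
\text{percent reduction} &= 100 \times \left(1 - \frac{|d_{\text{Matched}}|}{|d_{\text{All}}|}\right)
  = 100 \times \left(1 - \frac{0.01892}{0.67363}\right) \approx 97.19
\end{align*}
```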

The PSMATCH procedure displays a standardized mean differences plot, as shown in Output 101.8.6, for the variables that are specified in the ASSESS statement.

Output 101.8.6: Standardized Mean Differences Plot

[Figure: Standardized Mean Differences Plot]


The "Standardized Mean Differences Plot" displays the standardized mean differences that are listed in the "Standardized Mean Differences" table in Output 101.8.5. All differences for the matched observations are within the recommended limits of –0.25 and 0.25, which are indicated by the shaded area.

Because matching results in good balance for the variables in this example, the matched observations can be saved in an output data set for use in a subsequent outcome analysis.

If you are not satisfied with the variable balance, you can try one or more of the following to improve it: select another set of variables to fit the propensity score model, modify the matching criteria, or choose another matching method.
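
As an illustration of changing the matching method and criteria, the following sketch reruns the matching with greedy nearest-neighbor matching and a tighter caliper. The CALIPER=0.25 value is chosen only for illustration, the output data set name OutGreedy is hypothetical, and whether balance improves depends on the data.

```sas
/* Sketch: same input and assessment, but greedy matching with a tighter caliper. */
/* OutGreedy is a hypothetical data set name; caliper=0.25 is an arbitrary choice. */
proc psmatch data=Drug1 region=cs;
   class Drug Gender;
   psdata treatvar=Drug(Treated='Drug_X') ps=pscore;
   match method=greedy(k=1) exact=Gender distance=lps caliper=0.25;
   assess lps var=(Gender Age BMI);
   output out(obs=match)=OutGreedy lps=_Lps matchid=_MatchID;
run;
```

With a tighter caliper, some treated units might remain unmatched, so compare the number of matched sets as well as the balance statistics before choosing between the two results.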

The OUT(OBS=MATCH)=OutEx8 option in the OUTPUT statement creates an output data set, OutEx8, that contains the matched observations. The following statements list the observations in the first five matched sets, as shown in Output 101.8.7:

proc sort data=OutEx8 out=OutEx8a;
   by _MatchID;
run;

proc print data=OutEx8a(obs=10);
   var PatientID Drug Gender Age BMI pscore _LPS _MatchWgt_ _MatchID;
run;

Output 101.8.7: Output Data Set With Optimal Matches

Obs  PatientID  Drug    Gender  Age  BMI    pscore   _Lps      _MATCHWGT_  _MatchID
1    213        Drug_A  Female  49   23.24  0.07234  -2.55123  1           1
2    89         Drug_X  Female  44   20.75  0.07152  -2.56356  1           1
3    245        Drug_A  Female  52   25.32  0.08090  -2.43015  1           2
4    323        Drug_X  Female  46   22.22  0.07822  -2.46677  1           2
5    429        Drug_A  Male    49   24.00  0.09865  -2.21228  1           3
6    217        Drug_X  Male    49   23.96  0.09796  -2.22013  1           3
7    234        Drug_X  Female  41   21.11  0.09887  -2.20987  1           4
8    66         Drug_A  Female  48   24.53  0.09927  -2.20531  1           4
9    183        Drug_A  Female  45   23.62  0.10931  -2.09786  1           5
10   320        Drug_X  Female  46   24.17  0.11056  -2.08507  1           5


By default, the output data set includes the variable _PS_ (which provides the propensity score) and the variable _MATCHWGT_ (which provides matched observation weights). The weight for each treated unit is 1. Because K=1 is specified in the METHOD=OPTIMAL option in the MATCH statement, one control unit is matched to each treated unit, so the weight for each matched control unit is also 1. The LPS=_LPS option creates a variable named _LPS (which provides the logit of the propensity score) and the MATCHID=_MatchID option creates a variable named _MatchID (which identifies the matched sets of observations).
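
To confirm that _Lps is the logit of the supplied propensity score, you can recompute it in a DATA step. This is a minimal sketch; the data set name LpsCheck and the variables LpsManual and Diff are hypothetical names introduced here.

```sas
/* Sketch: recompute the logit of pscore and compare it with _Lps. */
/* LpsCheck, LpsManual, and Diff are hypothetical names.           */
data LpsCheck;
   set OutEx8;
   LpsManual = log(pscore / (1 - pscore));   /* logit of the propensity score */
   Diff      = _Lps - LpsManual;             /* expected to be near 0         */
run;

proc means data=LpsCheck min max;
   var Diff;
run;
```

For example, the first observation in the listing has pscore = 0.07234, and log(0.07234 / 0.92766) is approximately -2.5513, which agrees with the listed _Lps value of -2.55123 up to the rounding of pscore.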

After the responses for the trial are observed, they can be added to the data set OutEx8 as the starting point for an outcome analysis. Assuming that no other confounding variables are associated with both the response variable and the treatment group indicator Drug, you can estimate the treatment effect from the matched observations by performing an outcome analysis that you would have used to estimate the treatment effect if the original data set had resulted from a randomized trial.
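
The following sketch shows one way that such an outcome analysis might start. It assumes a hypothetical data set Responses that contains PatientID and a numeric variable Response; neither exists in this example. The two-sample t test is only one possible analysis and is not prescribed by the example.

```sas
/* Sketch under stated assumptions: Responses (hypothetical) holds PatientID */
/* and a numeric Response variable that is observed after the trial.        */
proc sort data=OutEx8 out=Matched;
   by PatientID;
run;

proc sort data=Responses out=Resp;
   by PatientID;
run;

data Outcome;
   merge Matched(in=inMatch) Resp;
   by PatientID;
   if inMatch;            /* keep only the matched patients */
run;

/* Compare mean response between the matched treated and control groups */
proc ttest data=Outcome;
   class Drug;
   var Response;
run;
```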

Last updated: December 09, 2022