The PSMATCH Procedure

MATCH Statement

  • MATCH <options>;

The MATCH statement matches observations in the control group to observations in the treatment group. The MATCH statement is not allowed if you specify a FREQ statement. The EWEIGHT, PSWEIGHT, and STRATA statements are ignored if you specify a MATCH statement.

Table 4 summarizes the options in the MATCH statement.

Table 4: MATCH Statement Options

Option         Description
CALIPER=       Specifies the caliper width requirement for matching
DISTANCE=      Specifies the distance for comparing treated units and control units
EXACT=         Requests exact matching for specified classification variables
METHOD=        Specifies the method to use for matching
NMATCHMOST=    Displays observations that have the greatest numbers of matches
WEIGHT=        Specifies the type of weight for matched observations


The flowchart in Figure 13 displays the steps in the propensity score matching process.

Figure 13: Propensity Score Matching Options


You can specify the following options in the MATCH statement:

CALIPER <(caliper-options)> = r

specifies the caliper width requirement for matching, where r is either missing or greater than 0. The difference in propensity scores (or logits of propensity scores) between the treated unit and its matching control unit must be less than or equal to r. If you specify CALIPER=., then the caliper requirement is ignored. By default, CALIPER=0.25 (Rosenbaum and Rubin 1985, p. 37). Austin (2011a) has shown that CALIPER=0.20 is optimal in many settings.

You can use the following two caliper-options to prescribe the caliper requirement:

MULT=ONE | STDDEV

specifies the multiplier for the specified caliper width r.

ONE

uses r for the caliper width.

STDDEV

uses r times the pooled estimate of the standard deviation of the logit of the propensity score (if you specify DISTANCE=LPS) or of the propensity score (if you specify DISTANCE=PS). The pooled estimate is the square root of the average of the variances in the treated and control groups.

By default, MULT=STDDEV.

MAHDISTANCE=LPS | PS

specifies the type of distance to be used in the caliper width computation if you specify the DISTANCE=MAH option in the MATCH statement.

LPS

uses the logit of the propensity score.

PS

uses the propensity score.

By default, MAHDISTANCE=LPS.

If you specify the DISTANCE=LPS or DISTANCE=PS option in the MATCH statement, the specified type of distance is used in the caliper width computation.
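
The caliper computation described for MULT=ONE and MULT=STDDEV can be sketched in Python as follows. This is an illustration only, not PROC PSMATCH code: the function names and the sample scores are hypothetical, and the use of the sample variance (n - 1 divisor) for each group is an assumption that the text above does not confirm.

```python
import math
from statistics import variance

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def caliper_width(r, treated, control, mult="STDDEV"):
    """Caliper width on the scale of the supplied values.

    treated, control: propensity scores (or their logits) for each group.
    mult="ONE" uses r as is; mult="STDDEV" multiplies r by the pooled
    standard deviation, the square root of the average of the two
    group variances (sample variances are an assumption here).
    """
    if mult == "ONE":
        return r
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2.0)
    return r * pooled_sd

def within_caliper(score_treated, score_control, width):
    """True if the pair satisfies the caliper requirement."""
    return abs(score_treated - score_control) <= width

# Hypothetical propensity scores, for illustration only.
ps_treated = [0.60, 0.70, 0.80]
ps_control = [0.20, 0.30, 0.40, 0.50]

# Width on the propensity score scale (as with DISTANCE=PS).
width_ps = caliper_width(0.25, ps_treated, ps_control)

# Width on the logit scale (as with DISTANCE=LPS).
width_lps = caliper_width(0.25,
                          [logit(p) for p in ps_treated],
                          [logit(p) for p in ps_control])
```

With MULT=STDDEV the same value of r yields different widths on the two scales, because the pooled standard deviation is computed from whichever quantity is being compared.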

DISTANCE=distance

specifies the type of distance to be compared when treated units are matched to control units. If you specify the DISTANCE=LPS or DISTANCE=PS option, the specified type of distance is also used in the caliper width computation. By default, DISTANCE=LPS. You can specify the following values for distance:

LPS

specifies matching that minimizes the difference between the logits of the propensity scores for the two units.

PS

specifies matching that minimizes the difference between the propensity scores for the two units.

MAH (var-options </ mah-options>)

specifies matching that minimizes the Mahalanobis distance between the two units.

You use the following var-options to select at least one variable for computing the Mahalanobis distance:

LPS

includes the logit of the propensity score.

PS

includes the propensity score.

VAR=(var-list)

includes variables in the specified var-list. These variables must be continuous variables in the input data set.

You can also specify the following mah-options:

COV=CONTROL | IDENTITY | POOLED

specifies the type of covariance matrix in the Mahalanobis distance.

CONTROL

uses the covariance matrix that is computed from observations in the control group.

IDENTITY

uses the identity matrix, and the resulting distance is the Euclidean distance.

POOLED

uses the pooled covariance matrix that is computed from observations in the treated group and observations in the control group.

By default, COV=CONTROL.

SQRT=YES | NO

specifies whether to apply the square root transformation to the Mahalanobis distance in the difference computation. This mah-option does not affect matching results for greedy nearest neighbor matching or matching with replacement. It affects results only for optimal matching, which minimizes the total absolute difference.

YES

uses the square root of the Mahalanobis distance as the difference between treated and control units.

NO

uses the Mahalanobis distance as the difference between treated and control units.

By default, SQRT=YES.
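
As a rough illustration of how the COV= and SQRT= mah-options change the computed distance, the following Python sketch handles the two-variable case. It is not PROC PSMATCH code. The function names and data are hypothetical, and the sketch assumes that the untransformed distance (SQRT=NO) is the quadratic form d' S^{-1} d and that covariances use the n - 1 divisor.

```python
import math

def covariance_2d(rows):
    """Sample covariance matrix (n - 1 divisor) of two-column rows."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def mahalanobis_2d(u, v, cov, sqrt=True):
    """Quadratic form d' cov^{-1} d for d = u - v, or its square root."""
    dx, dy = u[0] - v[0], u[1] - v[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # d' cov^{-1} d, using the closed-form inverse of a 2x2 matrix.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q) if sqrt else q

# Hypothetical control-group observations on two continuous variables.
controls = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]

cov_control = covariance_2d(controls)     # analogue of COV=CONTROL
cov_identity = [[1.0, 0.0], [0.0, 1.0]]   # analogue of COV=IDENTITY

d_control = mahalanobis_2d((0.0, 0.0), (2.0, 0.0), cov_control)
d_euclid = mahalanobis_2d((0.0, 0.0), (3.0, 4.0), cov_identity)
d_unrooted = mahalanobis_2d((0.0, 0.0), (3.0, 4.0), cov_identity, sqrt=False)
```

With the identity matrix the rooted value is the ordinary Euclidean distance. An analogue of COV=POOLED would pass a covariance matrix pooled from both groups to the same function.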

EXACT=variable  |  (variables)

specifies classification variables that are to be matched exactly. That is, observations in each matched set must have the same values for these variables. The variables must be specified in the CLASS statement.

METHOD=method <(method-options)>

specifies the matching method. By default, METHOD=OPTIMAL. You can specify the following methods and method-options:

METHOD=FULL (KMAX=kmax <full-options>)

requests optimal full matching. Each treated unit is matched with one or more control units, and each control unit (if matched) is matched with one or more treated units. If the specified total number of control units to be matched is less than the number of available control units, then constrained full matching is performed—that is, not all observations are matched.

You must specify the following suboption:

KMAX=kmax

specifies the maximum number of control units to be matched with each treated unit, where kmax ≥ 1.

You can also specify the following full-options:

KMAXTREATED=kmaxtrt
KMAXTRT=kmaxtrt

specifies the maximum number of treated units for each control unit, where kmaxtrt ≥ 1. By default, KMAXTREATED=2.

KMEAN=kmean

specifies the average number of control units to be matched with each treated unit. If the resulting number of control units is greater than the number of control units in the support region, the number of control units in the support region is used.

NCONTROL=m

specifies the number of control units to be matched. If m is greater than the number of control units in the support region, the number of control units in the support region is used.

PCTCONTROL=p

specifies the percentage of the total number of control units to be matched. If the resulting number of control units is greater than the total number of control units in the support region, the number of control units in the support region is used.

You can specify only one of the KMEAN=, NCONTROL=, and PCTCONTROL= options for the number of control units in the matched data set. If you do not specify any of the KMEAN=, NCONTROL=, and PCTCONTROL= options, KMEAN=(kmax + 1/kmaxtrt)/2 is used.
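
The default KMEAN= value for full matching is simple arithmetic, shown here as a small Python sketch. The function name is hypothetical. Reading 1/kmaxtrt as the smallest possible control-to-treated ratio in a matched set (one control shared by kmaxtrt treated units) is an interpretation, not something the text above states.

```python
def default_kmean_full(kmax, kmaxtrt=2):
    """Default KMEAN for METHOD=FULL: the midpoint of kmax and 1/kmaxtrt."""
    return (kmax + 1.0 / kmaxtrt) / 2.0

# KMAX=3 with the default KMAXTREATED=2: (3 + 0.5) / 2
kmean_a = default_kmean_full(3)

# KMAX=4 with KMAXTREATED=1: (4 + 1) / 2
kmean_b = default_kmean_full(4, kmaxtrt=1)
```

Here `kmean_a` is 1.75 and `kmean_b` is 2.5.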

METHOD=GREEDY <(K=k ORDER=order-option)>

requests greedy nearest neighbor matching, in which each treated unit is sequentially matched with the k nearest control units. Matching depends on the ordering of the treated units, which you can specify in the ORDER= suboption.

You can specify the following suboptions:

K=k

specifies the number of matching control units, where k > 0, for each treated unit. PROC PSMATCH performs k separate loops of matching for treated units. In each loop, the nearest control unit is sequentially matched to each treated unit. By default, K=1 (one control unit for each treated unit).

ORDER=ASCENDING | DESCENDING | RANDOM <(SEED=number)>

specifies the ordering of treated units that are used to find the matching control units. You can specify one of the following values:

ASCENDING

orders the treated units in ascending order of the propensity score.

DESCENDING

orders the treated units in descending order of the propensity score.

RANDOM <(SEED=number)>

orders the treated units randomly. The SEED= suboption specifies a positive integer to start the pseudorandom number generator. If you do not specify the SEED= suboption, the seed is generated from the time of day on the computer’s clock.

By default, ORDER=DESCENDING.
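
The following Python sketch illustrates why greedy nearest neighbor matching depends on the ordering of the treated units. It is a simplified illustration, not the PROC PSMATCH implementation: the function name and data are hypothetical, ties are broken by unit ID (an assumption), treated units that run out of candidates simply keep fewer than k matches (an assumption), and an unseeded RANDOM order uses Python's default seeding rather than the clock-based seed described above.

```python
import random

def greedy_match(treated, control, k=1, order="DESCENDING",
                 caliper=None, seed=None):
    """Greedy nearest neighbor matching without replacement.

    treated, control: dicts mapping unit ID -> score (PS or logit of PS).
    Returns a dict mapping each treated ID to its matched control IDs.
    """
    if order == "RANDOM":
        ids = sorted(treated)
        random.Random(seed).shuffle(ids)
    else:
        ids = sorted(treated, key=treated.get,
                     reverse=(order == "DESCENDING"))
    available = set(control)
    matches = {t: [] for t in ids}
    for _ in range(k):              # k separate loops over the treated units
        for t in ids:               # each treated unit takes its nearest control
            candidates = [c for c in available
                          if caliper is None
                          or abs(treated[t] - control[c]) <= caliper]
            if not candidates:
                continue
            best = min(candidates,
                       key=lambda c: (abs(treated[t] - control[c]), c))
            matches[t].append(best)
            available.remove(best)  # a control is used at most once
    return matches

# Hypothetical scores, for illustration only.
treated = {"T1": 0.50, "T2": 0.60}
control = {"C1": 0.56, "C2": 0.30}

desc = greedy_match(treated, control, order="DESCENDING")
asc = greedy_match(treated, control, order="ASCENDING")
capped = greedy_match(treated, control, order="DESCENDING", caliper=0.1)
```

In descending order T2 is processed first and takes C1, leaving C2 for T1. In ascending order T1 takes C1 instead. With a caliper of 0.1, T1 is left unmatched in the descending run because C2 is too far away.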

METHOD=OPTIMAL <(K=k)>

requests optimal fixed ratio matching. The K=k suboption specifies the number of matching control units, where k > 0, for each treated unit. By default, K=1 (one control unit is matched with each treated unit).
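
To make the idea of optimal matching concrete, the Python sketch below finds the 1:1 assignment that minimizes the total absolute difference by brute force. This only illustrates the objective for K=1 on a tiny hypothetical data set. It is not how PROC PSMATCH solves the problem, and the exhaustive search is practical only for a handful of units.

```python
from itertools import permutations

def optimal_pairs(treated, control):
    """1:1 matching that minimizes the total absolute score difference.

    treated, control: dicts mapping unit ID -> score, with at least as
    many control units as treated units.
    Returns (assignment dict, total difference).
    """
    t_ids = sorted(treated)
    best_total, best_assignment = None, None
    # Try every ordered choice of distinct controls for the treated units.
    for chosen in permutations(sorted(control), len(t_ids)):
        total = sum(abs(treated[t] - control[c])
                    for t, c in zip(t_ids, chosen))
        if best_total is None or total < best_total:
            best_total, best_assignment = total, dict(zip(t_ids, chosen))
    return best_assignment, best_total

# Hypothetical scores, for illustration only.
treated = {"T1": 0.50, "T2": 0.60}
control = {"C1": 0.54, "C2": 0.80}

assignment, total = optimal_pairs(treated, control)
```

The optimal assignment pairs T1 with C1 and T2 with C2, for a total difference of about 0.24. Letting T2 take its nearest control (C1) first would force T1 onto C2 and raise the total to about 0.36, which is the kind of outcome that an optimal method avoids.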

METHOD=REPLACE <(K=k)>

requests a fixed number k of unique matching control units for each treated unit, where the matched control units are selected with replacement. This means that each control unit can be matched to more than one treated unit, but it can be matched only once to the same treated unit. The K=k suboption specifies the number of matching control units, where k > 0, for each treated unit. By default, K=1 (one control unit is matched with each treated unit).
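
Matching with replacement can be sketched in Python as follows. The sketch assumes that each treated unit takes its k nearest control units, with ties broken by unit ID. The function name and data are hypothetical, and this is not PROC PSMATCH code.

```python
def match_with_replacement(treated, control, k=1):
    """Give each treated unit its k nearest distinct control units.

    Controls stay available after being matched, so one control can
    serve several treated units, but never the same treated unit twice.
    """
    matches = {}
    for t, score in treated.items():
        ranked = sorted(control,
                        key=lambda c: (abs(score - control[c]), c))
        matches[t] = ranked[:k]
    return matches

# Hypothetical scores, for illustration only.
treated = {"T1": 0.50, "T2": 0.60}
control = {"C1": 0.54, "C2": 0.80, "C3": 0.30}

matches = match_with_replacement(treated, control, k=2)
```

In this example C1 is matched to both T1 and T2, while each treated unit still receives two distinct controls.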

METHOD=VARRATIO (KMAX=kmax <vr-options>)

requests optimal variable ratio matching. Each treated unit is matched with one or more control units.

You must specify the following suboption:

KMAX=kmax

specifies the maximum number of control units to be matched with each treated unit, where kmax ≥ 1.

You can also specify the following vr-options:

KMEAN=kmean

specifies the average number of control units to be matched with each treated unit. If the resulting number of control units is greater than the total number of control units in the support region, the number of control units in the support region is used.

KMIN=kmin

specifies the minimum number of control units to be matched with each treated unit. By default, KMIN=1.

NCONTROL=m

specifies the total number of control units to be matched. If m is greater than the total number of control units in the support region, the number of control units in the support region is used.

PCTCONTROL=p

specifies the percentage of total control units to be matched. If the resulting number of control units is greater than the total number of control units in the support region, the number of control units in the support region is used.

You can specify only one of the KMEAN=, NCONTROL=, and PCTCONTROL= options for the number of control units in the matched data set. If you do not specify any of the KMEAN=, NCONTROL=, and PCTCONTROL= options, then KMEAN= (kmin + kmax) / 2 is used.
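
The default KMEAN= value for variable ratio matching, and the cap at the number of control units in the support region, can be sketched in Python as follows. The function name is hypothetical. The sketch assumes that the "resulting number of control units" is kmean times the number of treated units, and it applies no rounding; neither detail is stated in the text above.

```python
def target_controls(n_treated, n_support_controls, kmax, kmin=1, kmean=None):
    """Number of control units to match under METHOD=VARRATIO (sketch)."""
    if kmean is None:
        kmean = (kmin + kmax) / 2.0      # default: midpoint of kmin and kmax
    requested = kmean * n_treated        # assumed meaning of "resulting number"
    # Never request more controls than the support region contains.
    return min(requested, n_support_controls)

# 10 treated units, KMAX=3, default KMIN=1, so KMEAN defaults to 2.
capped = target_controls(n_treated=10, n_support_controls=18, kmax=3)
uncapped = target_controls(n_treated=10, n_support_controls=50, kmax=3)
```

With 18 control units in the support region the request for 20 is capped at 18. With 50 available, all 20 are requested.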

NMATCHMOST=n

displays a table of the observations that have the greatest numbers of matches, where n ≤ 50. This option displays observation numbers and numbers of matches for the n observations that have the greatest numbers of matches in the treated and control groups. If an ID statement is also specified, the corresponding values of the ID variables are also displayed to identify the observations. The option is not applicable to greedy matching (METHOD=GREEDY) and optimal fixed ratio matching (METHOD=OPTIMAL), where a fixed number of control units are matched to each treated unit. By default, n = 0 and the table is not displayed.

WEIGHT=ATEWGT | ATTWGT | EQUAL | MATCHATEWGT | MATCHATTWGT | MATCHWGT | NONE

specifies the type of weight for matched observations.

ATEWGT | MATCHATEWGT

weights the treatment group up to the total size of the matched set. That is, in each matched set, the total weight of treated units equals the total number of units in the matched set, and the total weight of control units also equals the total number of units in the matched set. This weighting is available only for optimal full matching (METHOD=FULL), and it is appropriate for estimating the ATE. For more information about using match weighting to estimate the ATE, see the section ATE Weighting after Full Matching.

ATTWGT | MATCHATTWGT | MATCHWGT

weights the control group up to the size of the treatment group in the matched set. That is, in each matched set, the total weight of control units equals the number of treated units in the matched set. This weighting is appropriate for estimating the ATT. For more information about using match weighting to estimate the ATT, see the sections ATT Weighting after Matching without Replacement and ATT Weighting after Matching with Replacement.

EQUAL | NONE

uses the same weight for matched observations. That is, each matched unit has a weight of 1, regardless of the treatment group.

By default, WEIGHT=ATTWGT.
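
The weight totals described for each WEIGHT= value can be checked with a short Python sketch for a single matched set. The function name is hypothetical. The sketch assumes disjoint matched sets (matching without replacement), an equal split of the total weight among the units of each group, and a weight of 1 for treated units under ATT weighting. Those details are assumptions that the option descriptions do not spell out.

```python
def match_weights(n_treated, n_control, weight="ATTWGT"):
    """Per-unit (treated_weight, control_weight) within one matched set."""
    if weight in ("ATTWGT", "MATCHATTWGT", "MATCHWGT"):
        # Control weights sum to the number of treated units in the set.
        return 1.0, n_treated / n_control
    if weight in ("ATEWGT", "MATCHATEWGT"):
        # Each group's weights sum to the total size of the set.
        size = n_treated + n_control
        return size / n_treated, size / n_control
    if weight in ("EQUAL", "NONE"):
        return 1.0, 1.0
    raise ValueError("unknown weight type: " + weight)

# A matched set with 1 treated unit and 3 control units.
att_t, att_c = match_weights(1, 3, "ATTWGT")
ate_t, ate_c = match_weights(1, 3, "ATEWGT")

# A matched set with 2 treated units and 1 control unit (full matching).
ate2_t, ate2_c = match_weights(2, 1, "ATEWGT")
```

For the first set, ATT weighting gives each control a weight of 1/3, so the three controls together carry the weight of one treated unit. ATE weighting gives the treated unit a weight of 4 and each control a weight of 4/3, so each group totals 4, the size of the set.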

Last updated: December 09, 2022