Replication methods are useful for estimating variances that account for both the sampling variability and the imputation variability. If you specify the METHOD=FEFI or the METHOD=FHDI option in the PROC SURVEYIMPUTE statement, then, by default, the procedure creates imputation-adjusted jackknife replicate weights unless you also specify the VARMETHOD=NONE option in the same statement. If you provide your own replicate weights by using the REPWEIGHTS statement together with the METHOD=FEFI or the METHOD=FHDI option, then the procedure creates new replicate weights by adjusting the replicate weights that you provide for imputation. The procedure does not create imputation-adjusted replicate weights when you specify the METHOD=HOTDECK option in the PROC SURVEYIMPUTE statement.
The SURVEYIMPUTE procedure does not compute any variances. The replicate weights that are created can be used in any SAS/STAT survey procedure for variance computation. For an example, see the section Getting Started: SURVEYIMPUTE Procedure.
Replication methods draw multiple replicates (also called subsamples) from a full sample according to a specific resampling scheme. The most commonly used resampling schemes are the balanced repeated replication (BRR) method, the jackknife method, and the bootstrap method. For each replicate, the original weights of the primary sampling units (PSUs) are modified to create replicate weights. The parameters of interest are then estimated by using each set of replicate weights; these estimates are known as replicate estimates. Finally, the variances of the parameter estimates are estimated from the variability among the replicate estimates. The SURVEYIMPUTE procedure automatically creates replicate weights based on the replication method that you specify; alternatively, you can use the REPWEIGHTS statement to provide your own replicate weights.
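Because PROC SURVEYIMPUTE only creates the replicate weights, the variance step happens later in another survey procedure. The following Python sketch illustrates that downstream step for a weighted mean: one estimate per column of replicate weights, then a scaled sum of squared deviations from the full-sample estimate. The function names are hypothetical, and the method-specific constant `scale` is an assumption that you must supply (for example, 1/R is the textbook constant for traditional BRR); the exact formulas that SAS uses are documented in the other survey procedure chapters.

```python
import numpy as np

def weighted_mean(y, w):
    # Weighted (ratio) estimate of the mean: sum(w * y) / sum(w)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * np.asarray(y, dtype=float)) / np.sum(w))

def replicate_variance(y, full_w, rep_w, scale):
    """Spread of the replicate estimates around the full-sample estimate.

    rep_w: array of shape (n_obs, R), one column of replicate weights per replicate.
    scale: method-specific constant (hypothetical input), e.g. 1 / R for traditional BRR.
    """
    rep_w = np.asarray(rep_w, dtype=float)
    theta = weighted_mean(y, full_w)
    reps = np.array([weighted_mean(y, rep_w[:, r]) for r in range(rep_w.shape[1])])
    return float(scale * np.sum((reps - theta) ** 2))
```

For example, with y = [1, 2, 3, 4], unit full-sample weights, and two made-up half-sample columns [2, 0, 2, 0] and [0, 2, 0, 2], the replicate means are 2.0 and 3.0 around a full-sample mean of 2.5, so a scale of 1/2 gives a variance of 0.25.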
The following subsections provide details about how the replication weights are created for each variance estimation method.
The naive bootstrap variance estimator that is suitable for an infinite population is not consistent when applied to complex surveys. Bootstrap replicate samples for complex surveys are created by selecting a simple random sample with replacement of primary sampling units (PSUs) within each stratum. PSUs in different strata are sampled independently. The original sampling weights are then adjusted in each replicate to reflect the full sample. These adjusted weights are also called bootstrap replicate weights. McCarthy and Snowden (1985), Rao and Wu (1988), Sitter (1992b), and Sitter (1992a) provide several adjusted bootstrap variance estimators that are consistent for complex surveys. For more information about bootstrap variance estimation for complex surveys, see Mashreghi, Haziza, and Léger (2016), Beaumont and Patak (2012), Lohr (2010, Section 9.3.3), Fuller (2009, Section 4.5), Wolter (2007, Chapter 5), and Shao and Tu (1995, Section 6.2.4).
If you do not provide replicate weights by using the REPWEIGHTS statement, then the BOOTSTRAP option in the PROC SURVEYIMPUTE statement creates bootstrap replicate weights for you. This bootstrap method is similar to the method of Rao, Wu, and Yue (1992) and is also known as the bootstrap weights method (Mashreghi, Haziza, and Léger 2016).
If you use the FEFI or the FHDI method, then the unadjusted bootstrap weights are adjusted for the imputation to create the imputation-adjusted replicate weights. The section Unadjusted Bootstrap Replicate Weights describes how the unadjusted replicate weights are created, and the sections Imputation-Adjusted Replicate Weights and Replicate Weight Adjustments for FHDI describe how the imputation-adjusted replicate weights are created.
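Each replicate is obtained by selecting a simple random sample with replacement of $n_h - 1$ PSUs from stratum h, where $n_h$ is the number of PSUs in stratum h. The rth bootstrap replicate weight for observation unit j in PSU i and stratum h is given by

$$w_{hij}^{(r)} = w_{hij}\left(1 - \sqrt{1 - f_h} + \sqrt{1 - f_h}\,\frac{n_h}{n_h - 1}\, m_{hi}^{(r)}\right)$$

where $w_{hij}$ is the original sampling weight, $m_{hi}^{(r)}$ is the number of times PSU i in stratum h is selected in replicate sample r, and $f_h$ is the sampling fraction in stratum h.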
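The rescaling can be sketched in Python. This is an illustration under stated assumptions, not SAS's implementation: it assumes that $n_h - 1$ PSUs are drawn with replacement in each stratum (which requires $n_h \ge 2$), that PSU identifiers are nested within strata, and it uses NumPy's random generator rather than the procedure's own. The function name and the dictionary of sampling fractions are hypothetical.

```python
import numpy as np

def bootstrap_replicate_weights(w, stratum, psu, f, n_reps, seed=0):
    """Rescaled bootstrap weights; n_h - 1 PSUs drawn with replacement per stratum.

    w, stratum, psu: one entry per observation unit (PSU labels nested in strata).
    f: dict mapping each stratum to its sampling fraction f_h (hypothetical input).
    Returns an array of shape (n_obs, n_reps).
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    stratum = np.asarray(stratum)
    psu = np.asarray(psu)
    out = np.empty((w.size, n_reps))
    for r in range(n_reps):
        factor = np.empty(w.size)
        for h in np.unique(stratum):
            in_h = stratum == h
            psus = np.unique(psu[in_h])
            n_h = psus.size
            if n_h < 2:
                raise ValueError("each stratum needs at least two PSUs")
            # simple random sample with replacement of n_h - 1 PSUs
            draws = rng.choice(psus, size=n_h - 1, replace=True)
            c = np.sqrt(1.0 - f[h])
            for p in psus:
                m = np.count_nonzero(draws == p)  # times PSU p was selected
                factor[in_h & (psu == p)] = 1.0 - c + c * n_h / (n_h - 1) * m
        out[:, r] = w * factor
    return out
```

Because the selection counts in a stratum sum to $n_h - 1$, the factors over the $n_h$ PSUs sum to $n_h$, so a stratum whose PSUs carry equal weight keeps its weighted total in every replicate.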
If you use the hot-deck imputation method, then you can use the OUTPUT statement to store the unadjusted replicate weights. The unadjusted replicate weights are not saved when you use the FEFI or FHDI method. You should use the imputation-adjusted replicate weights for variance estimation from a fractionally imputed data set.
For more information about how the bootstrap variance estimators are computed for related statistics, see the section "Bootstrap Method" in each of the following chapters: Chapter 118, The SURVEYFREQ Procedure, Chapter 120, The SURVEYLOGISTIC Procedure, Chapter 121, The SURVEYMEANS Procedure, Chapter 122, The SURVEYPHREG Procedure, and Chapter 123, The SURVEYREG Procedure.
The balanced repeated replication (BRR) method requires that the full sample be drawn by using a stratified sample design with two primary sampling units (PSUs) per stratum. The BRR method constructs half-sample replicates by deleting one PSU per stratum according to a Hadamard matrix and doubling the original weight of the other PSU in that stratum. If you use the FEFI or the FHDI method, then the unadjusted BRR weights are adjusted for the imputation to create the imputation-adjusted replicate weights. The sections Unadjusted BRR Replicate Weights and Unadjusted Fay’s BRR Replicate Weights describe how the unadjusted replicate weights are created, and the section Imputation-Adjusted Replicate Weights describes how the imputation-adjusted replicate weights are created.
Let H be the total number of strata. The total number of replicates, R, is the smallest multiple of 4 that is greater than H. However, if you prefer a larger number of replicates, you can specify the REPS=n method-option. If a Hadamard matrix cannot be constructed, the number of replicates is increased until a Hadamard matrix becomes available.
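The default replicate count is simple arithmetic, shown here as a small Python helper (a hypothetical name, not a SAS function). It covers only the "smallest multiple of 4 greater than H" rule and ignores the later step in which R is increased when no Hadamard matrix of that order is available.

```python
def default_brr_replicates(n_strata: int) -> int:
    """Smallest multiple of 4 that is strictly greater than the number of strata H."""
    if n_strata < 1:
        raise ValueError("need at least one stratum")
    return 4 * (n_strata // 4 + 1)
```

For example, H = 7 gives R = 8, while H = 8 gives R = 12 because R must be strictly greater than H.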
Each replicate is obtained by deleting one PSU per stratum according to a corresponding Hadamard matrix and adjusting the original weights for the remaining PSUs. The new weights are called replicate weights.
Replicates are constructed by using the first H columns of the Hadamard matrix. The rth ($r = 1, 2, \ldots, R$) replicate is drawn from the full sample according to the rth row of the Hadamard matrix as follows:
If the $(r, h)$th element of the Hadamard matrix is 1, then the first PSU of stratum h is included in the rth replicate and the second PSU of stratum h is excluded.
If the $(r, h)$th element of the Hadamard matrix is –1, then the second PSU of stratum h is included in the rth replicate and the first PSU of stratum h is excluded.
The original weights of the PSUs that remain in each half-sample are then doubled to form the replicate weights. For more information about the BRR method, see Wolter (2007) and Lohr (2010).
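The half-sample rule can be sketched in Python for a design with exactly two PSUs per stratum. The coding is an assumption made for illustration: strata are integer codes 0 to H−1 (used directly as Hadamard column indices), PSUs within a stratum are labeled 1 and 2, and the Hadamard matrix is passed in as an array. The function name is hypothetical.

```python
import numpy as np

def brr_replicate_weights(w, stratum, psu, hadamard):
    """Traditional BRR weights for a design with two PSUs per stratum.

    stratum: integer codes 0..H-1 (column index into the Hadamard matrix).
    psu: 1 for the first PSU of a stratum, 2 for the second.
    hadamard: R x R array of +1/-1; only its first H columns are used.
    """
    w = np.asarray(w, dtype=float)
    stratum = np.asarray(stratum)
    psu = np.asarray(psu)
    A = np.asarray(hadamard)
    out = np.empty((w.size, A.shape[0]))
    for r in range(A.shape[0]):
        # element (r, h) = +1 keeps the first PSU of stratum h, -1 keeps the second
        keep = np.where(A[r, stratum] == 1, 1, 2)
        # kept PSU: weight doubled; deleted PSU: weight 0
        out[:, r] = np.where(psu == keep, 2.0 * w, 0.0)
    return out
```

With three strata and a 4 × 4 Hadamard matrix whose second row begins 1, −1, 1, the second replicate keeps the first PSU of strata 0 and 2 and the second PSU of stratum 1.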
By default, PROC SURVEYIMPUTE generates an appropriate Hadamard matrix automatically to create the replicates. You can display the Hadamard matrix by specifying the VARMETHOD=BRR(PRINTH) method-option. If you provide a Hadamard matrix by specifying the VARMETHOD=BRR(HADAMARD=) method-option, then the replicates are generated according to the provided Hadamard matrix.
For more information about how the BRR variance estimators are computed for related statistics, see the section "Balanced Repeated Replication (BRR) Method" in each of the following chapters: Chapter 118, The SURVEYFREQ Procedure, Chapter 120, The SURVEYLOGISTIC Procedure, Chapter 121, The SURVEYMEANS Procedure, Chapter 122, The SURVEYPHREG Procedure, and Chapter 123, The SURVEYREG Procedure.
The traditional BRR method constructs half-sample replicates by deleting one PSU per stratum according to a Hadamard matrix and doubling the original weight of the other PSU. Fay’s BRR method uses the Fay coefficient, $\epsilon$ ($0 \le \epsilon < 1$), and instead of deleting one PSU per stratum, it multiplies the original weight of that PSU by $\epsilon$. The original weight of the remaining PSU in that stratum is multiplied by $2 - \epsilon$. PROC SURVEYIMPUTE uses $\epsilon = 0.5$ as the default value; alternatively, you can specify a value for $\epsilon$ by using the FAY= method-option. When $\epsilon = 0$, Fay’s method becomes the traditional BRR method. For more information, see Dippo, Fay, and Morganstein (1984); Fay (1984, 1989); and Judkins (1990). Because the traditional BRR method uses only half of the total sample in every replicate, some observed levels of the analysis variables might not be available in the replicate samples. Fay’s BRR method is especially useful in this situation because it uses all the sampled units in every replicate.
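Fay's modification changes only the two multipliers, as the following Python sketch shows. As with any illustration here, the coding is an assumption: strata are integer codes 0 to H−1 that index Hadamard columns, PSUs are labeled 1 and 2 within each stratum, and `eps` plays the role of the Fay coefficient with 0.5 as the default. The function name is hypothetical.

```python
import numpy as np

def fay_replicate_weights(w, stratum, psu, hadamard, eps=0.5):
    """Fay's BRR weights for a design with two PSUs per stratum.

    stratum: integer codes 0..H-1; psu: 1 or 2 within each stratum.
    eps: Fay coefficient, 0 <= eps < 1 (eps = 0 reproduces traditional BRR).
    """
    if not 0.0 <= eps < 1.0:
        raise ValueError("Fay coefficient must satisfy 0 <= eps < 1")
    w = np.asarray(w, dtype=float)
    stratum = np.asarray(stratum)
    psu = np.asarray(psu)
    A = np.asarray(hadamard)
    out = np.empty((w.size, A.shape[0]))
    for r in range(A.shape[0]):
        # PSU that the traditional half-sample would keep in each stratum
        keep = np.where(A[r, stratum] == 1, 1, 2)
        # kept PSU: weight * (2 - eps); the other PSU: weight * eps instead of 0
        out[:, r] = w * np.where(psu == keep, 2.0 - eps, eps)
    return out
```

Because the two multipliers sum to 2, each stratum with equal PSU weights keeps its weighted total, and every unit keeps a positive weight whenever $\epsilon > 0$.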
For more information about how Fay’s BRR variance estimators are computed for related statistics, see the section "Balanced Repeated Replication (BRR) Method" in each of the following chapters: Chapter 118, The SURVEYFREQ Procedure, Chapter 120, The SURVEYLOGISTIC Procedure, Chapter 121, The SURVEYMEANS Procedure, Chapter 122, The SURVEYPHREG Procedure, and Chapter 123, The SURVEYREG Procedure.
PROC SURVEYIMPUTE uses a Hadamard matrix to construct replicates for BRR variance estimation. You can provide a Hadamard matrix for replicate construction by using the HADAMARD= method-option for VARMETHOD=BRR. Otherwise, PROC SURVEYIMPUTE generates an appropriate Hadamard matrix. You can display the Hadamard matrix by specifying the PRINTH method-option.
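A Hadamard matrix $\mathbf{A}$ of dimension R is a square matrix that has all elements equal to 1 or –1 such that $\mathbf{A}\mathbf{A}' = R\,\mathbf{I}_R$, where $\mathbf{I}_R$ is the identity matrix of order R. The dimension of a Hadamard matrix must equal 1, 2, or a multiple of 4.
For example, the following matrix is a Hadamard matrix of dimension R = 8:

$$\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & -1 & -1 & 1 & -1 & 1 & 1 & -1
\end{pmatrix}$$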
For BRR replicate construction, the dimension of the Hadamard matrix must be at least H, where H denotes the number of first-stage strata in your design. If a Hadamard matrix of a particular dimension exists, it is not necessarily unique. Therefore, if you want to use a specific Hadamard matrix, you must provide the matrix as a SAS data set in the HADAMARD= method-option. You must ensure that the matrix that you provide is actually a Hadamard matrix; PROC SURVEYIMPUTE does not check the validity of your Hadamard matrix.
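Because PROC SURVEYIMPUTE does not check a matrix that you supply through HADAMARD=, it can be worth checking the matrix yourself before you use it. The following Python sketch (hypothetical helper names, not a SAS feature) builds a Hadamard matrix by Sylvester's doubling construction and tests the defining property that the product of the matrix and its transpose equals R times the identity. Sylvester's construction produces only orders that are powers of 2; orders such as 12 require other constructions, for example Paley's.

```python
import numpy as np

def sylvester_hadamard(R):
    """Hadamard matrix of order R by repeated doubling [[A, A], [A, -A]]; R must be a power of 2."""
    if R < 1 or R & (R - 1):
        raise ValueError("Sylvester construction needs R = 1, 2, 4, 8, ...")
    A = np.array([[1]])
    while A.shape[0] < R:
        A = np.block([[A, A], [A, -A]])
    return A

def is_hadamard(A):
    """True if A is square, has entries +1/-1 only, and A A' = R I."""
    A = np.asarray(A)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False
    if not np.all(np.abs(A) == 1):
        return False
    R = A.shape[0]
    return bool(np.array_equal(A @ A.T, R * np.eye(R)))
```

For example, `sylvester_hadamard(8)` reproduces an 8 × 8 matrix of the kind shown in this section, and `is_hadamard` rejects a matrix such as [[1, 1], [1, 1]], whose rows are not orthogonal.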
For more information about how the Hadamard matrix is used to construct replicates for BRR variance estimation, see the section Unadjusted BRR Replicate Weights.
The jackknife method of variance estimation deletes one PSU at a time from the full sample to create replicates. This method is also known as the delete-1 jackknife method because it deletes exactly one PSU in every replicate. The total number of replicates R is the same as the total number of PSUs. In each replicate, the sampling weights of the remaining PSUs in the stratum of the deleted PSU are divided by the jackknife coefficient $\alpha_r$, and the weights in the other strata are unchanged. The modified weights are called replicate weights. If you use the FEFI or the FHDI method, then the unadjusted replicate weights are adjusted for the imputation to create the imputation-adjusted replicate weights. The section Unadjusted Jackknife Replicate Weights describes how the unadjusted replicate weights are created, and the section Imputation-Adjusted Replicate Weights describes how the imputation-adjusted replicate weights are created.
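Let PSU $i_r$ in stratum $h_r$ be omitted for the rth replicate, and let $n_{h_r}$ be the number of PSUs in stratum $h_r$. Then the jackknife coefficient, $\alpha_r$, and replicate weights, $w_{hij}^{(r)}$, are computed as

$$\alpha_r = \frac{n_{h_r} - 1}{n_{h_r}}$$

$$w_{hij}^{(r)} = \begin{cases} w_{hij} & \text{if } h \ne h_r \\ w_{hij} / \alpha_r & \text{if } h = h_r \text{ and } i \ne i_r \\ 0 & \text{if } h = h_r \text{ and } i = i_r \end{cases}$$

where $w_{hij}$ is the original sampling weight of observation unit j in PSU i and stratum h.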
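The delete-1 rule can be sketched in Python. The sketch assumes that PSU labels are nested within strata, that every stratum has at least two PSUs, and that replicates are ordered by sorted (stratum, PSU) pairs; that ordering is a choice made for illustration and is not necessarily the order that PROC SURVEYIMPUTE uses. The function name is hypothetical.

```python
import numpy as np

def jackknife_replicate_weights(w, stratum, psu):
    """Delete-1 jackknife: one replicate per PSU.

    Returns (weights of shape (n_obs, R), jackknife coefficients alpha of length R).
    Replicates follow sorted (stratum, PSU) pairs, an ordering chosen for this sketch.
    """
    w = np.asarray(w, dtype=float)
    stratum = np.asarray(stratum)
    psu = np.asarray(psu)
    pairs = sorted(set(zip(stratum.tolist(), psu.tolist())))
    out = np.empty((w.size, len(pairs)))
    alpha = np.empty(len(pairs))
    for r, (h_r, i_r) in enumerate(pairs):
        in_h = stratum == h_r
        n_h = np.unique(psu[in_h]).size
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        alpha[r] = (n_h - 1) / n_h
        col = w.copy()                          # other strata: unchanged
        col[in_h] = w[in_h] / alpha[r]          # same stratum: divided by alpha_r
        col[in_h & (psu == i_r)] = 0.0          # deleted PSU: weight 0
        out[:, r] = col
    return out, alpha
```

In a stratum with three PSUs, deleting one PSU multiplies the other two by 3/2, so the stratum's weighted total is preserved when its PSUs carry equal weight.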
If you use the hot-deck imputation method, then you can use the OUTPUT statement in PROC SURVEYIMPUTE to store the unadjusted replicate weights. The unadjusted replicate weights are not saved for the FEFI or the FHDI method. You should use the imputation-adjusted replicate weights for variance estimation from a fractionally imputed data set. Use the OUTJKCOEFS= option in the OUTPUT statement to store the jackknife coefficients in a SAS data set.
For more information about how the jackknife variance estimators are computed for related statistics, see the section "Jackknife Method" in each of the following chapters: Chapter 118, The SURVEYFREQ Procedure, Chapter 120, The SURVEYLOGISTIC Procedure, Chapter 121, The SURVEYMEANS Procedure, Chapter 122, The SURVEYPHREG Procedure, and Chapter 123, The SURVEYREG Procedure.
If you use the hot-deck imputation technique by specifying the METHOD=HOTDECK option in the PROC SURVEYIMPUTE statement, the procedure does not create imputation-adjusted replicate weights. Naive variance estimators that do not use imputation-adjusted replicate weights and that treat the imputed values as observed values might underestimate the true variance. For more information, see Haziza (2009); Särndal and Lundström (2005); and Rao and Shao (1992).
If you specify the METHOD=FEFI or the METHOD=FHDI option in the PROC SURVEYIMPUTE statement, the procedure adjusts the replicate weights for imputation. The imputation-adjusted replicate weights should be used with other SAS/STAT survey procedures to estimate the variance of an estimator that uses the imputed data. For more information, see Fuller (2009, Section 5.2.2) and Kim and Shao (2014, Section 4.6).
Let $w_i^{(r)}$ be the unadjusted replicate weight for observation unit i in replicate r. To facilitate discussion, separate subscripts for strata, clusters, and imputation cells are omitted. The unadjusted replicate weights can come from a jackknife method as described in the section Unadjusted Jackknife Replicate Weights, from a BRR method as described in the section Unadjusted BRR Replicate Weights, or from a bootstrap method as described in the section Unadjusted Bootstrap Replicate Weights, or they can be specified by using the REPWEIGHTS statement. The adjustment follows an EM-by-weighting algorithm similar to the one described in the section Fully Efficient Fractional Imputation, but it uses the replicate weights, $w_i^{(r)}$, instead of the full-sample weight, $w_i$.
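In particular, the joint probabilities for the tth M-step and the rth replicate weight are computed by

$$\hat{\pi}_g^{(r,t)} = \frac{\sum_{i} \sum_{j=1}^{G_i} w_i^{(r)}\, w_{ij}^{*(r,t)}\, I(\mathbf{z}_{ij} = \mathbf{z}_g)}{\sum_{i} w_i^{(r)}}, \qquad g = 1, \ldots, G$$

where $\mathbf{z}_g$ is the gth joint cell, $\mathbf{z}_{ij}$ is the jth donor cell for observation unit i, and $I(\cdot)$ is the indicator function. The rth replicate fractional weights for the tth E-step are computed by

$$w_{ij}^{*(r,t)} = \frac{\hat{\pi}^{(r,t-1)}(\mathbf{z}_{ij})}{\sum_{k=1}^{G_i} \hat{\pi}^{(r,t-1)}(\mathbf{z}_{ik})}$$

where $\hat{\pi}^{(r,t-1)}(\mathbf{z}_{ij})$ is the current joint probability of donor cell $\mathbf{z}_{ij}$ and $G_i$ is the number of donor cells for observation unit i.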
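The EM-by-weighting iteration can be sketched in Python for categorical items. This is a generic illustration under stated assumptions, not the procedure's implementation: missing items are coded as `None`, the donor cells for a unit are taken to be every joint category that agrees with its observed items, and units whose donor cells all have zero probability (possible when a replicate weight is 0) fall back to equal fractional weights, a choice made only for this sketch. Running it once with the full-sample weights and once per column of replicate weights gives one set of fractional weights per replicate. All names are hypothetical.

```python
import itertools
import numpy as np

def _normalize(p):
    # Fractional weights proportional to p; equal weights if every entry is 0 (sketch choice)
    s = p.sum()
    return p / s if s > 0 else np.full(p.size, 1.0 / p.size)

def em_by_weighting(y, unit_w, levels, max_iter=500, tol=1e-12):
    """EM by weighting for categorical items with missing values coded as None.

    y: list of tuples of observed levels; unit_w: full-sample or replicate weights;
    levels: list of possible levels for each item.
    Returns (joint cells, cell probabilities, donor indices, fractional weights).
    """
    unit_w = np.asarray(unit_w, dtype=float)
    cells = list(itertools.product(*levels))
    # donor cells: joint cells that agree with every observed item of the unit
    donors = [np.array([g for g, c in enumerate(cells)
                        if all(o is None or o == v for o, v in zip(obs, c))])
              for obs in y]
    pi = np.full(len(cells), 1.0 / len(cells))  # starting probabilities
    for _ in range(max_iter):
        # E-step: fractional weights from the previous M-step probabilities
        frac = [_normalize(pi[d]) for d in donors]
        # M-step: weighted share of each joint cell, using the supplied unit weights
        new_pi = np.zeros_like(pi)
        for w_i, d, f in zip(unit_w, donors, frac):
            new_pi[d] += w_i * f
        new_pi /= unit_w.sum()
        done = np.max(np.abs(new_pi - pi)) < tol
        pi = new_pi
        if done:
            break
    frac = [_normalize(pi[d]) for d in donors]
    return cells, pi, donors, frac
```

A fully observed unit always has a single donor cell with fractional weight 1, and because each unit's fractional weights sum to 1, the cell probabilities sum to 1 after every M-step.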
If you use the FHDI method (by specifying the METHOD=FHDI option in the PROC SURVEYIMPUTE statement), the procedure adjusts the replicate weights for imputation. You must use the imputation-adjusted replicate weights with other SAS/STAT survey procedures to estimate the variance of an estimator that uses the imputed data.
Let $M$ be the number of second-stage donor cells that are requested for FHDI (the value of the NDONORS= option in the PROC SURVEYIMPUTE statement), and let $m_i$ be the number of unique second-stage donor cells that are selected for observation unit i. Further, let $d_{ijk}$ be the number of times that second-stage donor cell k is selected for first-stage donor cell j of observation unit i.
Let $w_i^{(r)}$ be the unadjusted replicate weight, $w_{ij}^{*(r)}$ be the first-stage fractional replicate weight, and $w_{ijk}^{**(r)}$ be the second-stage replicate weight from two-stage FEFI for observation unit i, first-stage donor cell j, second-stage donor cell k, and replicate sample r. Let $K_{ij}$ be the total number of second-stage donor cells conditional on the first-stage donor cell j for observation unit i.
The following weight adjustments are available:
No adjustment: If you specify the REPWTADJ=NONE option, then PROC SURVEYIMPUTE does not adjust the replicate weights for FHDI.
The two-stage imputation-adjusted fractional replicate weight for the rth replicate sample for the second-stage donor cell in the first-stage donor cell
for observation unit i is
The two-stage imputation-adjusted replicate weight for observation unit i, first-stage donor cell , and second-stage donor cell
in the rth replicate sample is
.
Ratio adjustment: If you specify the REPWTADJ=RATIO option, then PROC SURVEYIMPUTE adjusts the replicate weights for FHDI by using ratio adjustment.
The two-stage imputation-adjusted fractional replicate weight for the rth replicate sample for the second-stage donor cell in the first-stage donor cell
for observation unit i is
where min(
,
) and
is the sum of the ratios of the second-stage replicate fractional weight to the second-stage full sample fractional weight for observation unit i.
If for all selected second-stage donor cells in the rth replicate sample, then each selected second-stage donor cell is assigned a second-stage fractional weight of
.
The two-stage imputation-adjusted replicate weight for observation unit i, first-stage donor cell , and second-stage donor cell
in the rth replicate sample is
.
Neighbor adjustment: If you specify the REPWTADJ=NEIGHBOR option or if you do not specify the REPWTADJ= option, then PROC SURVEYIMPUTE adjusts the replicate weights for FHDI by using neighbor adjustment.
Neighbor adjustment first computes the proportion of the full-sample fractional weights that fall in each of equally spaced intervals and then adjusts the replicate sample fractional weights by using the proportions from the full sample. Let
be the cumulative sum of the second-stage full-sample fractional weights for the first k second-stage donor cells.
depends on both i and
but, for simplicity, subscripts i and
are not used in
. By construction,
. Define factors
according to how the interval
for the kth second-stage donor cell overlaps with the dth equally spaced interval
:
for and
. The replicate fraction for the second-stage donor cell
is computed as
The two-stage imputation-adjusted fractional replicate weight for the rth replicate sample for the second-stage donor cell in the first-stage donor cell
for observation unit i is
If for all selected second-stage donor cells in the rth replicate sample, then each selected second-stage donor cell is assigned a second-stage fractional weight of
, where
is the number of times the second-stage donor cell
is selected.
The two-stage imputation-adjusted replicate weight for observation unit i, first-stage donor cell , and second-stage donor cell
in the rth replicate sample is
.