The SURVEYIMPUTE Procedure

Replication Variance Estimation

Replication methods are useful for estimating variances that account for both the sampling variability and the imputation variability. If you specify the METHOD=FEFI or the METHOD=FHDI option in the PROC SURVEYIMPUTE statement, then by default the procedure creates imputation-adjusted jackknife replicate weights, unless you also specify the VARMETHOD=NONE option in the same statement. If you specify the METHOD=FEFI or the METHOD=FHDI option and also provide your own replicate weights by using the REPWEIGHTS statement, then the procedure creates new replicate weights by adjusting the replicate weights that you provide to account for imputation. The procedure does not create imputation-adjusted replicate weights when you specify the METHOD=HOTDECK option in the PROC SURVEYIMPUTE statement.

The SURVEYIMPUTE procedure does not compute any variances. The replicate weights that are created can be used in any SAS/STAT survey procedure for variance computation. For an example, see the section Getting Started: SURVEYIMPUTE Procedure.

Replication methods draw multiple replicates (also called subsamples) from a full sample according to a specific resampling scheme. The most commonly used resampling schemes are the balanced repeated replication (BRR) method, the jackknife method, and the bootstrap method. For each replicate, the original weights of the primary sampling units (PSUs) are modified according to the resampling scheme to create replicate weights. The parameters of interest are then estimated once for each replicate by using that replicate's weights; these estimates are known as replicate estimates. Finally, the variances of the estimators are estimated from the variability among the replicate estimates. The SURVEYIMPUTE procedure automatically creates replicate weights based on the replication method that you specify; alternatively, you can use the REPWEIGHTS statement to provide your own replicate weights.
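To make the last step concrete, the following Python sketch shows what a downstream survey procedure does with replicate weights (PROC SURVEYIMPUTE itself computes no variances). It computes a weighted mean with the full-sample weights and with each set of replicate weights, and then measures the spread of the replicate estimates as the sum over r of c_r(θ̂_r − θ̂)². The per-replicate coefficients c_r are supplied by the caller because they depend on the replication method; for example, the delete-1 jackknife described later in this section uses the coefficients α_r. The function names are hypothetical, and the exact variance formulas for each method are given in the survey procedure chapters referenced in this section.

```python
from typing import Sequence


def weighted_mean(y: Sequence[float], w: Sequence[float]) -> float:
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def replicate_variance(y, full_weights, replicate_weights, coefficients):
    """Variance estimate from replicate estimates (illustrative sketch).

    replicate_weights: one list of weights per replicate (same length as y).
    coefficients:      one method-specific multiplier c_r per replicate,
                       assumed to be supplied by the caller.
    """
    theta_full = weighted_mean(y, full_weights)
    replicate_estimates = [weighted_mean(y, w_r) for w_r in replicate_weights]
    return sum(
        c_r * (theta_r - theta_full) ** 2
        for c_r, theta_r in zip(coefficients, replicate_estimates)
    )


# Three PSUs in one stratum with delete-1 jackknife replicate weights:
# the deleted PSU gets weight 0 and the others are divided by alpha = 2/3.
y = [1.0, 2.0, 3.0]
full = [1.0, 1.0, 1.0]
reps = [[0.0, 1.5, 1.5], [1.5, 0.0, 1.5], [1.5, 1.5, 0.0]]
alpha = [2.0 / 3.0] * 3
print(replicate_variance(y, full, reps, alpha))  # approximately 1/3
```

For this tiny equal-weight sample, the result matches the familiar s²/n = 1/3 for the mean of 1, 2, and 3.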

The following subsections describe how the replicate weights are created for each variance estimation method.

Bootstrap Method

The naive bootstrap variance estimator, which is suitable for an infinite population, is not consistent when applied to complex surveys. For complex surveys, bootstrap replicate samples are created by selecting a simple random sample with replacement of primary sampling units (PSUs) within each stratum. PSUs in different strata are sampled independently. The original sampling weights are then adjusted in each replicate to reflect the full sample; these adjusted weights are called bootstrap replicate weights. McCarthy and Snowden (1985), Rao and Wu (1988), Sitter (1992b), and Sitter (1992a) provide several adjusted bootstrap variance estimators that are consistent for complex surveys. For more information about bootstrap variance estimation for complex surveys, see Mashreghi, Haziza, and Léger (2016), Beaumont and Patak (2012), Lohr (2010, Section 9.3.3), Fuller (2009, Section 4.5), Wolter (2007, Chapter 5), and Shao and Tu (1995, Section 6.2.4).

If you do not provide replicate weights by using the REPWEIGHTS statement, then the VARMETHOD=BOOTSTRAP option in the PROC SURVEYIMPUTE statement creates bootstrap replicate weights for you. This bootstrap method is similar to the method of Rao, Wu, and Yue (1992) and is also known as the bootstrap weights method (Mashreghi, Haziza, and Léger 2016).

If you use the FEFI or the FHDI method, then the unadjusted bootstrap weights are adjusted for the imputation to create the imputation-adjusted replicate weights. The section Unadjusted Bootstrap Replicate Weights describes how the unadjusted replicate weights are created, and the sections Imputation-Adjusted Replicate Weights and Replicate Weight Adjustments for FHDI describe how the imputation-adjusted replicate weights are created.

Unadjusted Bootstrap Replicate Weights

Each replicate is obtained by selecting a simple random sample with replacement of $m_h$ PSUs from the $n_h$ PSUs in stratum h. The rth bootstrap replicate weight for observation unit j in PSU i and stratum h is given by

$$ w_{hij}^{(r)} = w_{hij} \left\{ 1 - \sqrt{\frac{(1 - f_h)\, m_h}{n_h - 1}} + \sqrt{\frac{(1 - f_h)\, m_h}{n_h - 1}} \; \frac{n_h}{m_h} \, k_{hi}^{(r)} \right\} $$

where $k_{hi}^{(r)}$ is the number of times PSU i in stratum h is selected in replicate sample r, and $f_h$ is the sampling fraction in stratum h.
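The bootstrap weight formula can be sketched in Python as follows. This is an illustrative sketch, not PROC SURVEYIMPUTE's implementation: the data layout (one `(stratum, psu, weight)` tuple per observation unit), the function name, and the default $m_h = n_h - 1$ used when no $m_h$ values are passed are assumptions. Every observation unit in a PSU receives the same multiplier, and the number of draws $k_{hi}^{(r)}$ comes from sampling the stratum's PSU labels with replacement.

```python
import math
import random
from collections import Counter


def bootstrap_replicate_weights(units, f, n_reps, m=None, seed=2022):
    """Unadjusted bootstrap replicate weights (illustrative sketch).

    units: list of (stratum, psu, weight) tuples, one per observation unit.
    f:     dict mapping stratum -> sampling fraction f_h.
    m:     optional dict mapping stratum -> m_h; if omitted, this sketch
           assumes m_h = n_h - 1.  Each stratum needs n_h >= 2 PSUs.
    Returns one list of replicate weights per replicate.
    """
    rng = random.Random(seed)
    psus = {}  # stratum -> PSU labels in order of first appearance
    for h, i, _ in units:
        psus.setdefault(h, [])
        if i not in psus[h]:
            psus[h].append(i)

    replicates = []
    for _ in range(n_reps):
        factor = {}
        for h, labels in psus.items():
            n_h = len(labels)
            m_h = m[h] if m else n_h - 1
            # k[i]: times PSU i is drawn in an SRS with replacement of m_h PSUs.
            k = Counter(rng.choices(labels, k=m_h))
            c = math.sqrt((1.0 - f[h]) * m_h / (n_h - 1))
            for i in labels:
                factor[(h, i)] = 1.0 - c + c * (n_h / m_h) * k[i]
        replicates.append([w * factor[(h, i)] for h, i, w in units])
    return replicates


units = [("A", 1, 10.0), ("A", 2, 10.0), ("A", 3, 10.0),
         ("B", 4, 5.0), ("B", 5, 5.0)]
for rep in bootstrap_replicate_weights(units, f={"A": 0.1, "B": 0.0}, n_reps=3):
    print([round(x, 3) for x in rep])
```

Because the draws in a stratum sum to $m_h$, the multipliers within a stratum always sum to $n_h$; a stratum whose PSUs carry equal weights therefore keeps its full-sample weight total in every replicate. When $m_h \le n_h - 1$, the square-root term is at most 1, so no multiplier is negative.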

If you use the hot-deck imputation method, then you can use the OUTPUT statement to store the unadjusted replicate weights. The unadjusted replicate weights are not saved when you use the FEFI or FHDI method. You should use the imputation-adjusted replicate weights for variance estimation from a fractionally imputed data set.

For more information about how the bootstrap variance estimators are computed for related statistics, see the section "Bootstrap Method" in each of the following chapters: Chapter 118, The SURVEYFREQ Procedure, Chapter 120, The SURVEYLOGISTIC Procedure, Chapter 121, The SURVEYMEANS Procedure, Chapter 122, The SURVEYPHREG Procedure, and Chapter 123, The SURVEYREG Procedure.

Balanced Repeated Replication (BRR) Method

The balanced repeated replication (BRR) method requires that the full sample be drawn by using a stratified sample design with two primary sampling units (PSUs) per stratum. The BRR method constructs half-sample replicates by deleting one PSU per stratum according to a Hadamard matrix and doubling the original weight of the other PSU in that stratum. If you use the FEFI or the FHDI method, then the unadjusted BRR weights are adjusted for the imputation to create the imputation-adjusted replicate weights. The sections Unadjusted BRR Replicate Weights and Unadjusted Fay’s BRR Replicate Weights describe how the unadjusted replicate weights are created, and the section Imputation-Adjusted Replicate Weights describes how the imputation-adjusted replicate weights are created.

Unadjusted BRR Replicate Weights

Let H be the total number of strata. The total number of replicates, R, is the smallest multiple of 4 that is greater than H. However, if you prefer a larger number of replicates, you can specify the REPS=n method-option. If a Hadamard matrix of that dimension cannot be constructed, the number of replicates is increased until a Hadamard matrix becomes available.

Each replicate is obtained by deleting one PSU per stratum according to a corresponding Hadamard matrix and adjusting the original weights for the remaining PSUs. The new weights are called replicate weights.

Replicates are constructed by using the first H columns of the $R \times R$ Hadamard matrix. The rth replicate ($r = 1, 2, \ldots, R$) is drawn from the full sample according to the rth row of the Hadamard matrix as follows:

If the $(r, h)$th element of the Hadamard matrix is 1, then the first PSU of stratum h is included in the rth replicate and the second PSU of stratum h is excluded.
If the $(r, h)$th element of the Hadamard matrix is –1, then the second PSU of stratum h is included in the rth replicate and the first PSU of stratum h is excluded.

The original weights of the remaining PSUs in each half sample are then doubled to create the replicate weights. For more information about the BRR method, see Wolter (2007) and Lohr (2010).
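The half-sample construction can be sketched in Python as follows. This is an illustrative sketch, not PROC SURVEYIMPUTE's implementation: the data layout (one `(stratum, psu_position, weight)` tuple per observation unit, with PSU positions 0 and 1) and the function names are hypothetical. It also computes the default number of replicates (the smallest multiple of 4 greater than H); the step that increases R when no Hadamard matrix of that dimension is available is not modeled.

```python
def n_brr_replicates(n_strata):
    """Default number of replicates R: smallest multiple of 4 greater than H."""
    return 4 * (n_strata // 4 + 1)


def brr_replicate_weights(units, strata, hadamard):
    """Unadjusted BRR replicate weights (illustrative sketch).

    units:    (stratum, psu_position, weight) tuples, one per observation
              unit; psu_position is 0 (first PSU) or 1 (second PSU).
    strata:   stratum labels in order; stratum strata[c] uses column c.
    hadamard: list of R rows with +1/-1 entries; only the first H columns
              are used, and row r defines replicate r.
    """
    column = {h: c for c, h in enumerate(strata)}
    replicates = []
    for row in hadamard:
        rep = []
        for h, pos, w in units:
            kept = 0 if row[column[h]] == 1 else 1  # +1 keeps the first PSU
            # The kept PSU's weight is doubled; the excluded PSU gets 0.
            rep.append(2.0 * w if pos == kept else 0.0)
        replicates.append(rep)
    return replicates


# A 4 x 4 Hadamard matrix with its all-ones column placed last.
hadamard_4 = [[1, 1, 1, 1],
              [-1, 1, -1, 1],
              [1, -1, -1, 1],
              [-1, -1, 1, 1]]
units = [("A", 0, 3.0), ("A", 1, 4.0),
         ("B", 0, 2.0), ("B", 1, 2.0),
         ("C", 0, 1.0), ("C", 1, 5.0)]
print(n_brr_replicates(3))  # 4
for rep in brr_replicate_weights(units, ["A", "B", "C"], hadamard_4):
    print(rep)
```

In this example each of the first three columns has two +1 and two -1 entries, so every PSU is kept in exactly two of the four replicates.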

By default, PROC SURVEYIMPUTE generates an appropriate Hadamard matrix automatically to create the replicates. You can display the Hadamard matrix by specifying the VARMETHOD=BRR(PRINTH) method-option. If you provide a Hadamard matrix by specifying the VARMETHOD=BRR(HADAMARD=) method-option, then the replicates are generated according to the provided Hadamard matrix.

For more information about how the BRR variance estimators are computed for related statistics, see the section "Balanced Repeated Replication (BRR) Method" in each of the following chapters: Chapter 118, The SURVEYFREQ Procedure, Chapter 120, The SURVEYLOGISTIC Procedure, Chapter 121, The SURVEYMEANS Procedure, Chapter 122, The SURVEYPHREG Procedure, and Chapter 123, The SURVEYREG Procedure.

Unadjusted Fay’s BRR Replicate Weights

The traditional BRR method constructs half-sample replicates by deleting one PSU per stratum according to a Hadamard matrix and doubling the original weight of the other PSU. Fay’s BRR method uses the Fay coefficient, $\epsilon$: instead of deleting one PSU per stratum, it multiplies the original weight of that PSU by $\epsilon$, and it multiplies the original weight of the remaining PSU in that stratum by $2 - \epsilon$. PROC SURVEYIMPUTE provides a default value for $\epsilon$; alternatively, you can specify a value for $\epsilon$ by using the FAY= method-option. When $\epsilon = 0$, Fay’s method reduces to the traditional BRR method. For more information, see Dippo, Fay, and Morganstein (1984); Fay (1984, 1989); and Judkins (1990). Because the traditional BRR method uses only half of the total sample in every replicate, some observed levels of the analysis variables might not be available in the replicate samples. Fay’s BRR method is especially useful in this situation because it uses all the sampled units in every replicate.
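Fay's variant changes only the two multipliers. The following Python sketch (hypothetical function name; assumed data layout of `(stratum, psu_position, weight)` tuples with PSU positions 0 and 1) multiplies the PSU that traditional BRR would delete by ε and the other PSU by 2 − ε. With `eps=0` it reproduces the traditional BRR weights. The range check 0 ≤ ε < 1 is an assumption of this sketch, and the value 0.5 in the usage line is only an example input.

```python
def fay_replicate_weights(units, strata, hadamard, eps):
    """Unadjusted Fay BRR replicate weights (illustrative sketch).

    units:    (stratum, psu_position, weight) tuples; psu_position is 0 or 1.
    strata:   stratum labels in order; stratum strata[c] uses column c.
    hadamard: rows of +1/-1 entries; row r defines replicate r.
    eps:      Fay coefficient; this sketch assumes 0 <= eps < 1.
    """
    if not 0.0 <= eps < 1.0:
        raise ValueError("this sketch expects 0 <= eps < 1")
    column = {h: c for c, h in enumerate(strata)}
    replicates = []
    for row in hadamard:
        rep = []
        for h, pos, w in units:
            kept = 0 if row[column[h]] == 1 else 1  # +1 favors the first PSU
            # Traditional BRR would use 2 and 0; Fay uses 2 - eps and eps.
            rep.append(w * (2.0 - eps) if pos == kept else w * eps)
        replicates.append(rep)
    return replicates


hadamard_4 = [[1, 1, 1, 1], [-1, 1, -1, 1], [1, -1, -1, 1], [-1, -1, 1, 1]]
units = [("A", 0, 3.0), ("A", 1, 4.0), ("B", 0, 2.0),
         ("B", 1, 2.0), ("C", 0, 1.0), ("C", 1, 5.0)]
print(fay_replicate_weights(units, ["A", "B", "C"], hadamard_4, eps=0.5)[1])
# [1.5, 6.0, 3.0, 1.0, 0.5, 7.5]
```

As in traditional BRR, the two multipliers in each stratum sum to 2, but with ε > 0 every sampled unit keeps a positive weight in every replicate.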

For more information about how Fay’s BRR variance estimators are computed for related statistics, see the section "Balanced Repeated Replication (BRR) Method" in each of the following chapters: Chapter 118, The SURVEYFREQ Procedure, Chapter 120, The SURVEYLOGISTIC Procedure, Chapter 121, The SURVEYMEANS Procedure, Chapter 122, The SURVEYPHREG Procedure, and Chapter 123, The SURVEYREG Procedure.

Hadamard Matrix

PROC SURVEYIMPUTE uses a Hadamard matrix to construct replicates for BRR variance estimation. You can provide a Hadamard matrix for replicate construction by using the HADAMARD= method-option for VARMETHOD=BRR. Otherwise, PROC SURVEYIMPUTE generates an appropriate Hadamard matrix. You can display the Hadamard matrix by specifying the PRINTH method-option.

A Hadamard matrix $\mathbf{H}$ of dimension R is a square matrix that has all elements equal to 1 or –1 such that $\mathbf{H}\mathbf{H}' = R\,\mathbf{I}$, where $\mathbf{I}$ is an identity matrix of appropriate order. The dimension of a Hadamard matrix must equal 1, 2, or a multiple of 4.

For example, the following matrix is a Hadamard matrix of dimension R = 8:

$$
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & -1 & -1 & 1 & -1 & 1 & 1 & -1
\end{pmatrix}
$$

For BRR replicate construction, the dimension of the Hadamard matrix must be at least H, where H denotes the number of first-stage strata in your design. If a Hadamard matrix of a particular dimension exists, it is not necessarily unique. Therefore, if you want to use a specific Hadamard matrix, you must provide the matrix as a SAS data set in the HADAMARD= method-option. You must ensure that the matrix that you provide is actually a Hadamard matrix; PROC SURVEYIMPUTE does not check the validity of your Hadamard matrix.
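Because PROC SURVEYIMPUTE does not check the validity of a matrix that you supply, you might want to check it yourself before you use the HADAMARD= method-option. The following Python sketch (hypothetical function names) checks the defining property $\mathbf{H}\mathbf{H}' = R\,\mathbf{I}$ and builds matrices by Sylvester's doubling construction. Sylvester's construction produces only dimensions that are powers of 2; the 8 × 8 example above is the Sylvester matrix of dimension 8. How PROC SURVEYIMPUTE generates matrices of other dimensions is not modeled here.

```python
def sylvester_hadamard(order):
    """Sylvester construction: H_1 = [1], H_2k = [[H_k, H_k], [H_k, -H_k]].

    Works only when order is a power of 2.
    """
    if order < 1 or order & (order - 1):
        raise ValueError("Sylvester's construction needs a power of 2")
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def is_hadamard(matrix):
    """Return True if matrix is square, has only +1/-1 entries, and H H' = R I."""
    r = len(matrix)
    if any(len(row) != r for row in matrix):
        return False
    if any(x not in (1, -1) for row in matrix for x in row):
        return False
    for a in range(r):
        for b in range(r):
            dot = sum(x * y for x, y in zip(matrix[a], matrix[b]))
            if dot != (r if a == b else 0):
                return False
    return True


H8 = sylvester_hadamard(8)
print(is_hadamard(H8))  # True
H8[2][3] = -H8[2][3]    # flip one entry
print(is_hadamard(H8))  # False
```

A valid matrix must also have dimension at least H, as stated above; that check depends on your design and is left to the caller.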

For more information about how the Hadamard matrix is used to construct replicates for BRR variance estimation, see the section Unadjusted BRR Replicate Weights.

Jackknife Method

The jackknife method of variance estimation deletes one PSU at a time from the full sample to create replicates. This method is also known as the delete-1 jackknife method because it deletes exactly one PSU in every replicate. The total number of replicates R is the same as the total number of PSUs. In each replicate, the sampling weights of the remaining PSUs are modified by the jackknife coefficient $\alpha_r$. The modified weights are called replicate weights. If you use the FEFI or the FHDI method, then the unadjusted replicate weights are adjusted for the imputation to create the imputation-adjusted replicate weights. The section Unadjusted Jackknife Replicate Weights describes how the unadjusted replicate weights are created, and the section Imputation-Adjusted Replicate Weights describes how the imputation-adjusted replicate weights are created.

Unadjusted Jackknife Replicate Weights

Let PSU i in stratum $h_r$ be omitted for the rth replicate. Then the jackknife coefficient, $\alpha_r$, and the replicate weights, $w_{hij}^{(r)}$, are computed as

$$ \alpha_r = \begin{cases} \dfrac{n_{h_r} - 1}{n_{h_r}} & \text{for a stratified design} \\[1ex] \dfrac{R - 1}{R} & \text{for designs without stratification} \end{cases} $$

$$ w_{hij}^{(r)} = \begin{cases} w_{hij} & \text{if observation unit } j \text{ is not in stratum } h_r \\ 0 & \text{if observation unit } j \text{ is in PSU } i \text{ of stratum } h_r \\ w_{hij} / \alpha_r & \text{if observation unit } j \text{ is not in PSU } i \text{ but is in stratum } h_r \end{cases} $$

where $n_{h_r}$ is the number of PSUs in stratum $h_r$.
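The delete-1 jackknife rules can be sketched in Python as follows. This is an illustrative sketch with a hypothetical function name and an assumed data layout (one `(stratum, psu, weight)` tuple per observation unit); replicates are ordered by the first appearance of each PSU, and in the stratified case every stratum needs at least two PSUs so that $\alpha_r > 0$.

```python
def jackknife_replicate_weights(units, stratified=True):
    """Unadjusted delete-1 jackknife replicate weights (illustrative sketch).

    units: list of (stratum, psu, weight) tuples, one per observation unit.
    Returns (coefficients, replicates): one alpha_r and one list of
    replicate weights per PSU.
    """
    psus = []  # (stratum, psu) pairs in order of first appearance
    for h, i, _ in units:
        if (h, i) not in psus:
            psus.append((h, i))
    n_psu = {}  # stratum -> number of PSUs n_h
    for h, _ in psus:
        n_psu[h] = n_psu.get(h, 0) + 1
    R = len(psus)

    coefficients, replicates = [], []
    for h_r, i_r in psus:  # replicate r deletes PSU i_r of stratum h_r
        alpha = (n_psu[h_r] - 1) / n_psu[h_r] if stratified else (R - 1) / R
        rep = []
        for h, i, w in units:
            # Without stratification, all PSUs form a single group.
            same_stratum = (h == h_r) or not stratified
            if (h, i) == (h_r, i_r):
                rep.append(0.0)          # unit in the deleted PSU
            elif same_stratum:
                rep.append(w / alpha)    # other PSUs of stratum h_r
            else:
                rep.append(w)            # other strata are unchanged
        coefficients.append(alpha)
        replicates.append(rep)
    return coefficients, replicates


units = [("A", 1, 10.0), ("A", 2, 10.0),
         ("B", 3, 4.0), ("B", 4, 4.0), ("B", 5, 4.0)]
alphas, reps = jackknife_replicate_weights(units)
print(alphas[0], reps[0])  # 0.5 [0.0, 20.0, 4.0, 4.0, 4.0]
```

Dividing by $\alpha_r$ rescales the remaining PSUs so that, when PSU weights are equal within the stratum, the stratum's weight total is the same in every replicate.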

If you use the hot-deck imputation method, then you can use the OUTPUT statement in PROC SURVEYIMPUTE to store the unadjusted replicate weights. The unadjusted replicate weights are not saved for the FEFI or the FHDI method. You should use the imputation-adjusted replicate weights for variance estimation from a fractionally imputed data set. Use the OUTJKCOEFS= option in the OUTPUT statement to store the jackknife coefficients in a SAS data set.

For more information about how the jackknife variance estimators are computed for related statistics, see the section "Jackknife Method" in each of the following chapters: Chapter 118, The SURVEYFREQ Procedure, Chapter 120, The SURVEYLOGISTIC Procedure, Chapter 121, The SURVEYMEANS Procedure, Chapter 122, The SURVEYPHREG Procedure, and Chapter 123, The SURVEYREG Procedure.

Imputation-Adjusted Replicate Weights

If you use the hot-deck imputation technique by specifying the METHOD=HOTDECK option in the PROC SURVEYIMPUTE statement, the procedure does not create imputation-adjusted replicate weights. Naive variance estimators that do not use imputation-adjusted replicate weights and that treat the imputed data as observed data might underestimate the true variance. For more information, see Haziza (2009); Särndal and Lundström (2005); and Rao and Shao (1992).

If you specify the METHOD=FEFI or the METHOD=FHDI option in the PROC SURVEYIMPUTE statement, the procedure adjusts the replicate weights for imputation. The imputation-adjusted replicate weights should be used with other SAS/STAT survey procedures to estimate the variance of an estimator that uses the imputed data. For more information, see Fuller (2009, Section 5.2.2) and Kim and Shao (2014, Section 4.6).

Let $w_i^{(r)}$ be the unadjusted replicate weight for observation unit i. To simplify the discussion, separate subscripts for strata, clusters, and imputation cells are omitted. The unadjusted replicate weights can come from a jackknife method as described in the section Unadjusted Jackknife Replicate Weights, from a BRR method as described in the section Unadjusted BRR Replicate Weights, or from a bootstrap method as described in the section Unadjusted Bootstrap Replicate Weights, or they can be specified by using the REPWEIGHTS statement. The adjustment follows the EM-by-weighting algorithm that is described in the section Fully Efficient Fractional Imputation, but it uses the replicate weights, $w_i^{(r)}$, instead of the full-sample weight, $w_i$.

In particular, the joint probabilities for the tth M-step and the rth replicate weight are computed by

$$ \tilde{\pi}_{(t)}^{(r)}(\kappa_1, \ldots, \kappa_P) = \left\{ \sum_i \sum_l w_i^{(r)} w_{il(t-1)}^{(r)} \right\}^{-1} \sum_i \sum_l w_i^{(r)} w_{il(t-1)}^{(r)} \, I\left(Z_{i1} = \kappa_1, \ldots, Z_{iP} = \kappa_P\right) $$

for all i, l, and .

The rth replicate fractional weights for the tth E-step are computed by

$$ w_{il(t)}^{(r)} = \left\{ \sum_{k=1}^{M_i} \tilde{\pi}_{(t)}^{(r)}\left(\mathbf{Z}_{i,\mathrm{obs}}, \mathbf{Z}_{i,\mathrm{miss}}[k]\right) \right\}^{-1} \tilde{\pi}_{(t)}^{(r)}\left(\mathbf{Z}_{i,\mathrm{obs}}, \mathbf{Z}_{i,\mathrm{miss}}[l]\right) $$

where $M_i$ is the number of donor cells for observation unit i.
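For intuition about how the replicate weights enter the EM-by-weighting iterations, the following Python sketch runs the M-step and E-step formulas above for a single replicate. It is a simplification under stated assumptions, not PROC SURVEYIMPUTE's implementation: all observation units are in one imputation cell, each unit is represented by the list of donor cells (tuples of categorical values) that are consistent with its observed values, the fractional weights start at $1/M_i$, and a fixed number of iterations replaces a convergence criterion. The function name is hypothetical.

```python
def replicate_fractional_weights(candidates, rep_weights, n_iter=100, init=None):
    """EM-by-weighting for one replicate (illustrative sketch).

    candidates:  candidates[i] lists the donor cells (tuples of category
                 values) consistent with unit i's observed values; a fully
                 observed unit has exactly one cell.
    rep_weights: replicate weight w_i^(r) for each unit.
    init:        starting fractional weights; uniform 1/M_i if omitted
                 (an assumption of this sketch).
    """
    frac = init if init is not None else [
        [1.0 / len(cells)] * len(cells) for cells in candidates
    ]
    for _ in range(n_iter):
        # M-step: joint cell probabilities weighted by w_i^(r) * w_il^(r).
        mass, total = {}, 0.0
        for w_i, cells, fw in zip(rep_weights, candidates, frac):
            for cell, w_il in zip(cells, fw):
                mass[cell] = mass.get(cell, 0.0) + w_i * w_il
                total += w_i * w_il
        prob = {cell: m / total for cell, m in mass.items()}
        # E-step: normalize the probabilities over each unit's M_i cells.
        new_frac = []
        for cells in candidates:
            p = [prob[cell] for cell in cells]
            s = sum(p)
            # Fall back to uniform weights if every cell has probability 0.
            new_frac.append([x / s for x in p] if s > 0
                            else [1.0 / len(cells)] * len(cells))
        frac = new_frac
    return frac


# Two binary variables; the fourth unit has Y1 = 0 observed and Y2 missing.
candidates = [[(0, 0)], [(0, 1)], [(1, 1)], [(0, 0), (0, 1)]]
base = replicate_fractional_weights(candidates, [1.0, 3.0, 2.0, 1.0])
print([round(x, 6) for x in base[3]])  # [0.25, 0.75]
rep = replicate_fractional_weights(candidates, [1.0, 0.0, 2.0, 1.0])
print([round(x, 6) for x in rep[3]])   # [1.0, 0.0]
```

With weights `[1, 3, 2, 1]`, the fourth unit's first fractional weight a satisfies the fixed-point equation a = (1 + a)/5, so it converges to 0.25. Setting the second unit's replicate weight to 0 (as a jackknife replicate that deletes its PSU would) drives the fractional weights toward 1 and 0, which illustrates why the fractional weights are recomputed for each replicate rather than copied from the full sample.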

Replicate Weight Adjustments for FHDI

If you use the FHDI method (by specifying the METHOD=FHDI option in the PROC SURVEYIMPUTE statement), the procedure adjusts the replicate weights for imputation. You must use the imputation-adjusted replicate weights with other SAS/STAT survey procedures to estimate the variance of an estimator that uses the imputed data.

Let be the number of second-stage donor cells that are requested for FHDI (the value of the NDONORS= option in the PROC SURVEYIMPUTE statement), and let be the number of unique second-stage donor cells that are selected for observation unit i. Further assume that is the number of times the second-stage donor cell is selected for first-stage donor cell .

Let be the unadjusted replicate weight, be the first-stage fractional replicate weight, and be the second-stage replicate weight from two-stage FEFI for observation unit i, first-stage donor cell , second-stage donor cell , and replicate sample r. Let be the total number of second-stage donor cells conditional on the first-stage donor cell for observation unit i.

The following weight adjustments are available:

No adjustment: If you specify the REPWTADJ=NONE option, then PROC SURVEYIMPUTE does not adjust the replicate weights for FHDI.

The two-stage imputation-adjusted fractional replicate weight for the rth replicate sample for the second-stage donor cell in the first-stage donor cell for observation unit i is

where min(, ).

The two-stage imputation-adjusted replicate weight for observation unit i, first-stage donor cell , and second-stage donor cell in the rth replicate sample is .

Ratio adjustment: If you specify the REPWTADJ=RATIO option, then PROC SURVEYIMPUTE adjusts the replicate weights for FHDI by using ratio adjustment.

The two-stage imputation-adjusted fractional replicate weight for the rth replicate sample for the second-stage donor cell in the first-stage donor cell for observation unit i is

where min(, ) and is the sum of the ratios of the second-stage replicate fractional weight to the second-stage full sample fractional weight for observation unit i.

If for all selected second-stage donor cells in the rth replicate sample, then each selected second-stage donor cell is assigned a second-stage fractional weight of .

The two-stage imputation-adjusted replicate weight for observation unit i, first-stage donor cell , and second-stage donor cell in the rth replicate sample is .

Neighbor adjustment: If you specify the REPWTADJ=NEIGHBOR option or if you do not specify the REPWTADJ= option, then PROC SURVEYIMPUTE adjusts the replicate weights for FHDI by using neighbor adjustment.

Neighbor adjustment first computes the proportion of the full-sample fractional weights that fall in each of equally spaced intervals and then adjusts the replicate sample fractional weights by using the proportions from the full sample. Let be the cumulative sum of the second-stage full-sample fractional weights for the first k second-stage donor cells. depends on both i and but, for simplicity, subscripts i and are not used in . By construction, . Define factors according to how the interval for the kth second-stage donor cell overlaps with the dth equally spaced interval :

for and . The replicate fraction for the second-stage donor cell is computed as

for all , and . Let .

The two-stage imputation-adjusted fractional replicate weight for the rth replicate sample for the second-stage donor cell in the first-stage donor cell for observation unit i is

where min(, ).

If for all selected second-stage donor cells in the rth replicate sample, then each selected second-stage donor cell is assigned a second-stage fractional weight of , where is the number of times the second-stage donor cell is selected.

The two-stage imputation-adjusted replicate weight for observation unit i, first-stage donor cell , and second-stage donor cell in the rth replicate sample is .

Last updated: December 09, 2022