The SURVEYFREQ Procedure

Bootstrap Method

The VARMETHOD=BOOTSTRAP option requests the bootstrap method for variance estimation. This method can be used for stratified sample designs and for designs that have no stratification. If your design is stratified, the bootstrap method requires at least two PSUs in each stratum. You can provide bootstrap replicate weights for the analysis by using a REPWEIGHTS statement, or the procedure can construct bootstrap replicate weights for the analysis. PROC SURVEYFREQ estimates the parameter of interest (a proportion, total, odds ratio, or other statistic) from each replicate, and then uses the variability among replicate estimates to estimate the overall variance of the parameter estimate.

This bootstrap method for complex survey data is similar to the method of Rao, Wu, and Yue (1992) and is also known as the bootstrap weights method (Mashreghi, Haziza, and Léger 2016). For more information, see Lohr (2010, Section 9.3.3), Wolter (2007, Chapter 5), Beaumont and Patak (2012), Fuller (2009, Section 4.5), and Shao and Tu (1995, Section 6.2.4). McCarthy and Snowden (1985), Rao and Wu (1988), Sitter (1992b), and Sitter (1992a) provide several adjusted bootstrap variance estimators that are consistent for complex survey data. The naive bootstrap variance estimator that is suitable for infinite populations is not consistent for complex survey data.

Replicate Weight Construction

If you do not provide replicate weights by using a REPWEIGHTS statement, PROC SURVEYFREQ constructs bootstrap replicate weights for the analysis. The procedure selects replicate bootstrap samples by with-replacement random sampling of PSUs within strata. You can specify the number of bootstrap replicates in the REPS= method-option; by default, the number of replicates is 250. (Increasing the number of replicates can improve the estimation precision but also increases the computation time.) You can specify the bootstrap sample sizes $m_h$ in the MH= method-option; by default, $m_h = n_h - 1$, where $n_h$ is the number of PSUs in stratum $h$.

In each replicate sample, the original sampling weights of the selected units are adjusted to reflect the full sample. These adjusted weights are the bootstrap replicate weights. In replicate $r$, the bootstrap replicate weight for observation $j$ in PSU $i$ in stratum $h$ is computed as

\[
w_{hij}^{(r)} = w_{hij} \left\{ 1 - \sqrt{(1 - f_h)\, m_h / (n_h - 1)} + \sqrt{(1 - f_h)\, m_h / (n_h - 1)} \; (n_h / m_h) \; k_{hi}^{(r)} \right\}
\]

where $k_{hi}^{(r)}$ is the number of times PSU $i$ is selected in replicate sample $r$, and $f_h$ is the sampling fraction in stratum $h$.
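The weight construction described above can be sketched in Python for a single stratum and a single replicate. This is only an illustration under stated assumptions, not the procedure's implementation: the function name `bootstrap_weight_factors` and its arguments are hypothetical, the random number stream is Python's rather than SAS's, and the sampling fraction defaults to 0 purely for the example.

```python
import math
import random
from collections import Counter


def bootstrap_weight_factors(psu_ids, f_h=0.0, m_h=None, rng=None):
    """Return {psu_id: factor} for one stratum in one bootstrap replicate.

    Hypothetical helper: an observation's replicate weight is its original
    sampling weight multiplied by the factor of the PSU it belongs to.
    """
    rng = rng or random.Random()
    n_h = len(psu_ids)
    if n_h < 2:
        raise ValueError("the bootstrap method needs at least two PSUs per stratum")
    if m_h is None:
        m_h = n_h - 1  # default bootstrap sample size stated in the text
    # Draw m_h PSUs with replacement; k[i] is how often PSU i was selected.
    k = Counter(rng.choices(psu_ids, k=m_h))
    scale = math.sqrt((1.0 - f_h) * m_h / (n_h - 1))
    return {i: 1.0 - scale + scale * (n_h / m_h) * k[i] for i in psu_ids}


# Example: four PSUs, default m_h = 3, sampling fraction assumed to be 0.
factors = bootstrap_weight_factors(["A", "B", "C", "D"], rng=random.Random(1))
```

With the default bootstrap sample size and a sampling fraction of 0, the factor reduces to the selection count multiplied by $n_h / (n_h - 1)$, so PSUs that are not drawn receive a replicate weight of 0. In every case the factors within a stratum sum to $n_h$, because the selection counts sum to $m_h$.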

You can use the OUTWEIGHTS= method-option to store the bootstrap replicate weights in a SAS data set. For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weight Output Data Set. You can provide these replicate weights to the procedure for subsequent analyses by using a REPWEIGHTS statement.

Variance Estimation

Let $\theta$ denote the population parameter to be estimated (for example, a proportion, total, odds ratio, or other statistic). Let $\hat{\theta}$ denote the estimate of $\theta$ from the full sample, and let $\hat{\theta}_r$ denote the estimate from the $r$th bootstrap replicate, which is computed by using the bootstrap replicate weights. The bootstrap variance estimate for $\hat{\theta}$ is computed as

\[
\hat{V}(\hat{\theta}) = \sum_{r=1}^{R} \alpha_r \left( \hat{\theta}_r - \hat{\theta} \right)^2
\]

where $R$ is the total number of replicates and $\alpha_r$ is the coefficient for replicate $r$.

By default, $\alpha_r = 1/R$ for the bootstrap method. If you provide bootstrap weights in the REPWEIGHTS statement, you can also provide replicate coefficients in the REPCOEFS= option.
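As a small numerical illustration of this formula, the following Python sketch computes the variance estimate from a list of replicate estimates. The function name `bootstrap_variance` and the sample values are hypothetical; the replicate coefficients default to 1/R as described above, and a user-supplied list stands in for coefficients that would otherwise come from REPCOEFS=.

```python
def bootstrap_variance(theta_hat, replicate_estimates, coefs=None):
    """Sum of coefficient-weighted squared deviations from the full-sample estimate."""
    R = len(replicate_estimates)
    if R == 0:
        raise ValueError("at least one replicate estimate is required")
    if coefs is None:
        coefs = [1.0 / R] * R  # default bootstrap coefficients
    if len(coefs) != R:
        raise ValueError("need exactly one coefficient per replicate")
    return sum(a * (t - theta_hat) ** 2 for a, t in zip(coefs, replicate_estimates))


# Made-up values: full-sample estimate 0.5 and four replicate estimates.
variance = bootstrap_variance(0.5, [0.4, 0.5, 0.6, 0.5])
# (0.01 + 0 + 0.01 + 0) / 4 = 0.005, up to floating-point rounding
```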

If you specify the CENTER=REPLICATES method-option, the bootstrap variance estimate is computed as

\[
\hat{V}(\hat{\theta}) = \sum_{r=1}^{R} \alpha_r \left( \hat{\theta}_r - \bar{\theta} \right)^2
\]

where $\bar{\theta}$ is the average of the replicate estimates and is computed as follows:

\[
\bar{\theta} = \frac{1}{R} \sum_{r=1}^{R} \hat{\theta}_r
\]
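The effect of centering on the replicate average rather than on the full-sample estimate can be seen in a short Python sketch. The function name `bootstrap_variance_centered` and the sample values are hypothetical, and the default coefficients of 1/R are assumed.

```python
def bootstrap_variance_centered(replicate_estimates, coefs=None):
    """Variance estimate with deviations taken about the mean of the replicate estimates."""
    R = len(replicate_estimates)
    if R == 0:
        raise ValueError("at least one replicate estimate is required")
    if coefs is None:
        coefs = [1.0 / R] * R  # default bootstrap coefficients
    if len(coefs) != R:
        raise ValueError("need exactly one coefficient per replicate")
    theta_bar = sum(replicate_estimates) / R  # average of the replicate estimates
    return sum(a * (t - theta_bar) ** 2 for a, t in zip(coefs, replicate_estimates))


# Made-up replicate estimates whose average is 0.55.
variance = bootstrap_variance_centered([0.4, 0.5, 0.6, 0.7])
# (0.0225 + 0.0025 + 0.0025 + 0.0225) / 4 = 0.0125, up to floating-point rounding
```

For comparison, centering the same four values on a full-sample estimate of 0.5 gives (0.01 + 0 + 0.01 + 0.04) / 4 = 0.015. With equal coefficients, the replicate-centered estimate can never exceed the estimate centered on any other value, because the mean minimizes the sum of squared deviations.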

If a parameter cannot be estimated from one or more replicates, the variance estimate is computed by using those replicates from which the parameter can be estimated. For example, suppose the parameter is a column proportion: the proportion of column $j$ for table cell $(i, j)$. If a replicate $r$ contains no observations in column $j$, then the column $j$ proportion is not estimable from replicate $r$. In this case, the bootstrap variance estimate is computed as

\[
\hat{V}(\hat{\theta}) = \frac{R}{R'} \sum_{r=1}^{R'} \alpha_r \left( \hat{\theta}_r - \hat{\theta} \right)^2
\]

where the summation is over the replicates for which the parameter $\theta$ is estimable and where $R'$ is the number of those replicates.
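The rescaling for non-estimable replicates can be sketched in Python as follows. This is an illustration only: the function name `bootstrap_variance_estimable` is hypothetical, and marking a non-estimable replicate with `None` is a convention chosen for this example, not something the procedure exposes.

```python
def bootstrap_variance_estimable(theta_hat, replicate_estimates, coefs=None):
    """Variance estimate that skips non-estimable replicates (None) and rescales by R / R'."""
    R = len(replicate_estimates)
    if coefs is None:
        coefs = [1.0 / R] * R if R else []  # default bootstrap coefficients
    if len(coefs) != R:
        raise ValueError("need exactly one coefficient per replicate")
    # Keep only the replicates from which the parameter could be estimated.
    usable = [(a, t) for a, t in zip(coefs, replicate_estimates) if t is not None]
    R_prime = len(usable)
    if R_prime == 0:
        raise ValueError("the parameter is not estimable from any replicate")
    return (R / R_prime) * sum(a * (t - theta_hat) ** 2 for a, t in usable)


# Made-up values: the second of four replicates has no estimate.
variance = bootstrap_variance_estimable(0.5, [0.4, None, 0.6, 0.5])
# (4 / 3) * (0.01 + 0.01 + 0) / 4 = 0.02 / 3, up to floating-point rounding
```

With the default coefficients of 1/R, the factor R/R' turns the result into a simple average of the squared deviations over the R' usable replicates.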

Last updated: December 09, 2022