The VARMETHOD=BOOTSTRAP option requests the bootstrap method for variance estimation. This method can be used for stratified sample designs and for designs that have no stratification. If your design is stratified, the bootstrap method requires at least two PSUs in each stratum. You can provide bootstrap replicate weights for the analysis by using a REPWEIGHTS statement, or the procedure can construct bootstrap replicate weights for the analysis. PROC SURVEYFREQ estimates the parameter of interest (a proportion, total, odds ratio, or other statistic) from each replicate, and then uses the variability among replicate estimates to estimate the overall variance of the parameter estimate.
This bootstrap method for complex survey data is similar to the method of Rao, Wu, and Yue (1992) and is also known as the bootstrap weights method (Mashreghi, Haziza, and Léger 2016). For more information, see Lohr (2010, Section 9.3.3), Wolter (2007, Chapter 5), Beaumont and Patak (2012), Fuller (2009, Section 4.5), and Shao and Tu (1995, Section 6.2.4). McCarthy and Snowden (1985), Rao and Wu (1988), and Sitter (1992a, 1992b) provide several adjusted bootstrap variance estimators that are consistent for complex survey data. The naive bootstrap variance estimator, which is suitable for infinite populations, is not consistent for complex survey data.
If you do not provide replicate weights by using a REPWEIGHTS statement, PROC SURVEYFREQ constructs bootstrap replicate weights for the analysis. The procedure selects replicate bootstrap samples by with-replacement random sampling of PSUs within strata. You can specify the number of bootstrap replicates in the REPS= method-option; by default, the number of replicates is 250. (Increasing the number of replicates can improve the estimation precision but also increases the computation time.) You can specify the bootstrap sample sizes $m_h$ in the MH= method-option; by default, $m_h = n_h - 1$, where $n_h$ is the number of PSUs in stratum h.
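The with-replacement selection of PSUs within strata can be sketched as follows. This is only an illustration, not the procedure's implementation: the function name `draw_psu_counts`, the toy strata, and the seed are hypothetical, and the sketch assumes a bootstrap sample size of one fewer than the number of PSUs in each stratum when no size is supplied.

```python
import random
from collections import Counter

def draw_psu_counts(psus_by_stratum, rng, mh=None):
    """Return {stratum: Counter(psu -> times selected)} for one replicate."""
    counts = {}
    for stratum, psus in psus_by_stratum.items():
        n_h = len(psus)
        if n_h < 2:
            # The bootstrap method needs at least two PSUs in each stratum.
            raise ValueError(f"stratum {stratum!r} has fewer than two PSUs")
        # Assumed default: m_h = n_h - 1 draws unless a size is supplied.
        m_h = (mh or {}).get(stratum, n_h - 1)
        # random.choices samples with replacement.
        counts[stratum] = Counter(rng.choices(psus, k=m_h))
    return counts

# Hypothetical design: two strata with three and four PSUs.
psus = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3", "b4"]}
rng = random.Random(12345)
replicate_counts = [draw_psu_counts(psus, rng) for _ in range(250)]
```

Each element of `replicate_counts` records how many times every PSU was drawn in one replicate; PSUs that were not drawn are simply absent from the counter and count as zero.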
In each replicate sample, the original sampling weights of the selected units are adjusted to reflect the full sample. These adjusted weights are the bootstrap replicate weights. In replicate r, the bootstrap replicate weight for observation j in PSU i in stratum h is computed as
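$$w_{hij}^{(r)} = w_{hij} \left\{ 1 - \sqrt{\frac{m_h (1 - f_h)}{n_h - 1}} + \sqrt{\frac{m_h (1 - f_h)}{n_h - 1}} \; \frac{n_h}{m_h} \; k_{hi}^{(r)} \right\}$$

where $w_{hij}$ is the original sampling weight of observation j in PSU i in stratum h, $k_{hi}^{(r)}$ is the number of times PSU i is selected in replicate sample r, $m_h$ and $n_h$ are the bootstrap sample size and the number of PSUs in stratum h, and $f_h$ is the sampling fraction in stratum h.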
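As a rough illustration of the weight adjustment, the sketch below applies a Rao, Wu, and Yue style rescaling with a finite population correction. The exact form of the adjustment is an assumption here, and the function name `bootstrap_weight` and the example values are hypothetical.

```python
import math

def bootstrap_weight(w, k, n_h, m_h, f_h=0.0):
    """Rescaled bootstrap weight for one observation (assumed formula).

    w   : original sampling weight of the observation
    k   : number of times the observation's PSU was drawn in the replicate
    n_h : number of PSUs in the stratum
    m_h : bootstrap sample size in the stratum
    f_h : sampling fraction in the stratum (0 means no correction)
    """
    scale = math.sqrt(m_h * (1.0 - f_h) / (n_h - 1))
    return w * (1.0 - scale + scale * (n_h / m_h) * k)

# With m_h = n_h - 1 and f_h = 0, the weight reduces to w * n_h / (n_h - 1) * k.
weights = [bootstrap_weight(10.0, k, n_h=4, m_h=3) for k in (0, 1, 2)]
```

Under these assumptions, an observation whose PSU is not drawn receives a replicate weight of zero, and observations in PSUs drawn more than once are weighted up proportionally.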
You can use the OUTWEIGHTS= method-option to store the bootstrap replicate weights in a SAS data set. For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weight Output Data Set. You can provide these replicate weights to the procedure for subsequent analyses by using a REPWEIGHTS statement.
Let $\theta$ denote the population parameter to be estimated, for example, a proportion, total, odds ratio, or other statistic. Let $\hat{\theta}$ denote the estimate of $\theta$ from the full sample, and let $\hat{\theta}_r$ denote the estimate from the rth bootstrap replicate, which is computed by using the bootstrap replicate weights. The bootstrap variance estimate for $\hat{\theta}$ is computed as

$$\hat{V}(\hat{\theta}) = \sum_{r=1}^{R} \alpha_r \left( \hat{\theta}_r - \hat{\theta} \right)^2$$

where R is the total number of replicates and $\alpha_r$ is the coefficient for replicate r.

By default, $\alpha_r = 1/R$ for the bootstrap method. If you provide bootstrap weights in the REPWEIGHTS statement, you can also provide replicate coefficients in the REPCOEFS= option.
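The variance computation can be sketched as a weighted sum of squared deviations of the replicate estimates from the full-sample estimate. The sketch assumes equal coefficients of 1/R when none are supplied; the function name `bootstrap_variance` and the numbers are hypothetical.

```python
def bootstrap_variance(full_estimate, replicate_estimates, coefs=None):
    """Sum of coefficient-weighted squared deviations from the full-sample estimate."""
    R = len(replicate_estimates)
    if coefs is None:
        # Assumed default coefficient for the bootstrap method: 1/R.
        coefs = [1.0 / R] * R
    if len(coefs) != R:
        raise ValueError("need one coefficient per replicate")
    return sum(a * (t - full_estimate) ** 2
               for a, t in zip(coefs, replicate_estimates))

# Hypothetical proportion estimates: full sample, then four replicates.
variance = bootstrap_variance(0.40, [0.38, 0.43, 0.41, 0.36])
std_error = variance ** 0.5
```

Passing an explicit `coefs` list plays the role that user-supplied replicate coefficients play when replicate weights come from outside the procedure.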
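If you specify the CENTER=REPLICATES method-option, the bootstrap variance estimate is computed as

$$\hat{V}(\hat{\theta}) = \sum_{r=1}^{R} \alpha_r \left( \hat{\theta}_r - \bar{\theta} \right)^2$$

where $\hat{\theta}_r$ and $\alpha_r$ are the estimate and the coefficient for replicate r, and $\bar{\theta}$ is the average of the replicate estimates and is computed as follows:

$$\bar{\theta} = \frac{1}{R} \sum_{r=1}^{R} \hat{\theta}_r$$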
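Centering at the replicate average instead of the full-sample estimate changes only the reference point of the squared deviations. A minimal sketch, again assuming equal coefficients of 1/R and using the hypothetical name `bootstrap_variance_centered`:

```python
def bootstrap_variance_centered(replicate_estimates, coefs=None):
    """Variance centered at the average of the replicate estimates."""
    R = len(replicate_estimates)
    if coefs is None:
        coefs = [1.0 / R] * R  # assumed default coefficient
    mean = sum(replicate_estimates) / R
    return sum(a * (t - mean) ** 2
               for a, t in zip(coefs, replicate_estimates))

# Same hypothetical replicate estimates as a full-sample-centered example might use.
variance_centered = bootstrap_variance_centered([0.38, 0.43, 0.41, 0.36])
```

With equal coefficients, the centered estimate can never exceed the estimate centered at the full-sample value, because the average minimizes the sum of squared deviations.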
If a parameter cannot be estimated from one or more replicates, the variance estimate is computed by using those replicates from which the parameter can be estimated. For example, suppose the parameter is a column proportion, that is, the proportion of column j for table cell (i, j). If a replicate r contains no observations in column j, then the column j proportion is not estimable from replicate r. In this case, the bootstrap variance estimate is computed as

$$\hat{V}(\hat{\theta}) = \frac{1}{R'} \sum_{r=1}^{R'} \left( \hat{\theta}_r - \hat{\theta} \right)^2$$

where the summation is over the replicates for which the parameter is estimable and where $R'$ is the number of those replicates.
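Handling of nonestimable replicates can be sketched by dropping them before averaging. In this illustration a replicate without an estimate is represented by `None`, which is a convention of the sketch rather than of the procedure, and the function name `bootstrap_variance_estimable` is hypothetical.

```python
def bootstrap_variance_estimable(full_estimate, replicate_estimates):
    """Average squared deviation over the replicates that produced an estimate."""
    usable = [t for t in replicate_estimates if t is not None]
    if not usable:
        raise ValueError("parameter is not estimable from any replicate")
    # Divide by the number of usable replicates, not the total number.
    return sum((t - full_estimate) ** 2 for t in usable) / len(usable)

# Hypothetical case: the second replicate has no observations in the column.
variance_estimable = bootstrap_variance_estimable(0.40, [0.38, None, 0.41, 0.36])
```

Dividing by the number of usable replicates keeps the estimate on the same scale as it would have if every replicate had been estimable.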