PROC SURVEYIMPUTE produces replicate weights that are based on the sample design that is used to collect the survey data. You can use PROC SURVEYIMPUTE for single-stage or multistage designs, with or without stratification, and with or without unequal weighting. To create imputation-adjusted replicate weights for your survey data, you need to provide sample design information to PROC SURVEYIMPUTE. This information can include design (or variance) strata, clusters, and sampling weights. You provide sample design information by specifying the STRATA, CLUSTER, WEIGHT, and REPWEIGHTS statements and the RATE= or TOTAL= option in the PROC SURVEYIMPUTE statement.
If you use the REPWEIGHTS statement to provide replicate weights, you do not need to use a STRATA or CLUSTER statement. Otherwise, you should use STRATA and CLUSTER statements whenever your design includes stratification and clustering. If your design includes unequal sampling weights, you should use the WEIGHT statement.
For a multistage sample design, PROC SURVEYIMPUTE uses only the first stage of the sample design to create replicate weights. Therefore, the required input includes only the first-stage cluster (PSU) identification and first-stage stratum identification. You do not need to input design information about any additional stages of sampling.
If your sample design is stratified at the first stage of sampling, use the STRATA statement to name the variables that form the strata. The combinations of categories of STRATA variables define the strata in the sample, where strata are nonoverlapping subgroups that were sampled independently. If your sample design has stratification at multiple stages, then identify only the first-stage strata in the STRATA statement.
If you use a REPWEIGHTS statement to provide replicate weights, you do not need to use a STRATA statement. Otherwise, you should use a STRATA statement whenever your design includes stratification. If you do not use a STRATA statement or a REPWEIGHTS statement, then PROC SURVEYIMPUTE assumes there is no stratification at the first stage; that is, the procedure assumes that all observation units are in the same stratum.
If your sample design selects clusters at the first stage of sampling, use the CLUSTER statement to name the variables that identify the first-stage clusters, which are also called primary sampling units (PSUs). The combinations of categories of CLUSTER variables define the clusters in the sample. If there is a STRATA statement, clusters are nested within strata. If your sample design has clustering at multiple stages, you should specify only the first-stage clusters (PSUs) in the CLUSTER statement. PROC SURVEYIMPUTE assumes that each cluster that is defined by the variables in the CLUSTER statement represents a PSU in the sample.
If you use a REPWEIGHTS statement to provide replicate weights, you do not need to use a CLUSTER statement. Otherwise, you should use a CLUSTER statement whenever your design includes clustering at the first stage of sampling. If you do not use a CLUSTER statement, then PROC SURVEYIMPUTE treats each observation as a PSU.
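As a minimal sketch of how a first-stage design might be specified, the following statements assume a hypothetical data set Survey in which Region identifies the first-stage strata, School identifies the PSUs, and Q1 and Q2 are categorical items to be imputed. All of these names are illustrative assumptions, and METHOD=FEFI with VARMETHOD=JACKKNIFE is shown only as one possible choice.

```sas
/* Hypothetical names: Survey, Region, School, Q1, Q2, SurveyImputed */
proc surveyimpute data=Survey method=FEFI varmethod=jackknife;
   class Q1 Q2;        /* treat the imputed items as categorical     */
   var Q1 Q2;          /* variables to impute                        */
   strata Region;      /* first-stage strata only                    */
   cluster School;     /* first-stage clusters (PSUs) only           */
   output out=SurveyImputed;
run;
```

Because no WEIGHT statement appears in this sketch, every observation would be treated as having a weight of 1.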
If your sample design includes unequal weighting, use the WEIGHT statement to name the variable that contains the sampling weights. Sampling weights must be positive numbers. If an observation has a weight that is nonpositive or missing, then PROC SURVEYIMPUTE omits that observation from the analysis. For more information, see the section Missing Values.
If you do not use a WEIGHT statement but you include a REPWEIGHTS statement, PROC SURVEYIMPUTE uses the average of each observation’s replicate weights as the observation’s weight. If you use neither a WEIGHT statement nor a REPWEIGHTS statement, PROC SURVEYIMPUTE assumes that all observations have a weight of 1.
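Because observations that have nonpositive or missing weights are omitted, it can be useful to inspect the weight variable before you run the procedure. The following DATA step is a sketch of such a check. The data set name Survey and the variable name SamplingWeight are assumptions made for illustration.

```sas
/* Hypothetical names: Survey, SamplingWeight, WeightProblems */
data WeightProblems;
   set Survey;
   /* keep only observations whose weight is missing, zero, or negative */
   if missing(SamplingWeight) or SamplingWeight <= 0;
run;

proc print data=WeightProblems;
run;
```

If WeightProblems is empty, every observation has a usable weight, and you could name the variable in the procedure by specifying `weight SamplingWeight;`.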
To include a finite population correction (fpc) in the bootstrap replicate weights, you can specify either the sampling rate or the population total by using the RATE= or TOTAL= option, respectively, in the PROC SURVEYIMPUTE statement. You cannot specify both of these options in the same PROC SURVEYIMPUTE statement. The procedure does not use a finite population correction for BRR or jackknife variance estimation.
If you do not specify the RATE= or TOTAL= option, the bootstrap replicate weights do not include a finite population correction. When the sampling fraction is small, the correction has little effect and is often ignored. For more information, see Cochran (1977) and Kish (1965).
If your design has multiple stages of selection and you are specifying the RATE= option, you should use the first-stage sampling rate, which is the ratio of the number of PSUs in the sample to the total number of PSUs in the population. If you are specifying the TOTAL= option for a multistage design, you should use the total number of PSUs in the population.
For a nonstratified sample design, or for a stratified sample design that has the same sampling rate or same population total in all strata, you can use the RATE=value or TOTAL=value option. If your sample design is stratified with different sampling rates or population totals in different strata, use the RATE=SAS-data-set or TOTAL=SAS-data-set option to name a SAS data set that contains the stratum sampling rates or totals. This data set is called a secondary data set, as opposed to the primary data set that you specify by using the DATA= option.
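The following sketch illustrates the RATE=value form for a nonstratified, two-stage design. It assumes, purely for illustration, that 30 of the 600 PSUs in the population were selected at the first stage, so the first-stage sampling rate is 30/600 = 0.05. The data set and variable names are hypothetical, and pairing METHOD=FEFI with VARMETHOD=BOOTSTRAP is an assumption about the analysis, not a requirement.

```sas
/* Hypothetical: 30 of 600 PSUs sampled at stage one, rate = 30/600 = 0.05 */
proc surveyimpute data=Survey method=FEFI varmethod=bootstrap rate=0.05;
   class Q1 Q2;
   var Q1 Q2;
   cluster School;            /* PSUs; later stages are not specified */
   weight SamplingWeight;
   output out=SurveyImputed;
run;
```

Under the same assumptions, specifying TOTAL=600 instead of RATE=0.05 would describe the same first-stage population of PSUs.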
The secondary data set must contain all the stratification variables that are listed in the STRATA statement and all the variables in the BY statement. Furthermore, the BY groups must appear in the same order as in the primary data set. If there are formats associated with the STRATA variables and BY variables, then the formats must be consistent in the primary and secondary data sets. If you specify the TOTAL=SAS-data-set option, the secondary data set must have a variable named _TOTAL_ that contains the stratum population totals. If you specify the RATE=SAS-data-set option, the secondary data set must have a variable named _RATE_ that contains the stratum sampling rates. If the secondary data set contains more than one observation for any one stratum, the procedure uses the first value of _TOTAL_ or _RATE_ for that stratum and ignores the rest.
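The requirements on the secondary data set can be sketched as follows for a design that has different PSU totals in each stratum. The stratum variable Region (assumed here to be a character variable in both data sets), the stratum names, and the totals are all made up for illustration.

```sas
/* Hypothetical stratum totals; the variable must be named _TOTAL_ */
data StratumTotals;
   input Region $ _TOTAL_;
   datalines;
North 200
South 150
West 250
;

proc surveyimpute data=Survey method=FEFI varmethod=bootstrap
                  total=StratumTotals;
   class Q1 Q2;
   var Q1 Q2;
   strata Region;             /* must also appear in StratumTotals */
   cluster School;
   weight SamplingWeight;
   output out=SurveyImputed;
run;
```

To supply stratum sampling rates instead, the secondary data set would contain a variable named _RATE_ in place of _TOTAL_, and you would name the data set in the RATE= option.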
The value in the RATE= option or the values of _RATE_ in the secondary data set must be nonnegative numbers. You can specify value as a proportion between 0 and 1, or in percentage form as a number between 1 and 100, which PROC SURVEYIMPUTE converts to a proportion. The procedure treats the value 1 as 100% instead of 1%.
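The interpretation rule for rate values can be illustrated with a small DATA step. This sketch only mimics the stated rule for a few example inputs. It is not the procedure's internal code, and the names RateInterpretation, value, and proportion are hypothetical.

```sas
/* Illustration of the stated rule, not the procedure's implementation */
data RateInterpretation;
   input value;
   /* values up to 1 are read as proportions; larger values as percentages */
   if value <= 1 then proportion = value;
   else proportion = value / 100;
   datalines;
0.05
5
1
100
;

proc print data=RateInterpretation;
run;
```

Under this rule, the values 0.05 and 5 both correspond to a sampling rate of 5%, and the values 1 and 100 both correspond to a sampling rate of 100%.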
If you specify the TOTAL=value option, value must not be less than the sample size. If you provide stratum population totals in a secondary data set, these values must not be less than the corresponding stratum sample sizes.
If you have replicate weights available for your survey data, use the REPWEIGHTS statement to name the variables that contain the replicate weights. Replicate weights must be positive numbers. If an observation has a replicate weight that is nonpositive or missing, then PROC SURVEYIMPUTE does not perform any imputation. For more information, see the section Missing Values.
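When replicate weights are supplied, the design statements can be omitted, as in the following sketch. It assumes a hypothetical data set that contains 40 jackknife replicate weight variables named RepWt_1 through RepWt_40 in addition to the full-sample weight SamplingWeight. The number of replicates, the variable names, and the choice of VARMETHOD=JACKKNIFE are assumptions about the data.

```sas
/* Hypothetical names: Survey, SamplingWeight, RepWt_1-RepWt_40 */
proc surveyimpute data=Survey method=FEFI varmethod=jackknife;
   class Q1 Q2;
   var Q1 Q2;
   weight SamplingWeight;
   repweights RepWt_1-RepWt_40;   /* no STRATA or CLUSTER statement needed */
   output out=SurveyImputed;
run;
```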