-
ALLOC=name | (values) | SAS-data-set
-
specifies the allocation method name or specifies the stratum allocation proportions as a list of values or a SAS-data-set. You can use the ALLOC= option with any selection method (which you specify in the PROC SURVEYSELECT statement) except METHOD=PPS_BREWER and METHOD=PPS_MURTHY, either of which selects two units from each stratum.
You can specify the sample size allocation by using one of the following forms:
-
ALLOC=name
-
specifies the method for allocating the total sample size among the strata. You can specify one of the following values for name:
-
NEYMAN
requests Neyman allocation, which allocates the total sample size among the strata in proportion to the stratum sizes and variances. For more information, see the section Neyman Allocation. If you specify ALLOC=NEYMAN, you must provide the stratum variances by also specifying the VAR= option.
-
OPTIMAL
OPT
requests optimal allocation, which allocates the total sample size among the strata in proportion to the stratum sizes, stratum variances, and stratum costs. For more information, see the section Optimal Allocation. If you specify ALLOC=OPTIMAL, you must provide the stratum variances by also specifying the VAR= option, and you must provide the stratum costs by also specifying the COST= option.
-
PROPORTIONAL
PROP
requests proportional allocation, which allocates the total sample size in proportion to the stratum sizes, where stratum size is the number of sampling units in the stratum. For more information, see the section Proportional Allocation.
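The three named allocation methods differ only in the weight that each stratum receives. The following Python sketch uses the textbook forms of these weights: stratum size for proportional allocation, stratum size times standard deviation for Neyman allocation, and that product divided by the square root of the per-unit cost for optimal allocation. It is an illustration under those assumptions, not the procedure's implementation; the function name and the example numbers are hypothetical, and rounding to integer sample sizes is not modeled.

```python
from math import sqrt

def allocation_fractions(sizes, variances=None, costs=None):
    """Return the share of the total sample size that each stratum receives.

    sizes      -- number of sampling units N_h in each stratum
    variances  -- stratum variances S_h^2 (needed for Neyman and optimal)
    costs      -- per-unit stratum costs c_h (needed for optimal only)
    """
    if variances is None:
        # Proportional allocation: weight is N_h.
        weights = [float(n) for n in sizes]
    elif costs is None:
        # Neyman allocation (textbook form): weight is N_h * S_h.
        weights = [n * sqrt(v) for n, v in zip(sizes, variances)]
    else:
        # Optimal allocation (textbook form): weight is N_h * S_h / sqrt(c_h).
        weights = [n * sqrt(v) / sqrt(c)
                   for n, v, c in zip(sizes, variances, costs)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical three-stratum example.
sizes = [500, 300, 200]
variances = [4.0, 25.0, 100.0]
costs = [1.0, 4.0, 16.0]

prop = allocation_fractions(sizes)
neyman = allocation_fractions(sizes, variances)
optimal = allocation_fractions(sizes, variances, costs)
```

In this example the small, highly variable third stratum receives the largest share under Neyman allocation, and its share drops again under optimal allocation because its units are the most expensive to survey.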
-
ALLOC=(values)
-
specifies a list of stratum allocation proportion values. You can separate the values with blanks or commas, and you must enclose the list of values in parentheses. Each value should correspond to a stratum group, and the number of values must equal the number of strata in the input data set.
A stratum allocation proportion specifies the proportion of the total sample size to allocate to the stratum. The sum of the allocation proportions must be 1 or 100%.
The allocation proportions must be positive numbers. You can specify the proportion values as numbers between 0 and 1. Or you can specify the values in percentage form (as numbers between 1 and 100), and PROC SURVEYSELECT converts the numbers to proportions. PROC SURVEYSELECT treats the value 1 as 100% instead of 1%.
The order of the stratum allocation proportions must match the order of the stratum groups in the DATA= input data set. When you specify a list of proportion values, the input data set must be sorted by the STRATA variables in ascending order; you cannot use the DESCENDING or NOTSORTED option in the STRATA statement.
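The rules for proportion values (positive, summing to 1 or to 100%) can be sketched as a small Python check. The function name and tolerance are hypothetical, and deciding between the two forms by looking at the sum is an assumption made for this illustration; the procedure's own logic may differ.

```python
def normalize_proportions(values, tol=1e-8):
    """Return allocation proportions on the 0-1 scale.

    Accepts values that sum to 1 (proportions) or to 100 (percentages).
    """
    if any(v <= 0 for v in values):
        raise ValueError("allocation proportions must be positive")
    total = sum(values)
    if abs(total - 1.0) <= tol:
        # Already proportions. A single value of 1 lands here, so it is
        # read as 100% rather than 1%.
        return list(values)
    if abs(total - 100.0) <= tol:
        # Percentage form: convert to proportions.
        return [v / 100.0 for v in values]
    raise ValueError("allocation proportions must sum to 1 or to 100%")

from_percent = normalize_proportions([50, 30, 20])
from_fraction = normalize_proportions([0.5, 0.3, 0.2])
single = normalize_proportions([1])
```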
-
ALLOC=SAS-data-set
-
names a SAS-data-set that contains stratum allocation proportions. You should provide the stratum allocation proportions in the data set variable named _ALLOC_. Each observation in the data set should correspond to a stratum group, which is determined by the values of the STRATA variables.
A stratum allocation proportion specifies the proportion of the total sample size to allocate to the corresponding stratum. The sum of the allocation proportions must be 1 or 100%.
The allocation proportions must be positive numbers. You can specify the proportion values as numbers between 0 and 1. Or you can specify the values in percentage form (as numbers between 1 and 100), and PROC SURVEYSELECT converts the numbers to proportions. PROC SURVEYSELECT treats the value 1 as 100% instead of 1%.
The ALLOC= data set, which is a secondary input data set, must contain all stratification variables that you specify in the STRATA statement. The data set must also contain all stratum groups that appear in the DATA= input data set. The order of the stratum groups in the ALLOC= data set must match the order of the groups in the DATA= data set. If formats are associated with the STRATA variables, the formats must be consistent between the two data sets. For more information, see the section Secondary Input Data Set. You can name only one secondary data set in each invocation of PROC SURVEYSELECT.
-
ALLOCMAX=n
-
specifies the maximum sample size n to allocate to a stratum. When an allocated stratum sample size is greater than n, PROC SURVEYSELECT allocates only n units to the stratum. The procedure then allocates the remaining sample size among the remaining strata by using the allocation method or proportions that you specify. For more information, see the section Proportional Allocation.
For without-replacement methods, the allocated stratum sample size must also not exceed the number of sampling units in the stratum.
The maximum stratum sample size n must be a positive integer. The value of n times the number of strata must not be less than the total sample size to be allocated.
This option is available when you specify the ALLOC=PROPORTIONAL, ALLOC=(values), or ALLOC=SAS-data-set option.
When you specify both the ALLOCMAX= and ALLOCMIN= options, the ALLOCMAX= value must be greater than the ALLOCMIN= value.
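The capping behavior can be sketched in Python as follows: any stratum whose share exceeds the maximum is fixed at the maximum, and the remaining sample size is reallocated among the other strata in proportion to their original shares. This is a minimal illustration with a hypothetical function name; it works with unrounded values and does not reproduce the procedure's rounding or its handling of stratum population limits.

```python
def allocate_with_max(total, fractions, alloc_max):
    """Allocate `total` by `fractions`, capping each stratum at `alloc_max`."""
    if alloc_max * len(fractions) < total:
        raise ValueError("ALLOCMAX times the number of strata is below the total")
    alloc = [None] * len(fractions)
    remaining = float(total)
    free = set(range(len(fractions)))          # strata not yet capped
    while free:
        share = sum(fractions[h] for h in free)
        # Strata whose current share would exceed the cap.
        over = [h for h in sorted(free)
                if remaining * fractions[h] / share > alloc_max]
        if not over:
            for h in free:
                alloc[h] = remaining * fractions[h] / share
            break
        for h in over:
            alloc[h] = float(alloc_max)        # allocate only the maximum
            free.discard(h)
        remaining -= alloc_max * len(over)     # redistribute what is left
    return alloc

# Hypothetical example: the first stratum would receive 60 but is capped at 50.
capped = allocate_with_max(100, [0.6, 0.3, 0.1], 50)
```

In the example, the 50 units left after capping are split 3:1 between the second and third strata, which preserves their relative proportions.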
-
ALLOCMIN=n
-
specifies the minimum sample size n to allocate to a stratum. When you specify this option, PROC SURVEYSELECT allocates at least n sampling units to each stratum.
The minimum stratum sample size n must be a positive integer. The value of n times the number of strata must not exceed the total sample size to be allocated. For without-replacement selection methods, the value of n must not exceed the number of sampling units in any stratum.
By default, PROC SURVEYSELECT allocates at least one sampling unit to each stratum.
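The numeric constraints that this section places on ALLOCMIN= and ALLOCMAX= can be collected into one feasibility check. The Python sketch below is illustrative only; the function name, argument names, and message wording are hypothetical and do not correspond to the procedure's log messages.

```python
def check_allocation_bounds(total, stratum_sizes, alloc_min=1, alloc_max=None,
                            without_replacement=True):
    """Return a list of violated constraints (empty when the bounds are feasible)."""
    problems = []
    n_strata = len(stratum_sizes)
    if alloc_min < 1:
        problems.append("ALLOCMIN must be a positive integer")
    if alloc_min * n_strata > total:
        problems.append("ALLOCMIN times the number of strata exceeds the total")
    if without_replacement and alloc_min > min(stratum_sizes):
        problems.append("ALLOCMIN exceeds the size of the smallest stratum")
    if alloc_max is not None:
        if alloc_max < 1:
            problems.append("ALLOCMAX must be a positive integer")
        if alloc_max * n_strata < total:
            problems.append("ALLOCMAX times the number of strata is below the total")
        if alloc_max <= alloc_min:
            problems.append("ALLOCMAX must be greater than ALLOCMIN")
    return problems

# Hypothetical example: total sample size 100 over three strata.
feasible = check_allocation_bounds(100, [500, 300, 200], alloc_min=5, alloc_max=50)
too_large_min = check_allocation_bounds(100, [500, 300, 200], alloc_min=40)
```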
-
ALPHA=α
-
specifies the confidence level that PROC SURVEYSELECT uses in the MARGIN= computations. For more information, see the section Specifying the Margin of Error.
The value of α must be between 0 and 1; a confidence level of α produces a 100(1 - α)% confidence interval. By default, ALPHA=0.05, which produces a 95% confidence interval.
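The relationship between the ALPHA= value and the confidence interval can be checked with a short Python sketch. The critical value shown assumes a two-sided normal approximation, which is an assumption of this illustration; the section Specifying the Margin of Error describes what the procedure actually uses. The function names are hypothetical.

```python
from statistics import NormalDist

def confidence_level_percent(alpha):
    """Confidence interval percentage that corresponds to ALPHA=alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    return 100.0 * (1.0 - alpha)

def normal_critical_value(alpha):
    """Two-sided standard normal critical value (assumed approximation)."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

level = confidence_level_percent(0.05)   # the default ALPHA=0.05
z = normal_critical_value(0.05)
```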
-
COST <=(values) | SAS-data-set>
-
specifies the stratum-level costs that PROC SURVEYSELECT uses to compute optimal allocation when you specify ALLOC=OPTIMAL. For more information, see the section Optimal Allocation. The stratum costs must be positive numbers. A stratum cost represents the per-unit cost, which is the survey cost of a single unit in the stratum.
You can provide stratum costs by specifying one of the following forms:
-
COST
indicates that stratum costs are provided in a secondary input data set that you name in another option (for example, the VAR=SAS-data-set option). You should provide the stratum costs in the data set variable named _COST_. For more information, see the section Secondary Input Data Set. You can name only one secondary input data set in each invocation of PROC SURVEYSELECT.
-
COST=(values)
-
specifies a list of stratum cost values. You can separate the values with blanks or commas, and you must enclose the list of values in parentheses. Each value should correspond to a stratum group, and the number of values must equal the number of strata in the input data set.
The order of the stratum cost values must match the order of the stratum groups in the DATA= input data set. When you specify a list of values, the input data set must be sorted by the STRATA variables in ascending order; you cannot use the DESCENDING or NOTSORTED option in the STRATA statement.
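Because a list of values is matched to strata purely by position, it can help to see the pairing spelled out. The Python sketch below pairs a value list with the distinct stratum groups in ascending order. The function name is hypothetical, and Python's tuple ordering stands in for the sort order of the STRATA variables, which is an assumption of this illustration.

```python
def pair_values_with_strata(stratum_keys, values):
    """Map each distinct stratum group (in ascending order) to its value.

    stratum_keys -- one tuple of STRATA variable values per observation
    values       -- the list given in parentheses, in stratum order
    """
    groups = sorted(set(stratum_keys))
    if len(values) != len(groups):
        raise ValueError("the number of values must equal the number of strata")
    return dict(zip(groups, values))

# Hypothetical data with two STRATA variables and three stratum groups.
observations = [("West", "Retail"), ("East", "Retail"),
                ("East", "Wholesale"), ("East", "Retail")]
cost_by_stratum = pair_values_with_strata(observations, [1.0, 2.5, 4.0])
```

Here the first value goes to the (East, Retail) group even though a West observation appears first in the list, which is why the input data set must be sorted in ascending order for the positions to line up.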
-
COST=SAS-data-set
-
names a SAS-data-set that contains the stratum costs. You should provide the stratum costs in the data set variable named _COST_. Each observation in the data set should correspond to a stratum group, which is determined by the values of the STRATA variables.
This data set, which is a secondary data set, must contain all stratification variables that you specify in the STRATA statement. The data set must also contain all stratum groups that appear in the DATA= input data set. The order of the stratum groups in the COST= data set must match the order of the groups in the DATA= data set. If formats are associated with the STRATA variables, the formats must be consistent in the two data sets. For more information, see the section Secondary Input Data Set. You can name only one secondary input data set in each invocation of PROC SURVEYSELECT.
-
MARGIN=value
-
specifies the margin of error for the estimate of the overall mean from the stratified sample. When you specify this option, PROC SURVEYSELECT determines the stratum sample sizes that achieve the margin value by using the allocation method or proportions that you specify in the ALLOC= option. For more information, see the section Specifying the Margin of Error.
The value must be a positive number. When you specify this option, you must also provide the stratum variances in the VAR= option.
You can use the ALPHA= option to specify the confidence level for the MARGIN= computations. By default, ALPHA=0.05, which produces a 95% confidence interval.
You can specify the MARGIN= option with any allocation method (proportional, optimal, or Neyman) or with allocation proportions that you provide (ALLOC=(values) or ALLOC=SAS-data-set).
Allocation to achieve a specified margin is an alternative approach to the allocation of a specified total sample size. Therefore, when you specify the MARGIN= option, you cannot also specify a total sample size in the SAMPSIZE= option in the PROC SURVEYSELECT statement.
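To show how a margin of error can determine the total sample size, the Python sketch below solves the textbook variance formula for a stratified mean. It assumes simple random sampling without replacement within each stratum and a normal critical value; both are assumptions of this illustration rather than statements about the procedure's computation. The function names and example numbers are hypothetical, and the result is left unrounded.

```python
from math import sqrt
from statistics import NormalDist

def stratified_margin(n, sizes, variances, fractions, alpha=0.05):
    """Margin of error for total sample size n split by `fractions`."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    big_n = sum(sizes)
    # Var = sum of W_h^2 * S_h^2 * (1/n_h - 1/N_h), with n_h = n * a_h.
    var = sum((N / big_n) ** 2 * s2 * (1.0 / (n * a) - 1.0 / N)
              for N, s2, a in zip(sizes, variances, fractions))
    return z * sqrt(var)

def total_size_for_margin(margin, sizes, variances, fractions, alpha=0.05):
    """Total sample size whose margin of error equals `margin`."""
    if margin <= 0:
        raise ValueError("the margin must be a positive number")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    big_n = sum(sizes)
    a_term = sum((N / big_n) ** 2 * s2 / a
                 for N, s2, a in zip(sizes, variances, fractions))
    b_term = sum((N / big_n) ** 2 * s2 / N
                 for N, s2 in zip(sizes, variances))
    # Solve z^2 * (a_term / n - b_term) = margin^2 for n.
    return a_term / ((margin / z) ** 2 + b_term)

# Hypothetical example with proportional allocation fractions.
sizes = [500, 300, 200]
variances = [4.0, 25.0, 100.0]
fractions = [0.5, 0.3, 0.2]
n_needed = total_size_for_margin(0.5, sizes, variances, fractions)
```

Substituting the computed size back into `stratified_margin` returns the requested margin, which is a quick way to confirm the algebra.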
-
NOSAMPLE
requests that PROC SURVEYSELECT not select a sample after computing the allocation. When you specify this option, the OUT= output data set contains the stratum sample sizes that PROC SURVEYSELECT computes. For more information, see the section Allocation Output Data Set. (By default, PROC SURVEYSELECT selects a sample after computing the allocation.)
-
STATS
displays sample allocation statistics. When you specify the MARGIN= option, the STATS option displays the expected margin of error for the allocation. For more information, see the section Specifying the Margin of Error. When you specify ALLOC=OPTIMAL or ALLOC=NEYMAN but do not specify the MARGIN= option, the STATS option displays the expected variance, which is computed from the stratum variances that you provide and the allocated stratum sample sizes. When you specify ALLOC=OPTIMAL, the STATS option also displays the total stratum-level cost, which is computed from the stratum costs that you provide and the allocated stratum sample sizes.
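The two quantities that STATS reports for optimal allocation can be approximated with the following Python sketch. The variance formula is the textbook form for a stratified mean with a finite population correction, which is an assumption of this illustration; the total cost is the sum of per-unit cost times allocated sample size. The function names and numbers are hypothetical.

```python
def expected_variance(sizes, variances, sample_sizes):
    """Variance of the stratified mean for the allocated stratum sample sizes."""
    big_n = sum(sizes)
    return sum((N / big_n) ** 2 * s2 / n * (1.0 - n / N)
               for N, s2, n in zip(sizes, variances, sample_sizes))

def total_cost(costs, sample_sizes):
    """Total stratum-level cost: per-unit cost times allocated sample size."""
    return sum(c * n for c, n in zip(costs, sample_sizes))

# Hypothetical allocation of 100 units over three strata.
sizes = [500, 300, 200]
variances = [4.0, 25.0, 100.0]
costs = [1.0, 4.0, 16.0]
sample_sizes = [50, 30, 20]

variance = expected_variance(sizes, variances, sample_sizes)
cost = total_cost(costs, sample_sizes)
```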
-
VAR <=(values) | SAS-data-set>
-
specifies the stratum variances that PROC SURVEYSELECT uses to compute optimal allocation (ALLOC=OPTIMAL), Neyman allocation (ALLOC=NEYMAN), or allocation for a specified margin (MARGIN=). The stratum variances must be positive numbers.
You can provide stratum variances by specifying one of the following forms:
-
VAR
indicates that stratum variances are provided in a secondary input data set that you name in another option (for example, the COST=SAS-data-set option). You should provide the stratum variances in the data set variable named _VAR_. For more information, see the section Secondary Input Data Set. You can name only one secondary input data set in each invocation of PROC SURVEYSELECT.
-
VAR=(values)
-
specifies a list of stratum variance values. You can separate the values with blanks or commas, and you must enclose the list of values in parentheses. Each value should correspond to a stratum group, and the number of values must equal the number of strata in the input data set.
The order of the stratum variance values must match the order of the stratum groups in the DATA= input data set. When you specify a list of values, the input data set must be sorted by the STRATA variables in ascending order; you cannot use the DESCENDING or NOTSORTED option in the STRATA statement.
-
VAR=SAS-data-set
-
names a SAS-data-set that contains the stratum variances. You should provide the stratum variances in the data set variable named _VAR_. Each observation in the data set should correspond to a stratum group, which is determined by the values of the STRATA variables.
This data set, which is a secondary data set, must contain all stratification variables that you specify in the STRATA statement. The data set must also contain all stratum groups that appear in the DATA= input data set. The order of the stratum groups in the VAR= data set must match the order of the groups in the DATA= data set. If formats are associated with the STRATA variables, the formats must be consistent in the two data sets. For more information, see the section Secondary Input Data Set. You can name only one secondary input data set in each invocation of PROC SURVEYSELECT.
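Because a single secondary input data set can carry _ALLOC_, _COST_, and _VAR_ together, it is worth checking its contents before running the procedure. The Python sketch below models the data set as a list of dictionaries and checks the requirements described above: the STRATA variables and the needed value variables are present, the values are positive, and the stratum groups cover the DATA= groups in the same order. The function name and the representation are hypothetical and serve only as an illustration.

```python
def check_secondary(rows, strata_vars, data_groups, value_vars=("_VAR_",)):
    """Return a list of problems found in a secondary input data set."""
    problems = []
    for i, row in enumerate(rows):
        for name in list(strata_vars) + list(value_vars):
            if name not in row:
                problems.append(f"observation {i}: missing variable {name}")
        for name in value_vars:
            if name in row and not row[name] > 0:
                problems.append(f"observation {i}: {name} must be positive")
    # Stratum groups of the secondary data set, in observation order.
    groups = [tuple(row.get(v) for v in strata_vars) for row in rows]
    wanted = set(data_groups)
    matched = [g for g in groups if g in wanted]
    if matched != list(data_groups):
        problems.append("stratum groups are missing or out of order relative to DATA=")
    return problems

# Hypothetical secondary data set that holds variances and costs together.
rows = [{"Region": "East", "_VAR_": 4.0, "_COST_": 1.0},
        {"Region": "West", "_VAR_": 9.0, "_COST_": 2.0}]
data_groups = [("East",), ("West",)]
ok = check_secondary(rows, ["Region"], data_groups, value_vars=("_VAR_", "_COST_"))
reversed_order = check_secondary(rows[::-1], ["Region"], data_groups)
```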