The SURVEYFREQ Procedure

PROC SURVEYFREQ Statement

  • PROC SURVEYFREQ <options> ;

The PROC SURVEYFREQ statement invokes the SURVEYFREQ procedure. It also identifies the data set to be analyzed, specifies the variance estimation method to use, and provides sample design information. The DATA= option names the input data set to be analyzed. The VARMETHOD= option specifies the variance estimation method, which is the Taylor series method by default. For Taylor series and bootstrap variance estimation, you can include a finite population correction factor in the analysis by providing either the sampling rate or population total in the RATE= or TOTAL= option, respectively. If your design is stratified with different sampling rates or totals for different strata, you can input these stratum rates or totals in a SAS data set that contains the stratification variables.
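As a minimal sketch of how these pieces fit together, the following step analyzes a stratified cluster sample with Taylor series variance estimation and a finite population correction. The data set name (MySurvey), the variable names (Region, School, SamplingWeight, Response), and the sampling rate of 5% are assumptions made only for illustration.

```sas
/* Hypothetical example: data set and variable names are placeholders */
proc surveyfreq data=MySurvey varmethod=taylor rate=0.05;
   strata Region;            /* first-stage stratification variable        */
   cluster School;           /* primary sampling units (PSUs)              */
   weight SamplingWeight;    /* sampling weights                           */
   tables Response;          /* one-way frequency table                    */
run;
```

Because RATE= is given as a single value here, the same first-stage sampling rate is applied to every stratum.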

Table 1 summarizes the options available in the PROC SURVEYFREQ statement.

Table 1: PROC SURVEYFREQ Statement Options

Option Description
DATA= Names the input SAS data set
DEFF Specifies design effect details
MISSING Treats missing values as a valid level
NOMCAR Treats missing values as not missing completely at random
NOSUMMARY Suppresses the display of the "Data Summary" table
ORDER= Specifies the order of variable levels
PAGE Displays only one table per page
RATE= Specifies the first-stage sampling rate
TOTAL= Specifies the total number of primary sampling units
VARHEADER= Specifies the variable identification to display
VARMETHOD= Specifies the variance estimation method


You can specify the following options:

DATA=SAS-data-set

names the SAS-data-set to be analyzed by PROC SURVEYFREQ. If you omit the DATA= option, the procedure uses the most recently created SAS data set.

DEFF(options)

specifies details of the design effect computations.

This option applies to all design effects in the analyses that you request in the TABLES statements. The DEFF option in the TABLES statement displays design effects of percentages in the frequency and crosstabulation tables. PROC SURVEYFREQ also uses design effects to compute Rao-Scott chi-square tests (CHISQ) and modified confidence limits for proportions (CL). For more information, see the sections Rao-Scott Chi-Square Test and Modified Confidence Limits.

You can specify the following options:

FPC=YES | NO

controls the inclusion or exclusion of the finite population correction (fpc) in the simple random sampling (SRS) variance component of the design effect. For more information, see the section Design Effect.

This option is available when you specify a finite population correction by providing sampling rates or population totals in the RATE= or TOTAL= option, respectively. For more information, see the section Population Totals and Sampling Rates.

FPC=YES includes the fpc and FPC=NO excludes the fpc from the SRS variance component of the design effect. By default, FPC=YES when you use Taylor series variance estimation and FPC=NO when you use replication variance estimation.

VARDEF=N

uses the number of observations (n) as the divisor in the simple random sampling (SRS) variance component of the design effect. By default, PROC SURVEYFREQ uses (n – 1) as the divisor. For more information, see the section Design Effect.
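The following sketch shows one way the DEFF suboptions might be combined. It assumes a hypothetical nonstratified cluster sample (MySurvey) drawn from 1,200 PSUs; all data set and variable names are placeholders.

```sas
/* Hypothetical example: exclude the fpc from the SRS variance component */
/* of the design effect and use n (not n - 1) as the divisor.            */
proc surveyfreq data=MySurvey total=1200 deff(fpc=no vardef=n);
   cluster School;
   weight SamplingWeight;
   tables Response / deff cl;   /* display design effects and confidence limits */
run;
```

FPC=NO is meaningful here only because TOTAL= supplies a finite population correction in the first place.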

MISSING

treats missing values as a valid (nonmissing) category for all categorical variables, which include TABLES, STRATA, and CLUSTER variables.

By default (if you do not specify the MISSING option), PROC SURVEYFREQ excludes an observation from the analysis if the observation contains a missing value for any STRATA or CLUSTER variable. By default, PROC SURVEYFREQ also excludes an observation from the analysis if the observation contains a missing value for any variable in the table request. For more information, see the section Missing Values.

NOMCAR

includes observations with missing values of TABLES variables in the variance computation as not missing completely at random (NOMCAR) for Taylor series variance estimation. When you specify the NOMCAR option, PROC SURVEYFREQ computes variance estimates by analyzing the nonmissing values as a domain (subpopulation), where the entire population includes both nonmissing and missing domains. For more information, see the section Missing Values.

By default, PROC SURVEYFREQ completely excludes an observation from a frequency or crosstabulation table (and the corresponding variance computations) if that observation has a missing value for any of the variables in the table request, unless you specify the MISSING option. The NOMCAR option has no effect when you specify the MISSING option, which treats missing values as a valid nonmissing level.

The NOMCAR option applies only to Taylor series variance estimation; it does not apply to replication variance estimation.
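A minimal sketch of a NOMCAR request follows, assuming a hypothetical data set MySurvey in which the TABLES variable Response has some missing values. The variable names are placeholders.

```sas
/* Hypothetical example: treat missing Response values as not missing */
/* completely at random under Taylor series variance estimation.      */
proc surveyfreq data=MySurvey nomcar varmethod=taylor;
   strata Region;
   cluster School;
   weight SamplingWeight;
   tables Response;
run;
```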

NOSUMMARY

suppresses the display of the "Data Summary" table, which PROC SURVEYFREQ produces by default. For information about this table, see the section Data Summary Table.

ORDER=DATA | FORMATTED | FREQ | INTERNAL

specifies the order of the variable levels in the frequency and crosstabulation tables, which you request in the TABLES statement. The ORDER= option also controls the order of the STRATA variable levels in the "Stratum Information" table.

You can specify one of the following orders:

ORDER= Levels Ordered By
DATA Order of appearance in the input data set
FORMATTED External formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value
FREQ Descending frequency count; levels with the most observations come first in the order
INTERNAL Unformatted value

By default, ORDER=INTERNAL. For ORDER=FORMATTED and ORDER=INTERNAL, the sort order is machine-dependent. ORDER=FREQ orders levels by unweighted frequency (number of observations) rather than weighted frequency.

For more information about sort order, see the chapter on the SORT procedure in the Base SAS Procedures Guide and the discussion of BY-group processing in SAS Programmer's Guide: Essentials.
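To illustrate how ORDER=FORMATTED differs from the default ORDER=INTERNAL, the sketch below assumes a hypothetical numeric variable Response coded 1, 2, and 3. The format name (respfmt) and the data set name are likewise placeholders.

```sas
/* Hypothetical format for a numeric variable coded 1, 2, 3 */
proc format;
   value respfmt 1='Yes' 2='No' 3='Undecided';
run;

/* Order table levels by formatted value rather than internal value */
proc surveyfreq data=MySurvey order=formatted;
   weight SamplingWeight;
   tables Response;
   format Response respfmt.;
run;
```

Under these assumptions, ORDER=FORMATTED would be expected to list the levels as No, Undecided, Yes, whereas ORDER=INTERNAL would follow the codes 1, 2, 3 (Yes, No, Undecided).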

PAGE

displays only one table per page. Otherwise, PROC SURVEYFREQ displays multiple tables per page as space permits.

RATE=value | SAS-data-set
R=value | SAS-data-set

specifies the sampling rate, which PROC SURVEYFREQ uses to compute a finite population correction in Taylor series and bootstrap variance estimation. This option is not used in BRR, jackknife, or replicate weight variance estimation.

If your sample design has multiple stages, you should specify the first-stage sampling rate, which is the ratio of the number of primary sampling units (PSUs) in the sample to the total number of PSUs in the population.

You can specify one of the following forms:

RATE=value

specifies a single sampling rate value to use for all strata or for a nonstratified design.

RATE=SAS-data-set

names a SAS-data-set that contains the stratum sampling rates. You must provide the sampling rates in a data set variable named _RATE_.

The SAS-data-set must contain all stratification variables that you specify in the STRATA statement. It must also contain all stratum levels that appear in the DATA= input data set. If formats are associated with the STRATA variables, the formats must be consistent in the two data sets.

You can specify sampling rates as numbers between 0 and 1. Alternatively, you can specify sampling rates in percentage form as numbers between 1 and 100, which PROC SURVEYFREQ converts to proportions. The procedure treats the value 1 as 100% instead of 1%.

For more information, see the section Population Totals and Sampling Rates.

If you do not specify either the RATE= or TOTAL= option, the variance estimation does not include a finite population correction. You cannot specify both the RATE= and TOTAL= options.
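The RATE=SAS-data-set form can be sketched as follows. The stratum levels, the rates, and the names StratumRates, MySurvey, Region, School, SamplingWeight, and Response are all hypothetical. The example assumes that Region is a character variable with the same levels and format in both data sets.

```sas
/* Hypothetical stratum sampling rates; the rate variable must be named _RATE_ */
data StratumRates;
   input Region $ _RATE_;
   datalines;
East  0.10
North 0.05
South 0.08
West  0.12
;

proc surveyfreq data=MySurvey rate=StratumRates;
   strata Region;            /* must also appear in StratumRates */
   cluster School;
   weight SamplingWeight;
   tables Response;
run;
```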

TOTAL=value | SAS-data-set
N=value | SAS-data-set

specifies the total number of primary sampling units (PSUs) in the population. PROC SURVEYFREQ uses this information to compute a finite population correction in Taylor series and bootstrap variance estimation. This option is not used in BRR, jackknife, or replicate weight variance estimation.

You can specify one of the following forms:

TOTAL=value

specifies a single total value to use for all strata or for a nonstratified design. The value must be a positive number.

TOTAL=SAS-data-set

names a SAS-data-set that contains the stratum totals. You must provide the stratum totals in a data set variable named _TOTAL_. The totals must be positive numbers.

The SAS-data-set must contain all stratification variables that you specify in the STRATA statement. It must also contain all stratum levels that appear in the DATA= input data set. If formats are associated with the STRATA variables, the formats must be consistent in the two data sets.

For more information, see the section Population Totals and Sampling Rates.

If you do not specify either the TOTAL= or RATE= option, the variance estimation does not include a finite population correction. You cannot specify both the TOTAL= and RATE= options.
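A comparable sketch for the TOTAL=SAS-data-set form is shown below. The stratum PSU totals and every data set and variable name are invented for illustration. As with RATE=, the stratification variable is assumed to match the DATA= input data set in type, levels, and format.

```sas
/* Hypothetical stratum PSU totals; the total variable must be named _TOTAL_ */
data StratumTotals;
   input Region $ _TOTAL_;
   datalines;
East  300
North 450
South 250
West  200
;

proc surveyfreq data=MySurvey total=StratumTotals;
   strata Region;
   cluster School;
   weight SamplingWeight;
   tables Response;
run;
```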

VARHEADER=LABEL | NAME | NAMELABEL

specifies the variable identification to use in the displayed output. This option controls the headings of the variable columns in one-way frequency tables, crosstabulation tables, and the "Stratum Information" table. This option also controls the variable identification in table titles. By default, VARHEADER=NAME.

The following table shows how PROC SURVEYFREQ interprets values of the VARHEADER= option.

VARHEADER= Variable Identification Displayed
LABEL Variable label
NAME Variable name
NAMELABEL Variable name and label, as Name (Label)

VARMETHOD=method <(method-options)>

specifies the variance estimation method. PROC SURVEYFREQ provides the Taylor series method and the following replication (resampling) methods: bootstrap, balanced repeated replication (BRR), and jackknife.

Table 2 summarizes the available methods and method-options.

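Table 2: Variance Estimation Methods

VARMETHOD= Variance Estimation Method Method-Options
BOOTSTRAP Bootstrap CENTER=, DFADJ, MH=, OUTWEIGHTS=, REPS=, REPWARN, SEED=
BRR Balanced repeated replication CENTER=, DFADJ, FAY, HADAMARD=, OUTWEIGHTS=, PRINTH, REPS=, REPWARN
JACKKNIFE Jackknife CENTER=, DFADJ, OUTJKCOEFS=, OUTWEIGHTS=, REPWARN
TAYLOR Taylor series None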


For VARMETHOD=BOOTSTRAP, VARMETHOD=BRR, and VARMETHOD=JACKKNIFE, you can specify method-options in parentheses after the variance method. For example:

varmethod=BRR(reps=60 outweights=myReplicateWeights)

If you specify a REPWEIGHTS statement, the default is VARMETHOD=JACKKNIFE; otherwise, the default is VARMETHOD=TAYLOR.
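For instance, a data set that already carries replicate weights could be analyzed as sketched below. The data set MySurveyRW and the variables SamplingWeight, RepWt_1 through RepWt_80, and Response are hypothetical. Because a REPWEIGHTS statement is present and the VARMETHOD= option is omitted, the jackknife method would apply under the default described above.

```sas
/* Hypothetical example: replicate weights supplied by the data provider */
proc surveyfreq data=MySurveyRW;
   weight SamplingWeight;
   repweights RepWt_1-RepWt_80;   /* 80 replicate weight variables */
   tables Response;
run;
```

To use a different replication method with the same weights, you would add the VARMETHOD= option explicitly, for example VARMETHOD=BRR.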

You can specify one of the following methods:

BOOTSTRAP <(method-options)>

requests variance estimation by the bootstrap method. For more information, see the section Bootstrap Method.

The bootstrap method requires at least two primary sampling units (PSUs) in each stratum for stratified designs unless you use a REPWEIGHTS statement to provide replicate weights.

You can specify the following method-options:

CENTER=FULLSAMPLE | REPLICATES

specifies how to compute the deviations for bootstrap variance estimation. You can specify one of the following values:

FULLSAMPLE

computes the deviations of the replicate estimates from the full sample estimate.

REPLICATES

computes the deviations of the replicate estimates from the average of the replicate estimates.

By default, CENTER=FULLSAMPLE. For more information, see the section Bootstrap Method.

DFADJ

computes the degrees of freedom by using the number of nonmissing strata and clusters for the current table request. If you specify this method-option, PROC SURVEYFREQ does not count any empty strata or clusters that occur when observations are excluded because of missing values of a TABLES variable in the current request.

By default, PROC SURVEYFREQ computes the degrees of freedom by counting the number of nonmissing strata and clusters for all valid observations in the input data set. The degrees of freedom for VARMETHOD=BOOTSTRAP equals the number of clusters minus the number of strata.

For more information, see the section Degrees of Freedom. For information about valid observations, see the section Data Summary Table.

This method-option has no effect when you specify the MISSING option, which treats missing values as a valid nonmissing level.

This method-option is not used when you specify the degrees of freedom in the DF= option in the TABLES statement or when you specify a REPWEIGHTS statement to provide replicate weights. When you specify a REPWEIGHTS statement, the degrees of freedom is the number of REPWEIGHTS variables (replicates) unless you specify the DF= option in the REPWEIGHTS or the TABLES statement.

MH=value | (values) | SAS-data-set

specifies the number of PSUs to select for the bootstrap replicate samples. You can provide bootstrap stratum sample sizes $m_h$ by specifying a list of values or a SAS-data-set. Alternatively, you can provide a single bootstrap sample size value to use for all strata or for a nonstratified design. You can specify the number of replicate samples in the REPS= option. For more information, see the section Bootstrap Method.

Each bootstrap sample size $m_h$ must be a positive integer and must be less than $n_h$, which is the total number of PSUs in stratum $h$. By default, $m_h = n_h - 1$ for a stratified design. For a nonstratified design, the bootstrap sample size value must be less than $n$ (the total number of PSUs in the sample). By default, $m = n - 1$ for a nonstratified design.

You can provide bootstrap sample sizes by specifying one of the following forms:

MH=value

specifies a single bootstrap sample size value to use for all strata or for a nonstratified design.

MH=(values)

specifies a list of stratum bootstrap sample size values. You can separate the values with blanks or commas, and you must enclose the list of values in parentheses. The number of values must not be less than the number of strata in the DATA= input data set.

The order of the stratum sample size values must match the order of the stratum levels in the DATA= input data set. Each stratum sample size value must be a positive integer and must be less than the total number of PSUs in the corresponding stratum.

MH=SAS-data-set

names a SAS-data-set that contains the stratum bootstrap sample sizes. You must provide the sample sizes in a data set variable named _NSIZE_ or SampleSize.

The SAS-data-set must contain all stratification variables that you specify in the STRATA statement. It must also contain all stratum levels that appear in the DATA= input data set. If formats are associated with the STRATA variables, the formats must be consistent in the two data sets.

Each value of the _NSIZE_ or SampleSize variable must be a positive integer and must be less than the total number of PSUs in the corresponding stratum.
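The list form of MH= might be used as in the following sketch. It assumes a hypothetical design with exactly three strata that appear in the input data set in a known order and that contain at least 5, 7, and 6 PSUs, respectively, so that each listed size is less than the corresponding stratum PSU count. All names are placeholders.

```sas
/* Hypothetical example: stratum-specific bootstrap sample sizes.      */
/* The values must follow the order of the stratum levels in MySurvey. */
proc surveyfreq data=MySurvey varmethod=bootstrap(mh=(4 6 5));
   strata Region;
   cluster School;
   weight SamplingWeight;
   tables Response;
run;
```

If the stratum order in the input data set is uncertain, the MH=SAS-data-set form avoids the dependence on ordering because the sizes are matched by stratum level.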

OUTWEIGHTS=SAS-data-set

names a SAS-data-set to store the bootstrap replicate weights that PROC SURVEYFREQ creates. For information about replicate weights, see the section Bootstrap Method. For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weight Output Data Set.

This method-option is not available when you provide replicate weights in a REPWEIGHTS statement.

REPS=number

specifies the number of replicates for bootstrap variance estimation. The value of number must be an integer greater than 1. Increasing the number of replicates improves the estimation precision but also increases the computation time. By default, REPS=250.

REPWARN <=NO | YES>

controls the log message that PROC SURVEYFREQ provides when a parameter cannot be estimated from one or more replicates. REPWARN or REPWARN=YES displays this log message as a warning. By default, REPWARN=NO, which displays the log message as a note.

For more information, see the subsection Variance Estimation in the section Bootstrap Method.

SEED=number

specifies the initial seed for random number generation for bootstrap replicate sampling.

If you do not specify this option or if you specify a number that is negative or 0, PROC SURVEYFREQ uses the time of day from the system clock to obtain an initial seed.

To reproduce the same bootstrap replicate weights and the same analysis in a subsequent execution of PROC SURVEYFREQ, you can specify the same initial seed that was used in the original analysis.

PROC SURVEYFREQ displays the value of the initial seed in the "Variance Estimation" table.
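A reproducible bootstrap analysis could be requested as sketched below. The seed value, the replicate count, and the names MySurvey and BootWeights are arbitrary choices made for illustration.

```sas
/* Hypothetical example: fixed seed so the replicate weights can be reproduced */
proc surveyfreq data=MySurvey
                varmethod=bootstrap(reps=500 seed=39282 outweights=BootWeights);
   strata Region;
   cluster School;
   weight SamplingWeight;
   tables Response;
run;
```

Rerunning the step with the same input data and the same positive SEED= value should produce the same replicate weights in BootWeights.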

BRR <(method-options)>

requests variance estimation by balanced repeated replication (BRR). This method requires a stratified sample design where each stratum contains two primary sampling units (PSUs). When you specify this method, you must also specify a STRATA statement unless you provide replicate weights by using the REPWEIGHTS statement. For more information, see the section Balanced Repeated Replication (BRR) Method.

You can specify the following method-options:

CENTER=FULLSAMPLE | REPLICATES

specifies how to compute the deviations for BRR variance estimation. You can specify one of the following values:

FULLSAMPLE

computes the deviations of the replicate estimates from the full sample estimate.

REPLICATES

computes the deviations of the replicate estimates from the average of the replicate estimates.

By default, CENTER=FULLSAMPLE. For more information, see the section Balanced Repeated Replication (BRR) Method.

DFADJ

computes the degrees of freedom as the number of nonmissing strata for the current table request. If you specify this method-option, PROC SURVEYFREQ does not count any empty strata that occur when observations are excluded because of missing values of a TABLES variable in the current request.

By default, PROC SURVEYFREQ computes the degrees of freedom by counting the number of nonmissing strata for all valid observations in the input data set.

For more information, see the section Degrees of Freedom. For information about valid observations, see the section Data Summary Table.

This method-option has no effect when you specify the MISSING option, which treats missing values as a valid nonmissing level.

This method-option is not used when you specify the degrees of freedom in the DF= option in the TABLES statement or when you specify a REPWEIGHTS statement to provide replicate weights. When you specify a REPWEIGHTS statement, the degrees of freedom is the number of REPWEIGHTS variables (replicates) unless you specify the DF= option in the REPWEIGHTS or the TABLES statement.

FAY <=value>

requests Fay’s method, which is a modification of the BRR method. For more information, see the section Fay’s BRR Method.

You can specify the value of the Fay coefficient, which is used in converting the original sampling weights to replicate weights. The Fay coefficient must be a nonnegative number less than 1. By default, the Fay coefficient is 0.5.
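Fay's method could be requested as in the following sketch, which assumes a hypothetical design in which every stratum contains two PSUs. The coefficient 0.3 and all data set and variable names are illustrative only.

```sas
/* Hypothetical example: Fay's BRR with a coefficient of 0.3 */
proc surveyfreq data=MySurvey varmethod=brr(fay=0.3 outweights=FayWeights);
   strata Region;
   cluster School;           /* two PSUs per stratum are assumed */
   weight SamplingWeight;
   tables Response;
run;
```

Specifying FAY without a value would use the default coefficient of 0.5.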

HADAMARD=SAS-data-set
H=SAS-data-set

names a SAS-data-set that contains the Hadamard matrix for BRR replicate construction. If you do not specify this method-option, PROC SURVEYFREQ generates an appropriate Hadamard matrix for replicate construction. For more information, see the sections Balanced Repeated Replication (BRR) Method and Hadamard Matrix.

If a Hadamard matrix of a particular dimension exists, it is not necessarily unique. Therefore, if you want to use a specific Hadamard matrix, you must provide the matrix as a SAS-data-set in this method-option.

In the HADAMARD= input data set, each variable corresponds to a column and each observation corresponds to a row of the Hadamard matrix. You can use any variable names in the HADAMARD= data set. All values in the data set must equal either 1 or –1. You must ensure that the matrix you provide is indeed a Hadamard matrix; that is, $\mathbf{A}'\mathbf{A} = R\,\mathbf{I}$, where $\mathbf{A}$ is the Hadamard matrix of dimension $R$ and $\mathbf{I}$ is an identity matrix. PROC SURVEYFREQ does not check the validity of the Hadamard matrix that you provide.

The HADAMARD= input data set must contain at least H variables, where H denotes the number of first-stage strata in your design. If the data set contains more than H variables, PROC SURVEYFREQ uses only the first H variables. Similarly, the HADAMARD= input data set must contain at least H observations.

If you do not specify the REPS= method-option, the number of replicates is assumed to be the number of observations in the HADAMARD= input data set. If you specify the number of replicates—for example, REPS=nreps—the first nreps observations in the HADAMARD= data set are used to construct the replicates.

You can specify the PRINTH method-option to display the Hadamard matrix that PROC SURVEYFREQ uses to construct replicates for BRR.
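The following sketch supplies a 4 x 4 Hadamard matrix and displays the portion that is used. It assumes a hypothetical design with no more than four strata and two PSUs per stratum. The data set name MyHadamard and the variable names c1 through c4 are arbitrary, since any variable names are allowed.

```sas
/* Hypothetical 4 x 4 Hadamard matrix: rows are observations,   */
/* columns are variables, and A'A = 4I for this matrix.         */
data MyHadamard;
   input c1 c2 c3 c4;
   datalines;
1  1  1  1
1 -1  1 -1
1  1 -1 -1
1 -1 -1  1
;

proc surveyfreq data=MySurvey varmethod=brr(hadamard=MyHadamard printh);
   strata Region;
   cluster School;
   weight SamplingWeight;
   tables Response;
run;
```

Because REPS= is omitted, the number of replicates would be the number of observations in MyHadamard, which is 4 in this sketch.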

OUTWEIGHTS=SAS-data-set

names a SAS-data-set to store the replicate weights that PROC SURVEYFREQ creates for BRR variance estimation. For information about replicate weights, see the section Balanced Repeated Replication (BRR) Method. For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weight Output Data Set.

The OUTWEIGHTS= method-option is not available when you provide replicate weights in a REPWEIGHTS statement.

PRINTH

displays the Hadamard matrix that PROC SURVEYFREQ uses to construct replicates for BRR variance estimation. When you provide the Hadamard matrix in the HADAMARD= method-option, PROC SURVEYFREQ displays only the rows and columns that are actually used to construct replicates. For more information, see the sections Balanced Repeated Replication (BRR) Method and Hadamard Matrix.

The PRINTH method-option is not available when you provide replicate weights in a REPWEIGHTS statement because the procedure does not use a Hadamard matrix in this case.

REPS=number

specifies the number of replicates for BRR variance estimation. The value of number must be an integer greater than 1.

If you do not use the HADAMARD= method-option to provide a Hadamard matrix, the number of replicates should be greater than the number of strata and should be a multiple of 4. For more information, see the section Balanced Repeated Replication (BRR) Method. If PROC SURVEYFREQ cannot construct a Hadamard matrix for the REPS= value that you specify, the value is increased until a Hadamard matrix of that dimension can be constructed. Therefore, the actual number of replicates that PROC SURVEYFREQ uses might be larger than number.

If you use the HADAMARD= method-option to provide a Hadamard matrix, the value of number must not be greater than the number of rows in the Hadamard matrix. If you provide a Hadamard matrix and do not specify the REPS= method-option, the number of replicates is the number of rows in the Hadamard matrix.

If you do not specify the REPS= or the HADAMARD= method-option and do not use a REPWEIGHTS statement, the number of replicates is the smallest multiple of 4 that is greater than the number of strata.

If you use a REPWEIGHTS statement to provide replicate weights, PROC SURVEYFREQ does not use the REPS= method-option; the number of replicates is the number of REPWEIGHTS variables.
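The default rule for the number of BRR replicates (the smallest multiple of 4 that is greater than the number of strata) can be checked with a short DATA step. This is only an arithmetic restatement of that rule, not a call into the procedure, and the variable names H and Reps are chosen for illustration.

```sas
/* Smallest multiple of 4 that is strictly greater than H strata */
data _null_;
   do H = 3, 5, 8, 12;
      Reps = 4 * (floor(H / 4) + 1);
      put H= Reps=;
   end;
run;
```

The step writes one line per case to the log, giving 4, 8, 12, and 16 replicates for 3, 5, 8, and 12 strata, respectively. Note that a stratum count that is already a multiple of 4 moves up to the next multiple.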

REPWARN <=NO | YES>

controls the log message that PROC SURVEYFREQ provides when a parameter cannot be estimated from one or more replicates. REPWARN or REPWARN=YES displays this log message as a warning. By default, REPWARN=NO, which displays the log message as a note.

For more information, see the subsection Variance Estimation in the section Balanced Repeated Replication (BRR) Method.

JACKKNIFE <(method-options)>
JK <(method-options)>

requests variance estimation by the delete-1 jackknife method. For more information, see the section Jackknife Method. If you use a REPWEIGHTS statement to provide replicate weights, VARMETHOD=JACKKNIFE is the default variance estimation method.

The delete-1 jackknife method requires at least two primary sampling units (PSUs) in each stratum for stratified designs unless you use a REPWEIGHTS statement to provide replicate weights.

You can specify the following method-options:

CENTER=FULLSAMPLE | REPLICATES

specifies how to compute the deviations for jackknife variance estimation. You can specify one of the following values:

FULLSAMPLE

computes the deviations of the replicate estimates from the full sample estimate.

REPLICATES

computes the deviations of the replicate estimates from the average of the replicate estimates.

By default, CENTER=FULLSAMPLE. For more information, see the section Jackknife Method.

DFADJ

computes the degrees of freedom by using the number of nonmissing strata and clusters for the current table request. If you specify this method-option, PROC SURVEYFREQ does not count any empty strata or clusters that occur when observations are excluded because of missing values of a TABLES variable in the current request.

By default, PROC SURVEYFREQ computes the degrees of freedom by counting the number of nonmissing strata and clusters for all valid observations in the input data set. The degrees of freedom for VARMETHOD=JACKKNIFE equals the number of clusters minus the number of strata.

For more information, see the section Degrees of Freedom. For information about valid observations, see the section Data Summary Table.

This method-option has no effect when you specify the MISSING option, which treats missing values as a valid nonmissing level.

This method-option is not used when you specify the degrees of freedom in the DF= option in the TABLES statement or when you specify a REPWEIGHTS statement to provide replicate weights. When you specify a REPWEIGHTS statement, the degrees of freedom is the number of REPWEIGHTS variables (replicates) unless you specify the DF= option in the REPWEIGHTS or the TABLES statement.

OUTJKCOEFS=SAS-data-set

names a SAS-data-set to store the jackknife coefficients. For information about jackknife coefficients, see the section Jackknife Method. For information about the contents of the OUTJKCOEFS= data set, see the section Jackknife Coefficient Output Data Set.

OUTWEIGHTS=SAS-data-set

names a SAS-data-set to store the replicate weights that PROC SURVEYFREQ creates for jackknife variance estimation. For information about replicate weights, see the section Jackknife Method. For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weight Output Data Set.

This method-option is not available when you use a REPWEIGHTS statement to provide replicate weights.

REPWARN <=NO | YES>

controls the log message that PROC SURVEYFREQ provides when a parameter cannot be estimated from one or more replicates. REPWARN or REPWARN=YES displays this log message as a warning. By default, REPWARN=NO, which displays the log message as a note.

For more information, see the subsection Variance Estimation in the section Jackknife Method.
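A jackknife analysis that saves both the replicate weights and the jackknife coefficients might look like the sketch below. The output data set names JKWeights and JKCoefs, along with all input names, are hypothetical, and each stratum is assumed to contain at least two PSUs.

```sas
/* Hypothetical example: delete-1 jackknife with saved weights and coefficients */
proc surveyfreq data=MySurvey
                varmethod=jackknife(outweights=JKWeights outjkcoefs=JKCoefs);
   strata Region;
   cluster School;
   weight SamplingWeight;
   tables Response;
run;
```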

TAYLOR

requests Taylor series variance estimation. This is the default method if you do not specify the VARMETHOD= option or a REPWEIGHTS statement. For more information, see the section Taylor Series Variance Estimation.

Last updated: December 09, 2022