The SURVEYMEANS Procedure

PROC SURVEYMEANS Statement

PROC SURVEYMEANS <options> <statistic-keywords>;

The PROC SURVEYMEANS statement invokes the SURVEYMEANS procedure. In this statement, you identify the data set to be analyzed, specify the variance estimation method, and provide sample design information. The DATA= option names the input data set to be analyzed. The VARMETHOD= option specifies the variance estimation method, which is the Taylor series method by default. For Taylor series or bootstrap variance estimation, you can include a finite population correction factor in the analysis by providing either the sampling rate or population total with the RATE= or TOTAL= option. If your design is stratified, with different sampling rates or totals for different strata, then you can input these stratum rates or totals in a SAS data set that contains the stratification variables.

In the PROC SURVEYMEANS statement, you can also use statistic-keywords to specify the statistics, such as the population mean and population total, that PROC SURVEYMEANS computes. In addition, you can request data set summary information and sample design information.
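
As an illustration of how these pieces fit together, the following minimal sketch requests the mean, the total, and confidence limits for the mean from a one-stage cluster sample. The data set work.Survey, the variables School, Income, and SamplingWeight, and the population count of 400 schools are hypothetical and are not part of this documentation.

```sas
/* Sketch only: data set and variable names are assumed for illustration. */
proc surveymeans data=work.Survey   /* input data set                        */
                 total=400          /* assumed number of PSUs in population  */
                 mean sum clm;      /* statistic-keywords                    */
   cluster School;                  /* primary sampling units                */
   var Income;                      /* analysis variable                     */
   weight SamplingWeight;           /* sampling weights                      */
run;
```

Because VARMETHOD= is not specified and there is no REPWEIGHTS statement, this step would use Taylor series variance estimation, and the TOTAL= value would supply the finite population correction.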

Table 2 summarizes the options available in the PROC SURVEYMEANS statement.

Table 2: PROC SURVEYMEANS Statement Options

Option	Description
ALPHA=	Sets the confidence level for confidence limits
DATA=	Specifies the SAS data set to be analyzed
MISSING	Treats missing values as a valid category
NOMCAR	Computes variance estimates by analyzing the nonmissing values as a domain
NONSYMCL	Requests nonsymmetric confidence limits for quantiles
NOSPARSE	Suppresses the display of analysis variables with zero frequency
ORDER=	Specifies the order in which to report the values of the categorical variables
PERCENTILE=	Specifies percentiles that you want the procedure to compute
PLOTS=	Requests plots from ODS Graphics
QUANTILE=	Specifies quantiles that you want the procedure to compute
RATE=	Specifies the sampling rate
STACKING	Produces the output data sets by using a stacking table structure
TOTAL=	Specifies the total number of primary sampling units
VARHEADER=	Specifies the variable identification to display
VARMETHOD=	Specifies the variance estimation method

You can specify the following options in the PROC SURVEYMEANS statement:

ALPHA=$\alpha$

sets the confidence level for confidence limits. The value of the ALPHA= option must be between 0 and 1, and the default value is 0.05. A confidence level of $\alpha$ produces $100(1-\alpha)\%$ confidence limits. The default of ALPHA=0.05 produces 95% confidence limits.
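
For example, a value of 0.10 corresponds to 100(1 - 0.10)% = 90% confidence limits. The sketch below assumes a hypothetical data set work.Survey with variables Income and SamplingWeight.

```sas
/* Sketch only: requests 90% confidence limits for the mean.     */
/* work.Survey, Income, and SamplingWeight are assumed names.    */
proc surveymeans data=work.Survey alpha=0.10 mean clm;
   var Income;
   weight SamplingWeight;
run;
```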

DATA=SAS-data-set

specifies the SAS data set to be analyzed by PROC SURVEYMEANS. If you omit the DATA= option, the procedure uses the most recently created SAS data set.

MISSING

treats missing values as a valid (nonmissing) category for all categorical variables, which include CLASS, STRATA, CLUSTER, DOMAIN, and POSTSTRATA variables.

By default, if you do not specify the MISSING option, an observation is excluded from the analysis if it has a missing value. For more information, see the section Missing Values.

NOMCAR

treats missing values in the variance computation as not missing completely at random (NOMCAR) for Taylor series variance estimation. When you specify this option, PROC SURVEYMEANS computes variance estimates by analyzing the nonmissing values as a domain (subpopulation), where the entire population includes both nonmissing and missing domains. For more information, see the section Missing Values.

By default, PROC SURVEYMEANS completely excludes an observation from analysis if that observation has a missing value, unless you specify the MISSING option for categorical variables. Note that the NOMCAR option has no effect on a categorical variable when you specify the MISSING option, which treats missing values as a valid nonmissing level.

The NOMCAR option applies only to Taylor series variance estimation; it is ignored for replication methods.

NONSYMCL

requests nonsymmetric confidence limits for quantiles when you request quantiles by using the PERCENTILE= or QUANTILE= option. This option applies only to the default VARMETHOD=TAYLOR option. For more details, see the section Confidence Limits.

NOSPARSE

suppresses the display of analysis variables with zero frequency. By default, the procedure displays all continuous variables and all levels of categorical variables.

ORDER=DATA | FORMATTED | INTERNAL

specifies the order in which the values of the categorical variables are to be reported.

This option also determines the sort order for the levels of CLUSTER and DOMAIN variables and controls the order of STRATA variable levels in the "Stratum Information" table.

The following shows how PROC SURVEYMEANS interprets values of the ORDER= option:

DATA: orders values according to their order in the input data set.
FORMATTED: orders values by their formatted values. This order is operating environment dependent. By default, the order is ascending.
INTERNAL: orders values by their unformatted values, which yields the same order that the SORT procedure does. This order is operating environment dependent.

By default, ORDER=INTERNAL. For ORDER=FORMATTED and ORDER=INTERNAL, the sort order is machine-dependent.

For more information about sort order, see the chapter on the SORT procedure in the Base SAS Procedures Guide and the discussion of BY-group processing in the "Grouping Data" section of SAS Programmer's Guide: Essentials.

PERCENTILE=(values)

specifies percentiles you want the procedure to compute. You can separate values with blanks or commas. Each value must be between 0 and 100. You can also use the statistic-keywords DECILES, MEDIAN, Q1, Q3, and QUARTILES to request common percentiles.

PROC SURVEYMEANS uses Woodruff’s method (Dorfman and Valliant 1993; Särndal, Swensson, and Wretman 1992; Francisco and Fuller 1991) to estimate the variances of quantiles. For more details, see the section Quantiles.

PLOTS < ( global-plot-options ) > < = plot-request < (plot-option) > >
PLOTS < ( global-plot-options ) > < = ( plot-request < (plot-option) > <…plot-request < (plot-option) >> )>

controls the plots that are produced through ODS Graphics.

A plot-request identifies the plot, and a plot-option controls the appearance and content of the plot. You can specify plot-options in parentheses after a plot-request. A global-plot-option applies to all plots for which it is available. You can specify global-plot-options in parentheses after the PLOTS option.

When you specify only one plot-request, you can omit the parentheses around it. Here are a few examples of requesting plots:

plots=all
plots(unpack)=summary
plots=(summary(unpack) domain)
plots=boxplot
plots=(domain(packvar) histogram)

You can suppress default plots and request specific plots by specifying the PLOTS(ONLY)= option; PLOTS(ONLY)=(plot-requests) produces only the plots that are specified as plot-requests.

ODS Graphics must be enabled before you can request plots. For example:

ods graphics on;
proc surveymeans plots=boxplot;
   var income;
run;

For more information about enabling and disabling ODS Graphics, see the section Enabling and Disabling ODS Graphics in Chapter 24, Statistical Graphics Using ODS.

When ODS Graphics is enabled but you do not specify the PLOTS= option, PROC SURVEYMEANS produces summary plots, and it also produces domain plots when you specify a DOMAIN statement. You can suppress all plots by specifying PLOTS=NONE.

For a continuous analysis variable, PROC SURVEYMEANS provides a summary plot, which contains a box plot and a histogram. For categorical variables, the corresponding plots are provided by PROC SURVEYFREQ.

For general information about ODS Graphics, see Chapter 24, Statistical Graphics Using ODS.

Global Plot Options

A global-plot-option applies to all plots for which the option is available. You can specify the following global-plot-options:

ONLY: suppresses the default plots and requests only the plots that are specified as plot-requests.
NBINS=value: specifies the number of bins in a histogram plot. If you do not specify this option, then by default the number of bins is determined by the method of Terrell and Scott (1985).
UNPACK: requests that the procedure create a histogram with overlaid densities and a box plot along with a confidence interval band separately.

Plot Requests

You can specify the following plot-requests:

ALL

requests all appropriate plots.

BOXPLOT | BOX

requests a box plot for continuous variables.

DOMAIN < ( plot-options )>

requests box plots for domain statistics for each domain definition. By default, the procedure plots each domain in a single panel for all continuous analysis variables. This plot is produced by default if you specify a DOMAIN statement. You can specify the following plot-options:

EXCLUDE: requests that the procedure create box plots for every domain level of a domain but exclude the box plot for the full sample. By default, the box plot includes the full sample box plot.
PACKDOMAIN: requests box plots for all domain definitions in one panel for each analysis variable.
PACKVAR: requests box plots for all analysis variables in one panel for each domain definition. This is the default when you do not specify the UNPACK option.
UNPACK: requests a box plot for each domain and for each analysis variable in a single panel.

HISTOGRAM < ( plot-option )>
HIST < ( plot-option )>

requests a histogram with overlaid normal and kernel densities. You can specify the following plot-option:

NBINS=value: specifies the number of bins in a histogram plot. If you do not specify this option, then by default the number of bins is determined by the method of Terrell and Scott (1985).

NONE

suppresses all plots.

SUMMARY < ( plot-options )>

requests that a histogram and a box plot be displayed together in a single panel, sharing the same X axis. This packed plot is produced by default. You can specify the following plot-options:

NBINS=value: specifies the number of bins in a histogram plot. If you do not specify this option, then by default the number of bins is determined by the method of Terrell and Scott (1985).
UNPACK: requests that a histogram with overlaid densities be displayed in one panel and a box plot along with a confidence interval band be displayed separately. Note that specifying PLOTS(ONLY)=SUMMARY(UNPACK) is exactly the same as specifying PLOTS(ONLY)=(BOX HISTOGRAM).

PLOTS=SUMMARY overrides the PLOTS=BOX and PLOTS=HISTOGRAM plot-requests. That is, if you do not specify the UNPACK option, PROC SURVEYMEANS does not display a histogram or a box plot by itself when PLOTS=SUMMARY is specified.
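
The plot-requests and plot-options described in this section can be combined as in the following sketch, which suppresses the default plots and requests only a 20-bin histogram and domain box plots without the full-sample box. The data set work.Survey and the variables Income, Region, and SamplingWeight are hypothetical.

```sas
/* Sketch only: data set and variable names are assumed. */
ods graphics on;

proc surveymeans data=work.Survey
                 plots(only)=(histogram(nbins=20) domain(exclude));
   var Income;
   domain Region;            /* domain plots need a DOMAIN statement */
   weight SamplingWeight;
run;
```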

QUANTILE=(values)

specifies quantiles you want the procedure to compute. You can separate values with blanks or commas. Each value must be between 0 and 1. You can also use the statistic-keywords DECILES, MEDIAN, Q1, Q3, and QUARTILES to request common quantiles.
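
Because a percentile p corresponds to the quantile p/100, the two steps in the following sketch would request the same three statistics, first on the 0 to 100 scale and then on the 0 to 1 scale. The first step also adds NONSYMCL, which is relevant only because the default Taylor series method is in effect. The data set and variable names are hypothetical.

```sas
/* Sketch only: work.Survey, Income, and SamplingWeight are assumed names. */

/* 10th, 50th, and 90th percentiles with nonsymmetric confidence limits */
proc surveymeans data=work.Survey percentile=(10 50 90) nonsymcl;
   var Income;
   weight SamplingWeight;
run;

/* The same quantities expressed as quantiles */
proc surveymeans data=work.Survey quantile=(0.1 0.5 0.9);
   var Income;
   weight SamplingWeight;
run;
```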

RATE=value | SAS-data-set
R=value | SAS-data-set

specifies the sampling rate, which PROC SURVEYMEANS uses to compute a finite population correction for Taylor series or bootstrap variance estimation. This option is ignored for the jackknife or balanced repeated replication (BRR) variance estimation method.

If your sample design has multiple stages, you should specify the first-stage sampling rate, which is the ratio of the number of primary sampling units (PSUs) in the sample to the total number of PSUs in the population.

You can specify the sampling rate in either of the following ways:

value: specifies a nonnegative number to use for a nonstratified design or for a stratified design that has the same sampling rate in each stratum.
SAS-data-set: specifies a SAS-data-set that contains the stratification variables and the sampling rates for a stratified design that has different sampling rates in the strata. You must provide the sampling rates in the data set variable named _RATE_. The sampling rates must be nonnegative numbers.

You can specify sampling rates as numbers between 0 and 1. Or you can specify sampling rates in percentage form as numbers between 1 and 100, which PROC SURVEYMEANS converts to proportions. The procedure treats the value 1 as 100% instead of 1%.

For more information, see the section Specification of Population Totals and Sampling Rates.

If you do not specify either the RATE= or TOTAL= option, the Taylor series or bootstrap variance estimation does not include a finite population correction. You cannot specify both the RATE= and TOTAL= options.
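
As a sketch of the SAS-data-set form, the following steps build a small data set of stratum sampling rates and pass it to the RATE= option. The stratification variable Region, its levels, the rates, and the analysis data set work.Survey are all hypothetical; the only name taken from this documentation is the required variable _RATE_.

```sas
/* Sketch only: stratum levels and rates are invented for illustration. */
data work.StratumRates;
   input Region $ _RATE_;    /* _RATE_ is the required variable name */
   datalines;
East 0.10
West 0.25
;

proc surveymeans data=work.Survey rate=work.StratumRates mean;
   strata Region;            /* must match the variable in the rates data set */
   var Income;
   weight SamplingWeight;
run;
```

The TOTAL= option accepts a data set of the same shape, with the stratum totals stored in a variable named _TOTAL_ instead.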

STACKING

requests that the procedure produce the output data sets by using a stacking table structure, which was the default before SAS 9. The new default is to produce a rectangular table structure in the output data sets.

A rectangular structure creates one observation for each analysis variable in the data set. A stacking structure creates only one observation in the output data set for all analysis variables.

The STACKING option affects the following tables:

Domain
Statistics
StrataInfo

For more information, see the section Rectangular and Stacking Structures in an Output Data Set.
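
One way to see the difference between the two structures is to capture the Statistics table with an ODS OUTPUT statement, once with and once without the STACKING option. The sketch below shows the stacking case; the data set work.Survey, its variables, and the output data set name are hypothetical.

```sas
/* Sketch only: data set and variable names are assumed. */
proc surveymeans data=work.Survey stacking mean;
   var Income Age;
   weight SamplingWeight;
   ods output Statistics=work.StatsStacked;   /* one observation for all variables */
run;
```

Removing the STACKING option from the same step would be expected to produce one observation per analysis variable in the output data set.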

TOTAL=value | SAS-data-set
N=value | SAS-data-set

specifies the total number of primary sampling units (PSUs) in the study population. PROC SURVEYMEANS uses this information to compute a finite population correction for Taylor series or bootstrap variance estimation. This option is ignored for the jackknife or BRR variance estimation method.

You can specify the total number of PSUs in either of the following ways:

value: specifies a positive number to use for a nonstratified design or for a stratified design that has the same population total in each stratum.
SAS-data-set: specifies a SAS-data-set that contains the stratification variables and the population totals for a stratified design that has different population totals in the strata. You must provide the stratum totals in the data set variable named _TOTAL_. The stratum totals must be positive numbers.

For more information, see the section Specification of Population Totals and Sampling Rates.

If you do not specify either the TOTAL= or RATE= option, the Taylor series or bootstrap variance estimation does not include a finite population correction. You cannot specify both the TOTAL= and RATE= options.

statistic-keywords

specifies the statistics for the procedure to compute. If you do not specify any statistic-keywords, PROC SURVEYMEANS computes the NOBS, MEAN, STDERR, and CLM statistics by default.

The statistics produced depend on the type of the analysis variable. If you name a numeric variable in the CLASS statement, then the procedure analyzes that variable as a categorical variable. The procedure always analyzes character variables as categorical. For more information, see the section CLASS Statement.

PROC SURVEYMEANS computes MIN, MAX, and RANGE for numeric variables but not for categorical variables. For numeric variables, the keyword MEAN produces the mean, but for categorical variables it produces the proportion in each category or level. Also, for categorical variables, the keyword NOBS produces the number of observations for each variable level, and the keyword NMISS produces the number of missing observations for each level. If you request the keyword NCLUSTER for a categorical variable, PROC SURVEYMEANS displays for each level the number of clusters with observations in that level. PROC SURVEYMEANS computes SUMWGT in the same way for both categorical and numeric variables, as the sum of the weights over all nonmissing observations.

PROC SURVEYMEANS performs univariate analysis, analyzing each variable separately. Thus the number of nonmissing and missing observations might not be the same for all analysis variables. For more information, see the section Missing Values.
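
The following sketch illustrates how the same statistic-keywords apply to numeric and categorical analysis variables. It assumes a hypothetical data set work.Survey in which Income is a continuous numeric variable and Employed is a numeric 0/1 indicator; naming Employed in the CLASS statement makes the procedure analyze it as categorical.

```sas
/* Sketch only: data set and variable names are assumed. */
proc surveymeans data=work.Survey nobs nmiss mean stderr;
   class Employed;           /* numeric variable treated as categorical */
   var Income Employed;
   weight SamplingWeight;
run;
```

In this setup, MEAN would report the mean of Income and the proportion in each level of Employed.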

The following statistics are available for ratios (which you request with a RATIO statement): N, NCLU, SUMWGT, RATIO, STDERR, DF, T, PROBT, and CLM, as shown in the following list. If no statistics are requested, the procedure computes the ratio and its standard error by default.

You can specify the following statistic-keywords:

ALL: requests all available statistics except those that are associated with geometric means.
ALLGEO: requests all available statistics that are associated with geometric means.
CLM: requests the $100(1-\alpha)\%$ two-sided confidence limits for MEAN, where $\alpha$ is determined by the ALPHA= option; the default is $\alpha = 0.05$.
CLSUM: requests the $100(1-\alpha)\%$ two-sided confidence limits for SUM, where $\alpha$ is determined by the ALPHA= option; the default is $\alpha = 0.05$.
CV: requests the coefficient of variation for MEAN.
CVSUM: requests the coefficient of variation for SUM.
DECILES: requests the 10th through the 90th percentiles, including their standard errors and confidence limits.
DEFF: requests the design effect for MEAN.
DF: requests the degrees of freedom for the t test.
GEOMEAN: requests the geometric mean of a numeric variable that contains positive values.
GMCLM: requests the $100(1-\alpha)\%$ two-sided confidence limits for GEOMEAN, where $\alpha$ is determined by the ALPHA= option; the default is $\alpha = 0.05$.
GMSTDERR: requests the standard error of GEOMEAN. When you specify GEOMEAN, PROC SURVEYMEANS computes GMSTDERR by default.
LCLM: requests the $100(1-\alpha)\%$ one-sided lower confidence limit for MEAN, where $\alpha$ is determined by the ALPHA= option; the default is $\alpha = 0.05$.
LCLSUM: requests the $100(1-\alpha)\%$ one-sided lower confidence limit for SUM, where $\alpha$ is determined by the ALPHA= option; the default is $\alpha = 0.05$.
LGMCLM: requests the $100(1-\alpha)\%$ one-sided lower confidence limit for GEOMEAN, where $\alpha$ is determined by the ALPHA= option; the default is $\alpha = 0.05$.
MAX: requests the maximum value.
MEAN: requests the mean for a numeric variable, or the proportion in each category for a categorical variable.
MEDIAN: requests the median (50th percentile) for a numeric variable.
MIN: requests the minimum value.
NCLUSTER: requests the number of clusters.
NMISS: requests the number of missing observations.
NOBS: requests the number of nonmissing observations.
Q1: requests the lower quartile (25th percentile).
Q3: requests the upper quartile (75th percentile).
QUARTILES: requests Q1 (25th percentile), MEDIAN (50th percentile), and Q3 (75th percentile), including their standard errors and confidence limits.
RANGE: requests the range, MAX–MIN.
RATIO: requests the ratio of means or proportions.
STD: requests the standard deviation of SUM. When you request SUM, the procedure computes STD by default.
STDERR: requests the standard error of MEAN or RATIO. When you request MEAN or RATIO, the procedure computes STDERR by default.
SUM: requests the weighted sum, $\sum w_i y_i$, or estimated population total when the appropriate sampling weights are used.
SUMWGT: requests the sum of the weights, $\sum w_i$.
T: requests the t value and its corresponding p-value with DF degrees of freedom for $H_0\colon \theta = 0$, where $\theta$ is a requested statistic.
UCLM: requests the $100(1-\alpha)\%$ one-sided upper confidence limit for MEAN, where $\alpha$ is determined by the ALPHA= option; the default is $\alpha = 0.05$.
UCLSUM: requests the $100(1-\alpha)\%$ one-sided upper confidence limit for SUM, where $\alpha$ is determined by the ALPHA= option; the default is $\alpha = 0.05$.
UGMCLM: requests the $100(1-\alpha)\%$ one-sided upper confidence limit for GEOMEAN, where $\alpha$ is determined by the ALPHA= option; the default is $\alpha = 0.05$.
VAR: requests the variance of MEAN or RATIO.
VARSUM: requests the variance of SUM.

For details about how PROC SURVEYMEANS computes these statistics, see the section Statistical Computations.

VARHEADER=LABEL | NAME | NAMELABEL

specifies the variable identification to use in the displayed output. This option controls the headings of the DOMAIN variables in domain analysis output and of the STRATA variables in the "Stratum Information" table. By default, VARHEADER=NAME.

You can specify the following values:

LABEL: displays the variable label.
NAME: displays the variable name.
NAMELABEL: displays the variable name and label as Name (Label).

This option has no effect on tables that use the STACKING option.

VARMETHOD=method <(method-options)>

specifies the variance estimation method. PROC SURVEYMEANS provides the Taylor series method and the following replication (resampling) methods: balanced repeated replication (BRR), bootstrap, and jackknife.

Table 3 summarizes the available methods and method-options.

Table 3: Variance Estimation Methods

method	Variance Estimation Method	method-options
BOOTSTRAP	Bootstrap	CENTER=
		DFADJ
		MH=value | (values) | SAS-data-set
		OUTWEIGHTS=SAS-data-set
		REPS=number
		SEED=number
BRR	Balanced repeated replication	CENTER=
		DFADJ
		FAY <=value>
		HADAMARD=SAS-data-set
		NAIVEQVAR
		OUTWEIGHTS=SAS-data-set
		PRINTH
		REPS=number
JACKKNIFE | JK	Jackknife	CENTER=
		DFADJ
		NAIVEQVAR
		OUTJKCOEFS=SAS-data-set
		OUTWEIGHTS=SAS-data-set
TAYLOR	Taylor series linearization	None

For VARMETHOD=BOOTSTRAP, VARMETHOD=BRR, and VARMETHOD=JACKKNIFE, you can specify method-options in parentheses after the variance estimation method. For example:

varmethod=BRR(reps=60 outweights=myReplicateWeights)

By default, VARMETHOD=JACKKNIFE if you also specify a REPWEIGHTS statement; otherwise, VARMETHOD=TAYLOR by default.

You can specify the following methods:

BOOTSTRAP <(method-options)>

requests variance estimation by the bootstrap method. For more information, see the section Bootstrap Method.

The bootstrap method requires at least two primary sampling units (PSUs) in each stratum for stratified designs unless you use a REPWEIGHTS statement to provide replicate weights.

You can specify the following method-options:

CENTER=FULLSAMPLE | REPLICATES

defines how to compute the deviations for the bootstrap method. You can specify the following values:

FULLSAMPLE: computes the deviations of the replicate estimates from the full sample estimate.
REPLICATES: computes the deviations of the replicate estimates from the average of the replicate estimates.

By default, CENTER=FULLSAMPLE. For more information, see the section Bootstrap Method.

DFADJ

computes the degrees of freedom by using the number of nonempty strata for an analysis variable. The degrees of freedom for VARMETHOD=BOOTSTRAP equals the number of clusters (or number of observations if there are no clusters) minus the number of strata (or one if there are no strata). By default, the number of strata is based on all valid observations in the data set. But if you specify this method-option, PROC SURVEYMEANS does not count any empty strata that are caused by all observations containing missing values for an analysis variable.

For more information, see the section Degrees of Freedom. For more information about valid observations, see the section Data and Sample Design Summary.

This method-option has no effect on categorical variables when you specify the MISSING option, which treats missing values as a valid nonmissing level.

This method-option cannot be used when you provide replicate weights in a REPWEIGHTS statement. When you use a REPWEIGHTS statement, the degrees of freedom equals the number of REPWEIGHTS variables (replicates) unless you specify an alternative value in the DF= option in the REPWEIGHTS statement.

MH=value | (values) | SAS-data-set

specifies the number of PSUs to select for the bootstrap replicate samples. You can provide bootstrap stratum sample sizes by specifying a list of values or a SAS-data-set. Alternatively, you can provide a single bootstrap sample size value to use for all strata or for a nonstratified design. You can specify the number of replicate samples in the REPS= option. For more information, see the section Bootstrap Method.

Each bootstrap sample size must be a positive integer and must be less than $n_h$, which is the total number of PSUs in stratum h. By default, $m_h = n_h - 1$ for a stratified design. For a nonstratified design, the bootstrap sample size value must be less than n (the total number of PSUs in the sample). By default, m = n – 1 for a nonstratified design.

You can provide bootstrap sample sizes by specifying one of the following forms:

MH=value

specifies a single bootstrap sample size value to use for all strata or for a nonstratified design.

MH=(values)

specifies a list of stratum bootstrap sample size values. You can separate the values with blanks or commas, and you must enclose the list of values in parentheses. The number of values must not be less than the number of strata in the DATA= input data set.

Each stratum sample size value must be a positive integer and must be less than the total number of PSUs in the corresponding stratum.

MH=SAS-data-set

names a SAS-data-set that contains the stratum bootstrap sample sizes. You must provide the sample sizes in a data set variable named _NSIZE_ or SampleSize.

The SAS-data-set must contain all stratification variables that you specify in the STRATA statement. It must also contain all stratum levels that appear in the DATA= input data set. If formats are associated with the STRATA variables, the formats must be consistent in the two data sets.

Each value of the _NSIZE_ or SampleSize variable must be a positive integer and must be less than the total number of PSUs in the corresponding stratum.

OUTWEIGHTS=SAS-data-set

names a SAS-data-set in which to store the bootstrap replicate weights that PROC SURVEYMEANS creates. For information about replicate weights, see the section Bootstrap Method. For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weights Output Data Set.

This method-option is not available when you provide replicate weights in a REPWEIGHTS statement.

REPS=number

specifies the number of replicates for bootstrap variance estimation. The value of number must be an integer greater than 1. Increasing the number of replicates improves the estimation precision but also increases the computation time. By default, REPS=250.

SEED=number

specifies the initial seed for random number generation for bootstrap replicate sampling.

If you do not specify this option or if you specify a number that is negative or 0, PROC SURVEYMEANS uses the time of day from the system clock to obtain an initial seed.

To reproduce the same bootstrap replicate weights and the same analysis in a subsequent execution of PROC SURVEYMEANS, you can specify the same initial seed that was used in the original analysis.

PROC SURVEYMEANS displays the value of the initial seed in the "Variance Estimation" table.
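
The bootstrap method-options described above might be combined as in the following sketch, which fixes the seed so that the replicate weights can be reproduced and saves them for later inspection. The data set work.Survey, its variables, the seed value, and the output data set name are hypothetical.

```sas
/* Sketch only: data set, variable names, and seed are assumed. */
proc surveymeans data=work.Survey
                 varmethod=bootstrap(reps=500           /* more than the default 250  */
                                     seed=12345         /* positive, for repeatability */
                                     outweights=work.BootWeights)
                 mean;
   strata Region;            /* each stratum needs at least two PSUs */
   cluster School;
   var Income;
   weight SamplingWeight;
run;
```

Rerunning the step with the same SEED= value would be expected to reproduce the same replicate weights and estimates.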

BRR <(method-options)>

requests variance estimation by balanced repeated replication (BRR). This method requires a stratified sample design where each stratum contains two primary sampling units (PSUs). When you specify this method, you must also specify a STRATA statement unless you provide replicate weights by using the REPWEIGHTS statement. For more information, see the section Balanced Repeated Replication (BRR) Method.

You can specify the following method-options:

CENTER=FULLSAMPLE | REPLICATES

defines how to compute the deviations for the BRR method. You can specify the following values:

FULLSAMPLE: computes the deviations of the replicate estimates from the full sample estimate.
REPLICATES: computes the deviations of the replicate estimates from the average of the replicate estimates.

By default, CENTER=FULLSAMPLE. For more information, see the section Balanced Repeated Replication (BRR) Method.

DFADJ

computes the degrees of freedom by using the number of nonempty strata for an analysis variable. The degrees of freedom for VARMETHOD=BRR equals the number of strata; by default, that number is based on all valid observations in the data set. But if you specify this method-option, PROC SURVEYMEANS does not count any empty strata that are caused by all observations containing missing values for an analysis variable.

For more information, see the section Degrees of Freedom. For more information about valid observations, see the section Data and Sample Design Summary.

This method-option has no effect on categorical variables when you specify the MISSING option, which treats missing values as a valid nonmissing level.

FAY <=value>

requests Fay’s method, which is a modification of the BRR method. For more information, see the section Fay’s BRR Method.

You can specify the value of the Fay coefficient, which is used in converting the original sampling weights to replicate weights. The Fay coefficient must be a nonnegative number less than 1. By default, the Fay coefficient is 0.5.

HADAMARD=SAS-data-set
H=SAS-data-set

names a SAS-data-set that contains the Hadamard matrix for BRR replicate construction. If you do not specify this method-option, PROC SURVEYMEANS generates an appropriate Hadamard matrix for replicate construction. For more information, see the sections Balanced Repeated Replication (BRR) Method and Hadamard Matrix.

If a Hadamard matrix of a particular dimension exists, it is not necessarily unique. Therefore, if you want to use a specific Hadamard matrix, you must provide the matrix as a SAS-data-set in this method-option.

In this SAS-data-set, each variable corresponds to a column and each observation corresponds to a row of the Hadamard matrix. You can use any variable names in this data set. All values in the data set must equal either 1 or –1. You must ensure that the matrix you provide is indeed a Hadamard matrix; that is, $\mathbf{A}'\mathbf{A} = R\,\mathbf{I}$, where $\mathbf{A}$ is the Hadamard matrix of dimension R and $\mathbf{I}$ is an identity matrix. PROC SURVEYMEANS does not check the validity of the Hadamard matrix that you provide.

The SAS-data-set must contain at least H variables, where H denotes the number of first-stage strata in your design. If the data set contains more than H variables, PROC SURVEYMEANS uses only the first H variables. Similarly, this data set must contain at least H observations.

If you do not specify the REPS= method-option, the number of replicates is assumed to be the number of observations in the SAS-data-set. If you specify the number of replicates—for example, REPS=nreps—the first nreps observations in the SAS-data-set are used to construct the replicates.

You can specify the PRINTH method-option to display the Hadamard matrix that PROC SURVEYMEANS uses to construct replicates for BRR.
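
As a sketch of supplying your own matrix, the following steps store a 4 x 4 Hadamard matrix (whose rows are mutually orthogonal, so that A'A = 4I) and use it for BRR. The example assumes a hypothetical data set work.Survey3 with three strata and two PSUs in each stratum, so the four variables and four observations satisfy the size requirements above. All data set and variable names are invented for illustration.

```sas
/* Sketch only: a 4 x 4 Hadamard matrix; variable names are arbitrary. */
data work.H4;
   input c1 c2 c3 c4;
   datalines;
1  1  1  1
1 -1  1 -1
1  1 -1 -1
1 -1 -1  1
;

/* work.Survey3 is assumed to have 3 strata with 2 PSUs each. */
proc surveymeans data=work.Survey3
                 varmethod=brr(hadamard=work.H4 printh)
                 mean;
   strata Region;
   cluster School;
   var Income;
   weight SamplingWeight;
run;
```

Because REPS= is not specified, the number of replicates would be the number of observations in work.H4, which is four.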

NAIVEQVAR

requests that naive replication variance estimates be used to estimate the variances for quantiles. For more information, see the section Replication Methods.

OUTWEIGHTS=SAS-data-set

names a SAS-data-set in which to store the replicate weights that PROC SURVEYMEANS creates for BRR variance estimation. For information about replicate weights, see the section Balanced Repeated Replication (BRR) Method. For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weights Output Data Set.

This method-option is not available when you provide replicate weights in a REPWEIGHTS statement.

PRINTH

displays the Hadamard matrix that PROC SURVEYMEANS uses to construct replicates for BRR variance estimation. When you provide the Hadamard matrix in the HADAMARD= method-option, PROC SURVEYMEANS displays only the rows and columns that are actually used to construct replicates. For more information, see the sections Balanced Repeated Replication (BRR) Method and Hadamard Matrix.

The PRINTH method-option is not available when you provide replicate weights in a REPWEIGHTS statement because the procedure does not use a Hadamard matrix in this case.

REPS=number

specifies the number of replicates for BRR variance estimation. The value of number must be an integer greater than 1.

If you do not use the HADAMARD= method-option to provide a Hadamard matrix, the number of replicates should be greater than the number of strata and should be a multiple of 4. For more information, see the section Balanced Repeated Replication (BRR) Method. If PROC SURVEYMEANS cannot construct a Hadamard matrix for the REPS= value that you specify, the value is increased until a Hadamard matrix of that dimension can be constructed. Therefore, the actual number of replicates that PROC SURVEYMEANS uses might be larger than number.

If you use the HADAMARD= method-option to provide a Hadamard matrix, the value of number must not be greater than the number of rows in the Hadamard matrix. If you provide a Hadamard matrix and do not specify the REPS= method-option, the number of replicates is the number of rows in the Hadamard matrix.

If you do not specify the REPS= or the HADAMARD= method-option and do not use a REPWEIGHTS statement, the number of replicates is the smallest multiple of 4 that is greater than the number of strata.

If you use a REPWEIGHTS statement to provide replicate weights, PROC SURVEYMEANS does not use the REPS= method-option; the number of replicates is the number of REPWEIGHTS variables.

JACKKNIFE <(method-options)>
JK <(method-options)>

requests variance estimation by the delete-1 jackknife method. For more information, see the section Jackknife Method. If you use a REPWEIGHTS statement to provide replicate weights, VARMETHOD=JACKKNIFE is the default variance estimation method.

The delete-1 jackknife method requires at least two primary sampling units (PSUs) in each stratum for stratified designs unless you use a REPWEIGHTS statement to provide replicate weights.

You can specify the following method-options:

CENTER=FULLSAMPLE | REPLICATES

defines how to compute the deviations for the jackknife method. You can specify the following values:

FULLSAMPLE: computes the deviations of the replicate estimates from the full sample estimate.
REPLICATES: computes the deviations of the replicate estimates from the average of the replicate estimates.

By default, CENTER=FULLSAMPLE. For more information, see the section Jackknife Method.

DFADJ

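computes the degrees of freedom by using the number of nonempty strata for an analysis variable. The degrees of freedom for VARMETHOD=JACKKNIFE equals the number of clusters (or number of observations if there are no clusters) minus the number of strata (or one if there are no strata). By default, the number of strata is based on all valid observations in the data set. But if you specify this method-option, PROC SURVEYMEANS does not count any empty strata that are caused by all observations containing missing values for an analysis variable.

For more information, see the section Degrees of Freedom. For more information about valid observations, see the section Data and Sample Design Summary.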

This method-option has no effect on categorical variables when you specify the MISSING option, which treats missing values as a valid nonmissing level.

NAIVEQVAR

requests that naive replication variance estimates be used to estimate the variances for quantiles. For more information, see the section Replication Methods.

OUTJKCOEFS=SAS-data-set

names a SAS-data-set in which to store the jackknife coefficients. For information about jackknife coefficients, see the section Jackknife Method. For information about the contents of the OUTJKCOEFS= data set, see the section Jackknife Coefficients Output Data Set.

OUTWEIGHTS=SAS-data-set

names a SAS-data-set in which to store the replicate weights that PROC SURVEYMEANS creates for jackknife variance estimation. For information about replicate weights, see the section Jackknife Method. For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weights Output Data Set.

This method-option is not available when you use a REPWEIGHTS statement to provide replicate weights unless you specify a POSTSTRATA statement.
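
The jackknife output method-options might be used together as in the following sketch, which saves both the replicate weights and the jackknife coefficients so that they could be reused later. The data set work.Survey, its variables, and the output data set names are hypothetical.

```sas
/* Sketch only: data set and variable names are assumed. */
proc surveymeans data=work.Survey
                 varmethod=jackknife(outweights=work.JKWeights
                                     outjkcoefs=work.JKCoefs)
                 mean sum;
   strata Region;            /* each stratum needs at least two PSUs */
   cluster School;
   var Income;
   weight SamplingWeight;
run;
```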

TAYLOR

requests Taylor series variance estimation. This is the default method if you do not specify the VARMETHOD= option or a REPWEIGHTS statement. For more information, see the section Taylor Series Method.

Last updated: December 09, 2022