The SURVEYLOGISTIC Procedure

PROC SURVEYLOGISTIC Statement

  • PROC SURVEYLOGISTIC <options>;

The PROC SURVEYLOGISTIC statement invokes the SURVEYLOGISTIC procedure. Optionally, it identifies input data sets, controls the ordering of the response levels, and specifies the variance estimation method. The PROC SURVEYLOGISTIC statement is required.

Table 1 summarizes the options available in the PROC SURVEYLOGISTIC statement.

Table 1: PROC SURVEYLOGISTIC Statement Options

Option               Description
ALPHA=               Sets the confidence level for confidence intervals
DATA=                Names the SAS data set containing the data to be analyzed
INEST=               Names the SAS data set that contains initial estimates
MAXRESPONSELEVELS=   Specifies the maximum number of response levels allowed
MISSING              Treats missing values as a valid category
NAMELEN=             Specifies the length of effect names
NOMCAR               Treats missing values as not missing completely at random
NOSORT               Suppresses the internal sorting process
ORDER=               Specifies the sort order
RATE=                Specifies the sampling rate
TOTAL=               Specifies the total number of primary sampling units
VARMETHOD=           Specifies the variance estimation method


ALPHA=value

sets the confidence level for confidence intervals. The value of the ALPHA= option must be between 0 and 1, and the default value is 0.05. A confidence level of $\alpha$ produces $100(1-\alpha)\%$ confidence intervals. The default of ALPHA=0.05 produces 95% confidence intervals.
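As a minimal sketch, the following step sets ALPHA=0.1, so the confidence limits requested by the CLPARM option in the MODEL statement should be 90% limits. The data set MySurvey and the variables Region, School, SamplingWeight, Response, and Age are hypothetical names used only for illustration.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc surveylogistic data=MySurvey alpha=0.1;
   strata Region;
   cluster School;
   weight SamplingWeight;
   /* CLPARM requests confidence limits for the parameters; */
   /* with ALPHA=0.1 these are 100(1 - 0.1)% = 90% limits   */
   model Response(event='1') = Age / clparm;
run;
```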

DATA=SAS-data-set

names the SAS data set containing the data to be analyzed. If you omit the DATA= option, the procedure uses the most recently created SAS data set.

INEST=SAS-data-set

names the SAS data set that contains initial estimates for all the parameters in the model. BY-group processing is allowed in setting up the INEST= data set. See the section INEST= Data Set for more information.

MAXRESPONSELEVELS=number

specifies the maximum number of response levels that are allowed in your data set. By default, MAXRESPONSELEVELS=100. If you have more response levels than the maximum number allowed, then a message is displayed in the SAS log that provides the value of number required to continue the analysis, and the procedure stops.

MISSING

treats missing values as a valid (nonmissing) category for all categorical variables, which include CLASS, STRATA, CLUSTER, and DOMAIN variables.

By default, if you do not specify the MISSING option, an observation is excluded from the analysis if it has a missing value. For more information, see the section Missing Values.

NAMELEN=n

specifies the length of effect names in tables and output data sets to be n characters, where n is a value between 20 and 200. The default length is 20 characters.

NOMCAR

requests that the procedure treat missing values in the variance computation as not missing completely at random (NOMCAR) for Taylor series variance estimation. When you specify the NOMCAR option, PROC SURVEYLOGISTIC computes variance estimates by analyzing the nonmissing values as a domain or subpopulation, where the entire population includes both nonmissing and missing domains. See the section Missing Values for more details.

By default, PROC SURVEYLOGISTIC completely excludes an observation from analysis if that observation has a missing value, unless you specify the MISSING option. Note that the NOMCAR option has no effect on a classification variable when you specify the MISSING option, which treats missing values as a valid nonmissing level.

The NOMCAR option applies only to Taylor series variance estimation; it is ignored for replication methods.

NOSORT

suppresses the internal sorting process to shorten the computation time if the data set is presorted by the STRATA and CLUSTER variables. By default, the procedure sorts the data by the STRATA variables if you use the STRATA statement; then the procedure sorts the data by the CLUSTER variables within strata. If your data are already sorted by the STRATA and CLUSTER variables, you can specify this option to skip the sorting process and reduce the use of computing resources, especially when your data set is very large. However, if you specify the NOSORT option while your data are not presorted by the STRATA and CLUSTER variables, then any change in the values of these variables from one observation to the next creates a new stratum or cluster.
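The following sketch shows one way to satisfy the presorting requirement before specifying NOSORT. It assumes a hypothetical data set MySurvey with a stratification variable Region and a cluster variable School; all names are illustrative.

```sas
/* Sort by the STRATA variable first, then by the CLUSTER variable */
/* (hypothetical data set and variable names)                      */
proc sort data=MySurvey out=MySurveySorted;
   by Region School;
run;

/* The input is presorted, so the internal sort can be skipped */
proc surveylogistic data=MySurveySorted nosort;
   strata Region;
   cluster School;
   weight SamplingWeight;
   model Response(event='1') = Age;
run;
```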

ORDER=DATA | FORMATTED | FREQ | INTERNAL

specifies the sort order for the levels of the response variable. This option, except for ORDER=FREQ, also determines the sort order for the levels of CLUSTER and DOMAIN variables and controls STRATA variable levels in the "Stratum Information" table. By default, ORDER=INTERNAL. However, an ORDER= option that is specified for the response variable in the MODEL statement overrides this option for the response variable. This option does not affect the ordering of the CLASS variable levels; see the ORDER= option in the CLASS statement for more information.

RATE=value | SAS-data-set
R=value | SAS-data-set

specifies the sampling rate, which PROC SURVEYLOGISTIC uses to compute a finite population correction for Taylor series or bootstrap variance estimation. This option is ignored for the jackknife or balanced repeated replication (BRR) variance estimation method.

If your sample design has multiple stages, you should specify the first-stage sampling rate, which is the ratio of the number of primary sampling units (PSUs) in the sample to the total number of PSUs in the population.

You can specify the sampling rate in either of the following ways:

value

specifies a nonnegative number to use for a nonstratified design or for a stratified design that has the same sampling rate in each stratum.

SAS-data-set

specifies a SAS-data-set that contains the stratification variables and the sampling rates for a stratified design that has different sampling rates in the strata. You must provide the sampling rates in the data set variable named _RATE_. The sampling rates must be nonnegative numbers.

You can specify sampling rates as numbers between 0 and 1. Alternatively, you can specify sampling rates in percentage form as numbers between 1 and 100, which PROC SURVEYLOGISTIC converts to proportions. The procedure treats the value 1 as 100% instead of 1%.

For more information, see the section Specification of Population Totals and Sampling Rates.

If you do not specify either the RATE= or TOTAL= option, the Taylor series or bootstrap variance estimation does not include a finite population correction. You cannot specify both the RATE= and TOTAL= options.
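As an illustration of the SAS-data-set form of the RATE= option, the following sketch supplies a different first-stage sampling rate for each stratum. The data sets MySurvey and StrataRates, the stratum levels East and West, and the rates themselves are hypothetical; only the variable name _RATE_ is prescribed by the procedure.

```sas
/* Stratum-level sampling rates; the rate variable must be named _RATE_ */
data StrataRates;
   input Region $ _RATE_;
   datalines;
East 0.05
West 0.10
;
run;

/* Taylor series variance estimation with a finite population correction */
proc surveylogistic data=MySurvey rate=StrataRates;
   strata Region;
   cluster School;
   weight SamplingWeight;
   model Response(event='1') = Age;
run;
```

For a design that has the same rate in every stratum, a single value such as RATE=0.05 can be specified instead. Under the percentage rule described above, RATE=5 would be read as the same 5% rate.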

TOTAL=value | SAS-data-set
N=value | SAS-data-set

specifies the total number of primary sampling units (PSUs) in the study population. PROC SURVEYLOGISTIC uses this information to compute a finite population correction for Taylor series or bootstrap variance estimation. This option is ignored for the jackknife or BRR variance estimation method.

You can specify the total number of PSUs in either of the following ways:

value

specifies a positive number to use for a nonstratified design or for a stratified design that has the same population total in each stratum.

SAS-data-set

specifies a SAS-data-set that contains the stratification variables and the population totals for a stratified design that has different population totals in the strata. You must provide the stratum totals in the data set variable named _TOTAL_. The stratum totals must be positive numbers.

For more information, see the section Specification of Population Totals and Sampling Rates.

If you do not specify either the TOTAL= or RATE= option, the Taylor series or bootstrap variance estimation does not include a finite population correction. You cannot specify both the TOTAL= and RATE= options.
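The following sketch shows the SAS-data-set form of the TOTAL= option for a stratified design whose strata have different population totals. The data set names, stratum levels, and PSU counts are hypothetical; only the variable name _TOTAL_ is prescribed by the procedure.

```sas
/* Number of PSUs in the population for each stratum; */
/* the totals variable must be named _TOTAL_          */
data StrataTotals;
   input Region $ _TOTAL_;
   datalines;
East 400
West 250
;
run;

proc surveylogistic data=MySurvey total=StrataTotals;
   strata Region;
   cluster School;
   weight SamplingWeight;
   model Response(event='1') = Age;
run;
```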

VARMETHOD=method <(method-options)>

specifies the variance estimation method. PROC SURVEYLOGISTIC provides the Taylor series method and the following replication (resampling) methods: balanced repeated replication (BRR), bootstrap, and jackknife.

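Table 2 summarizes the available methods and method-options.

method       Variance Estimation             method-options
BOOTSTRAP    Bootstrap                       CENTER=, MH=, OUTWEIGHTS=, REPS=, SEED=
BRR          Balanced repeated replication   CENTER=, FAY, HADAMARD=, OUTWEIGHTS=, PRINTH, REPS=
JACKKNIFE    Jackknife                       CENTER=, OUTJKCOEFS=, OUTWEIGHTS=
TAYLOR       Taylor series                   None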

For VARMETHOD=BOOTSTRAP, VARMETHOD=BRR, and VARMETHOD=JACKKNIFE, you can specify method-options in parentheses after the variance estimation method. For example:

varmethod=BRR(reps=60 outweights=myReplicateWeights)

By default, VARMETHOD=JACKKNIFE if you specify a REPWEIGHTS statement; otherwise, VARMETHOD=TAYLOR.
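To illustrate the default, the following sketch omits the VARMETHOD= option but includes a REPWEIGHTS statement, so the procedure is expected to use jackknife variance estimation. The data set MySurvey and the replicate weight variables RepWt_1 through RepWt_40 are hypothetical names.

```sas
/* No VARMETHOD= option: with a REPWEIGHTS statement, */
/* the default method is VARMETHOD=JACKKNIFE          */
proc surveylogistic data=MySurvey;
   weight SamplingWeight;
   repweights RepWt_1-RepWt_40;   /* hypothetical replicate weights */
   model Response(event='1') = Age;
run;
```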

You can specify the following methods:

BOOTSTRAP <(method-options)>

requests variance estimation by the bootstrap method. For more information, see the section Bootstrap Method.

The bootstrap method requires at least two primary sampling units (PSUs) in each stratum for stratified designs unless you use a REPWEIGHTS statement to provide replicate weights.

You can specify the following method-options:

CENTER=FULLSAMPLE | REPLICATES

defines how to compute the deviations for the bootstrap method. You can specify the following values:

FULLSAMPLE

computes the deviations of the replicate estimates from the full sample estimate.

REPLICATES

computes the deviations of the replicate estimates from the average of the replicate estimates.

By default, CENTER=FULLSAMPLE. For more information, see the section Bootstrap Method.

MH=value | (values) | SAS-data-set

specifies the number of PSUs to select for the bootstrap replicate samples. You can provide bootstrap stratum sample sizes $m_h$ by specifying a list of values or a SAS-data-set. Alternatively, you can provide a single bootstrap sample size value to use for all strata or for a nonstratified design. You can specify the number of replicate samples in the REPS= option. For more information, see the section Bootstrap Method.

Each bootstrap sample size $m_h$ must be a positive integer and must be less than $n_h$, which is the total number of PSUs in stratum $h$. By default, $m_h = n_h - 1$ for a stratified design. For a nonstratified design, the bootstrap sample size value must be less than $n$ (the total number of PSUs in the sample). By default, $m = n - 1$ for a nonstratified design.

You can provide bootstrap sample sizes by specifying one of the following forms:

MH=value

specifies a single bootstrap sample size value to use for all strata or for a nonstratified design.

MH=(values)

specifies a list of stratum bootstrap sample size values. You can separate the values with blanks or commas, and you must enclose the list of values in parentheses. The number of values must not be less than the number of strata in the DATA= input data set.

Each stratum sample size value must be a positive integer and must be less than the total number of PSUs in the corresponding stratum.

MH=SAS-data-set

names a SAS-data-set that contains the stratum bootstrap sample sizes. You must provide the sample sizes in a data set variable named _NSIZE_ or SampleSize.

The SAS-data-set must contain all stratification variables that you specify in the STRATA statement. It must also contain all stratum levels that appear in the DATA= input data set. If formats are associated with the STRATA variables, the formats must be consistent in the two data sets.

Each value of the _NSIZE_ or SampleSize variable must be a positive integer and must be less than the total number of PSUs in the corresponding stratum.
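As a sketch of the MH=SAS-data-set form, the following statements supply one bootstrap sample size per stratum. The data sets MySurvey and BootSizes and the stratum levels are hypothetical, and the sizes assume that the East and West strata contain more than 9 and 5 sample PSUs, respectively.

```sas
/* Bootstrap sample size for each stratum; the variable */
/* must be named _NSIZE_ or SampleSize                  */
data BootSizes;
   input Region $ _NSIZE_;
   datalines;
East 9
West 5
;
run;

proc surveylogistic data=MySurvey varmethod=bootstrap(mh=BootSizes);
   strata Region;          /* BootSizes must contain this STRATA variable */
   cluster School;
   weight SamplingWeight;
   model Response(event='1') = Age;
run;
```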

OUTWEIGHTS=SAS-data-set

names a SAS-data-set in which to store the bootstrap replicate weights that PROC SURVEYLOGISTIC creates. For information about replicate weights, see the section Bootstrap Method. For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weights Output Data Set.

This method-option is not available when you provide replicate weights in a REPWEIGHTS statement.

REPS=number

specifies the number of replicates for bootstrap variance estimation. The value of number must be an integer greater than 1. Increasing the number of replicates improves the estimation precision but also increases the computation time. By default, REPS=250.

SEED=number

specifies the initial seed for random number generation for bootstrap replicate sampling.

If you do not specify this option or if you specify a number that is negative or 0, PROC SURVEYLOGISTIC uses the time of day from the system clock to obtain an initial seed.

To reproduce the same bootstrap replicate weights and the same analysis in a subsequent execution of PROC SURVEYLOGISTIC, you can specify the same initial seed that was used in the original analysis.

PROC SURVEYLOGISTIC displays the value of the initial seed in the "Variance Estimation" table.
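The following sketch combines the REPS=, SEED=, and OUTWEIGHTS= method-options so that a bootstrap analysis can be repeated with the same replicate weights. The seed value and the data set names MySurvey and BootWeights are arbitrary, hypothetical choices.

```sas
/* A positive SEED= value makes the replicate sampling reproducible; */
/* OUTWEIGHTS= stores the replicate weights that are created         */
proc surveylogistic data=MySurvey
      varmethod=bootstrap(reps=500 seed=20221209 outweights=BootWeights);
   strata Region;
   cluster School;
   weight SamplingWeight;
   model Response(event='1') = Age;
run;
```

Rerunning the same step with the same seed should reproduce the same replicate weights and variance estimates.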

BRR <(method-options)>

requests variance estimation by balanced repeated replication (BRR). This method requires a stratified sample design where each stratum contains two primary sampling units (PSUs). When you specify this method, you must also specify a STRATA statement unless you provide replicate weights by using the REPWEIGHTS statement. For more information, see the section Balanced Repeated Replication (BRR) Method.

You can specify the following method-options:

CENTER=FULLSAMPLE | REPLICATES

defines how to compute the deviations for the BRR method. You can specify the following values:

FULLSAMPLE

computes the deviations of the replicate estimates from the full sample estimate.

REPLICATES

computes the deviations of the replicate estimates from the average of the replicate estimates.

By default, CENTER=FULLSAMPLE. For more information, see the section Balanced Repeated Replication (BRR) Method.

FAY <=value>

requests Fay’s method, which is a modification of the BRR method. For more information, see the section Fay’s BRR Method.

You can specify the value of the Fay coefficient, which is used in converting the original sampling weights to replicate weights. The Fay coefficient must be a nonnegative number less than 1. By default, the Fay coefficient is 0.5.
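For example, the following sketch requests Fay's method with a coefficient of 0.3 in place of the default 0.5. It assumes a hypothetical data set MySurvey in which each Region stratum contains two PSUs, as the BRR method requires.

```sas
/* Fay's BRR with a user-specified coefficient (0 <= value < 1) */
proc surveylogistic data=MySurvey varmethod=brr(fay=0.3);
   strata Region;          /* a STRATA statement is required for BRR */
   cluster School;
   weight SamplingWeight;
   model Response(event='1') = Age;
run;
```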

HADAMARD=SAS-data-set
H=SAS-data-set

names a SAS-data-set that contains the Hadamard matrix for BRR replicate construction. If you do not specify this method-option, PROC SURVEYLOGISTIC generates an appropriate Hadamard matrix for replicate construction. For more information, see the sections Balanced Repeated Replication (BRR) Method and Hadamard Matrix.

If a Hadamard matrix of a particular dimension exists, it is not necessarily unique. Therefore, if you want to use a specific Hadamard matrix, you must provide the matrix as a SAS-data-set in this method-option.

In this SAS-data-set, each variable corresponds to a column and each observation corresponds to a row of the Hadamard matrix. You can use any variable names in this data set. All values in the data set must equal either 1 or -1. You must ensure that the matrix you provide is indeed a Hadamard matrix; that is, $\mathbf{A}'\mathbf{A} = R\mathbf{I}$, where $\mathbf{A}$ is the Hadamard matrix of dimension $R$ and $\mathbf{I}$ is an identity matrix. PROC SURVEYLOGISTIC does not check the validity of the Hadamard matrix that you provide.

The SAS-data-set must contain at least H variables, where H denotes the number of first-stage strata in your design. If the data set contains more than H variables, PROC SURVEYLOGISTIC uses only the first H variables. Similarly, this data set must contain at least H observations.

If you do not specify the REPS= method-option, the number of replicates is assumed to be the number of observations in the SAS-data-set. If you specify the number of replicates—for example, REPS=nreps—the first nreps observations in the SAS-data-set are used to construct the replicates.

You can specify the PRINTH method-option to display the Hadamard matrix that PROC SURVEYLOGISTIC uses to construct replicates for BRR.
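The following sketch supplies a 4 x 4 Hadamard matrix and displays the part of it that is used. It assumes a hypothetical design that has no more than four strata, each with two PSUs; the data set names and the variable names Col1 through Col4 are arbitrary.

```sas
/* One observation per row and one variable per column of the matrix. */
/* Every pair of rows is orthogonal, so A'A = 4I.                      */
data MyHadamard;
   input Col1-Col4;
   datalines;
1  1  1  1
1 -1  1 -1
1  1 -1 -1
1 -1 -1  1
;
run;

/* No REPS= method-option: the number of replicates is the */
/* number of observations in MyHadamard                    */
proc surveylogistic data=MySurvey
      varmethod=brr(hadamard=MyHadamard printh);
   strata Region;
   cluster School;
   weight SamplingWeight;
   model Response(event='1') = Age;
run;
```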

OUTWEIGHTS=SAS-data-set

names a SAS-data-set in which to store the replicate weights that PROC SURVEYLOGISTIC creates for BRR variance estimation. For information about replicate weights, see the section Balanced Repeated Replication (BRR) Method. For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weights Output Data Set.

This method-option is not available when you provide replicate weights in a REPWEIGHTS statement.

PRINTH

displays the Hadamard matrix that PROC SURVEYLOGISTIC uses to construct replicates for BRR variance estimation. When you provide the Hadamard matrix in the HADAMARD= method-option, PROC SURVEYLOGISTIC displays only the rows and columns that are actually used to construct replicates. For more information, see the sections Balanced Repeated Replication (BRR) Method and Hadamard Matrix.

The PRINTH method-option is not available when you provide replicate weights in a REPWEIGHTS statement because the procedure does not use a Hadamard matrix in this case.

REPS=number

specifies the number of replicates for BRR variance estimation. The value of number must be an integer greater than 1.

If you do not use the HADAMARD= method-option to provide a Hadamard matrix, the number of replicates should be greater than the number of strata and should be a multiple of 4. For more information, see the section Balanced Repeated Replication (BRR) Method. If PROC SURVEYLOGISTIC cannot construct a Hadamard matrix for the REPS= value that you specify, the value is increased until a Hadamard matrix of that dimension can be constructed. Therefore, the actual number of replicates that PROC SURVEYLOGISTIC uses might be larger than number.

If you use the HADAMARD= method-option to provide a Hadamard matrix, the value of number must not be greater than the number of rows in the Hadamard matrix. If you provide a Hadamard matrix and do not specify the REPS= method-option, the number of replicates is the number of rows in the Hadamard matrix.

If you do not specify the REPS= or the HADAMARD= method-option and do not use a REPWEIGHTS statement, the number of replicates is the smallest multiple of 4 that is greater than the number of strata.

If you use a REPWEIGHTS statement to provide replicate weights, PROC SURVEYLOGISTIC does not use the REPS= method-option; the number of replicates is the number of REPWEIGHTS variables.
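The default rule (the smallest multiple of 4 that is greater than the number of strata) can be checked with a short DATA step. This is only an arithmetic sketch of the stated rule, not the procedure's internal code; the value H=10 is a hypothetical number of strata.

```sas
data _null_;
   H = 10;                       /* hypothetical number of strata     */
   R = 4 * (floor(H / 4) + 1);   /* smallest multiple of 4 that is    */
                                 /* strictly greater than H           */
   put H= R=;                    /* writes H=10 R=12 to the SAS log   */
run;
```

Because the multiple must be strictly greater than the number of strata, a design with 12 strata would yield 16 replicates under this rule, not 12.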

JACKKNIFE <(method-options)>
JK <(method-options)>

requests variance estimation by the delete-1 jackknife method. For more information, see the section Jackknife Method. If you use a REPWEIGHTS statement to provide replicate weights, VARMETHOD=JACKKNIFE is the default variance estimation method.

The delete-1 jackknife method requires at least two primary sampling units (PSUs) in each stratum for stratified designs unless you use a REPWEIGHTS statement to provide replicate weights.

You can specify the following method-options:

CENTER=FULLSAMPLE | REPLICATES

defines how to compute the deviations for the jackknife method. You can specify the following values:

FULLSAMPLE

computes the deviations of the replicate estimates from the full sample estimate.

REPLICATES

computes the deviations of the replicate estimates from the average of the replicate estimates.

By default, CENTER=FULLSAMPLE. For more information, see the section Jackknife Method.

OUTJKCOEFS=SAS-data-set

names a SAS-data-set in which to store the jackknife coefficients. For information about jackknife coefficients, see the section Jackknife Method. For information about the contents of the OUTJKCOEFS= data set, see the section Jackknife Coefficients Output Data Set.

OUTWEIGHTS=SAS-data-set

names a SAS-data-set in which to store the replicate weights that PROC SURVEYLOGISTIC creates for jackknife variance estimation. For information about replicate weights, see the section Jackknife Method. For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weights Output Data Set.

This method-option is not available when you use a REPWEIGHTS statement to provide replicate weights.
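As a sketch, the following step requests jackknife variance estimation and saves both the replicate weights and the jackknife coefficients for later use. The data set names MySurvey, JKWeights, and JKCoefs are hypothetical, and the design is assumed to have at least two PSUs in each stratum.

```sas
/* Delete-1 jackknife; store replicate weights and coefficients */
proc surveylogistic data=MySurvey
      varmethod=jackknife(outweights=JKWeights outjkcoefs=JKCoefs);
   strata Region;
   cluster School;
   weight SamplingWeight;
   model Response(event='1') = Age;
run;
```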

TAYLOR

requests Taylor series variance estimation. This is the default method if you do not specify the VARMETHOD= option or a REPWEIGHTS statement. For more information, see the section Taylor Series (Linearization).

Last updated: December 09, 2022