The SURVEYMEANS Procedure

DOMAIN Statement

  • DOMAIN variables <variable*variable variable*variable*variable …> </ option>;

  • DOMAIN variable <('formatted-level-value' …'formatted-level-value')> <variable <('formatted-level-value' …'formatted-level-value')>*variable <('formatted-level-value' …'formatted-level-value')> >;

The DOMAIN statement requests analysis for domains (subpopulations) in addition to analysis for the entire study population. The DOMAIN statement names the variables that identify domains, which are called domain variables.

A domain variable can be either character or numeric. The procedure treats domain variables as categorical variables. If a variable appears by itself in a DOMAIN statement, each level of this variable determines a domain in the study population. If two or more variables are joined by asterisks (*), then every possible combination of levels of these variables determines a domain. The procedure performs a descriptive analysis within each domain that is defined by the domain variables.

The formatted values of the domain variables determine the categorical variable levels. Thus, you can use formats to group values into levels. For more information, see the FORMAT procedure in Base SAS Procedures Guide and the FORMAT statement and SAS formats in SAS Formats and Informats: Reference.
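As a minimal sketch of grouping values into domain levels with a format: the data set MySurvey, the variables Age, Income, and SamplingWeight, and the format name agegrp are hypothetical names chosen for illustration.

```sas
/* Hypothetical format that groups a numeric Age variable into three levels */
proc format;
   value agegrp low -< 35  = 'Under 35'
                35  -< 65  = '35 to 64'
                65 - high  = '65 and over';
run;

/* Domains are formed from the formatted values of Age, */
/* so this analysis has three domains, not one per age.  */
proc surveymeans data=MySurvey mean stderr;
   weight SamplingWeight;
   var Income;
   domain Age;
   format Age agegrp.;
run;
```

Because the levels come from the formatted values, changing or removing the FORMAT statement changes which domains are analyzed.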

When determining levels of a DOMAIN variable, an observation with missing values for this DOMAIN variable is excluded, unless you specify the MISSING option. For more information, see the section Missing Values.

It is common practice to compute statistics for domains. Because formation of these domains might be unrelated to the sample design, the sample sizes for the domains are random variables. Use a DOMAIN statement to incorporate this variability into the variance estimation.

A DOMAIN statement is different from a BY statement. In a BY statement, you treat the sample sizes as fixed in each subpopulation, and you perform analysis within each BY group independently. For more information, see the section Domain Analysis. Similarly, you should use a DOMAIN statement to perform a domain analysis over the entire data set. Creating a new data set from a single domain and analyzing that with PROC SURVEYMEANS yields inappropriate estimates of variance.
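The following sketch shows a domain analysis that runs over the entire data set rather than over a subset. The data set MySurvey and the variables Region, PSU, SamplingWeight, Income, Race, and Gender are assumed names, and the design statements are only one plausible setup.

```sas
/* Analyze the full sample and let the DOMAIN statement define the      */
/* subpopulations, so that the random domain sample sizes are reflected */
/* in the variance estimates.                                           */
proc surveymeans data=MySurvey mean stderr;
   strata Region;            /* hypothetical stratification variable */
   cluster PSU;              /* hypothetical primary sampling unit   */
   weight SamplingWeight;    /* hypothetical sampling weight         */
   var Income;
   domain Gender Race*Gender;
run;
```

Here Gender defines one set of domains, and Race*Gender defines a domain for each combination of race and gender levels.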

By default, the SURVEYMEANS procedure displays analyses for all levels of domains that are formed by the variables in a DOMAIN statement. Optionally, you can specify particular levels of each DOMAIN variable to be displayed by listing quoted formatted-level-values in parentheses after each variable name. You must enclose each formatted-level-value in single or double quotation marks. You can specify one or more levels of each variable; when you specify more than one level, separate the levels by a space or a comma. These examples illustrate the syntax:

domain Race*Gender("Female");
domain Race('White','Asian') Gender;

For example, Race*Gender("Female") requests that the procedure display analysis only for females within each race category, and Race('White','Asian') requests that the procedure display domain analysis only for people whose race is either white or Asian.

Specifying the same domain multiple times with different levels for the corresponding domain variables is equivalent to specifying the union of those levels for the same variables. However, if you omit the levels for a variable in one specification of a domain that is specified multiple times, only the levels that are specified for that variable elsewhere are displayed. For example, the following two specifications together

domain Race('White')*Gender('Female');
domain Race('Asian')*Gender;

have the same effect as a single specification:

domain Race('White' 'Asian')*Gender('Female');

Also, the following specification

domain Race('White')*Gender Race('Asian')*Gender;

is equivalent to

domain Race('White' 'Asian')*Gender;

This syntax controls only the display of domain analysis results; it does not subset the data set, change the degrees of freedom, or otherwise affect the variance estimation.

You can specify the following options in the DOMAIN statement after a slash (/):

ADJUST=BON

requests a Bonferroni multiple comparison adjustment of the p-values, and adjusted confidence limits for the difference of domain means if the CLDIFF option is also specified. The adjusted p-values and confidence limits are displayed in addition to the unadjusted quantities. For a description of the adjustments, see the section p-Value Adjustments in Chapter 86, The MULTTEST Procedure.

This option also invokes the DIFFMEANS option.

CLDIFF

requests t-type confidence limits for each difference of domain means. You can specify the value of alpha, which determines the confidence level, in the ALPHA= option in the PROC SURVEYMEANS statement. By default, alpha equals 0.05, which produces 95% confidence limits. If you specify the ADJUST=BON option, then the Bonferroni-adjusted confidence limits are also displayed.

This option also invokes the DIFFMEANS option.

COV

displays the estimated covariance matrix of domain means.

DFADJ

computes the degrees of freedom by using the number of non-empty strata for an analysis variable in a domain.

In a domain analysis, some strata might contain no sampling units for a specific domain, or some strata in the domain might be empty because of missing values. By default, the procedure counts these empty strata when computing the degrees of freedom.

However, if you specify the DFADJ option, the procedure excludes any empty strata when computing the degrees of freedom. Prior to SAS 9.2, the procedure excluded empty strata by default.

The DFADJ option has no effect on categorical variables when you specify the MISSING option, which treats missing values as a valid nonmissing level.

For more information about valid observations, see the section Data and Sample Design Summary. For more information about degrees of freedom, see the section Degrees of Freedom.
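A minimal sketch of requesting the adjusted degrees of freedom, assuming a hypothetical stratified design in a data set named MySurvey with variables Region, SamplingWeight, Income, and Race:

```sas
/* DFADJ excludes strata that are empty for a domain */
/* when the degrees of freedom are computed.         */
proc surveymeans data=MySurvey mean stderr df;
   strata Region;
   weight SamplingWeight;
   var Income;
   domain Race / dfadj;
run;
```

The DF keyword in the PROC SURVEYMEANS statement is included here only so that the degrees of freedom appear in the output and can be compared with a run that omits DFADJ.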

DIFFMEANS | DIFF

requests the comparison of domain means for each continuous analysis variable that you specify in the VAR statement; this option does not provide comparisons for categorical analysis variables. If you specify this option, the SURVEYMEANS procedure provides differences between domain means for each pair of levels of a defined domain. You can specify which variable levels to include by listing the quoted formatted-level-values in parentheses after the variable names. By default, the SURVEYMEANS procedure includes all pairwise domain levels.

For each pair of domain levels, the procedure displays the difference between domain means, the standard error of the difference, and the t test. You can also specify the CLDIFF option to request confidence limits for the differences and the ADJUST=BON option to request a Bonferroni multiple comparison adjustment of the p-values and confidence limits.

For more information, see the section Difference of Domain Means.
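The options described above can be combined as in the following sketch. The data set MySurvey and the variables Region, PSU, SamplingWeight, Income, and Race are hypothetical names used for illustration.

```sas
/* Pairwise differences of domain means for Income across levels of Race, */
/* with confidence limits and Bonferroni-adjusted p-values and limits.    */
proc surveymeans data=MySurvey mean stderr alpha=0.05;
   strata Region;
   cluster PSU;
   weight SamplingWeight;
   var Income;
   domain Race / diffmeans cldiff adjust=bon;
run;
```

Because CLDIFF and ADJUST=BON each invoke DIFFMEANS, listing DIFFMEANS explicitly is redundant here; it is kept only to make the request easier to read.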

Last updated: December 09, 2022