The SURVEYFREQ Procedure

Domain Analysis

You can perform domain analysis in PROC SURVEYFREQ by using multiway table requests, the ROW option, the COLUMN option, or the DOMAIN=ROW option. Domain analysis refers to the computation of statistics for domains (subpopulations) in addition to the computation of statistics for the entire study population. Because the formation of subpopulations can be unrelated to the sample design, the domain sample sizes can be random variables. Domain analysis takes this variability into account by using the entire sample to estimate the variance of domain estimates. Domain analysis is also known as subgroup analysis, subpopulation analysis, and subdomain analysis. For more information about domain analysis, see Lohr (2010), Cochran (1977), and Fuller et al. (1989).

You can perform domain analysis by including the domain variable(s) in a multiway table request. For example, you can specify DOMAIN * A * B in a TABLES statement to produce separate two-way tables of A by B for each level of DOMAIN. If your domains are formed by more than one variable, you can specify DomainVariable_1 * DomainVariable_2 * A * B, for example, to obtain two-way tables of A by B for each domain formed by the combinations of levels of DomainVariable_1 and DomainVariable_2. For an example of domain analysis, see Example 118.2.
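As a minimal sketch of the multiway requests described above, the following statements assume a hypothetical input data set MySurvey with design variables Stratum, PSU, and SamplingWeight, and analysis variables Domain, Region, A, and B. None of these names come from the documentation; substitute the names that match your own design.

```sas
/* Hypothetical data set and variable names; adjust to your survey design. */
proc surveyfreq data=MySurvey;
   strata Stratum;            /* stratification variable (assumed)        */
   cluster PSU;               /* primary sampling unit variable (assumed) */
   weight SamplingWeight;     /* sampling weight variable (assumed)       */

   /* Two-way tables of A by B, one for each level of Domain */
   tables Domain * A * B;

   /* Domains formed by the combinations of two variables */
   tables Domain * Region * A * B;
run;
```

Because the domain variables appear in the TABLES statement rather than in a BY statement, the variance estimates for each domain are computed from the entire sample.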

If you specify a two-way table request in a TABLES statement (for example, DOMAIN * A), the values of the variable DOMAIN form the table rows and the values of the variable A form the table columns. The two-way table displays the levels of the variable A within each level of the row variable DOMAIN. You can specify the ROW option in the TABLES statement to obtain the row percentages together with their standard errors, confidence limits, and other statistics. This provides the one-way distribution of A in each domain (level of the variable DOMAIN). Alternatively, you can display a separate one-way table for each row variable level by specifying the DOMAIN=ROW option. If you also specify the CHISQ option, the procedure produces one-way chi-square tests for the row-level domains.
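The two approaches in the preceding paragraph might be requested as follows. This is only a sketch: the data set MySurvey and the variables Stratum, PSU, SamplingWeight, Domain, and A are hypothetical names chosen for illustration.

```sas
/* Hypothetical data set and variable names; adjust to your survey design. */
proc surveyfreq data=MySurvey;
   strata Stratum;
   cluster PSU;
   weight SamplingWeight;

   /* Row percentages: the distribution of A within each level of Domain */
   tables Domain * A / row;

   /* Separate one-way table of A for each level of Domain,
      with a one-way chi-square test for each row-level domain */
   tables Domain * A / domain=row chisq;
run;
```

The first request keeps everything in a single two-way table, which is convenient for comparing domains side by side. The second request is the one to use when you want a chi-square test for each domain.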

A domain analysis, in which the variance computations are based on the entire sample, is not the same as the analysis that you obtain by using a BY statement or by subsetting the input data set. A BY statement provides completely separate analyses of the BY groups. You can use a BY statement to analyze subgroups of the data, but this does not produce a valid domain analysis; the BY statement is appropriate only when the number of units in each subgroup is known with certainty. For example, you can use a BY statement to obtain stratum-level estimates when the stratum sample sizes are fixed. But when the subgroup sample sizes are not fixed, you should perform domain analysis by including the domain variables in the TABLES statement request.
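To make the contrast concrete, the following sketch places the two kinds of request side by side. The data set MySurvey and the variables Stratum, PSU, SamplingWeight, Domain, and A are hypothetical, and the BY-group step assumes a design in which the stratum sample sizes are fixed.

```sas
/* Hypothetical data set and variable names; adjust to your survey design. */

/* Domain analysis: Domain sample sizes are random, so the domain variable
   goes in the TABLES statement and the full sample is used for variances. */
proc surveyfreq data=MySurvey;
   strata Stratum;
   cluster PSU;
   weight SamplingWeight;
   tables Domain * A / row;
run;

/* BY-group processing: completely separate analyses for each stratum.
   Appropriate here only because stratum sample sizes are assumed fixed. */
proc sort data=MySurvey out=MySurveySorted;
   by Stratum;                /* BY processing expects sorted input */
run;

proc surveyfreq data=MySurveySorted;
   by Stratum;
   cluster PSU;
   weight SamplingWeight;
   tables A;
run;
```

Replacing Stratum with Domain in the BY statement of the second step would run without error, but each BY group would then be analyzed as if it were the whole sample, which is the case that the preceding paragraph warns against.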

Last updated: December 09, 2022