The SURVEYFREQ Procedure

Definitions and Notation

For a stratified clustered sample design, define the following:

\[
\begin{array}{rcll}
h & = & 1, 2, \ldots, H & \text{is the stratum number, with a total of } H \text{ strata} \\
i & = & 1, 2, \ldots, n_h & \text{is the cluster number within stratum } h \text{, with a total of } n_h \text{ sample clusters in stratum } h \\
j & = & 1, 2, \ldots, m_{hi} & \text{is the unit number within cluster } i \text{ of stratum } h \text{, with a total of } m_{hi} \text{ sample units from cluster } i \text{ of stratum } h \\
n & = & \sum_{h=1}^{H} \sum_{i=1}^{n_h} m_{hi} & \text{is the total number of observations in the sample}
\end{array}
\]

and

\[
\begin{array}{rcl}
f_h & = & \text{first-stage sampling rate for stratum } h \\
W_{hij} & = & \text{sampling weight of unit } j \text{ in cluster } i \text{ of stratum } h
\end{array}
\]

The sampling rate $f_h$, which is used in Taylor series and bootstrap variance estimation, is the fraction of first-stage units (PSUs) selected for the sample. You can specify the stratum sampling rates in the RATE= option. Alternatively, you can specify the stratum population totals in the TOTAL= option, and PROC SURVEYFREQ computes each $f_h$ as the ratio of the stratum sample size (in PSUs) to the stratum total. For more information, see the section Population Totals and Sampling Rates. If you specify neither the RATE= option nor the TOTAL= option, the procedure assumes that the stratum sampling rates $f_h$ are negligible and does not use a finite population correction in variance computation.
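The TOTAL= computation described above can be sketched in a few lines of Python. This is only an illustration of the ratio, not the procedure's implementation: the function name `sampling_rates`, the stratum labels, and all counts are hypothetical.

```python
def sampling_rates(psu_counts, stratum_totals=None):
    """Return {stratum: f_h}, where f_h = (sampled PSUs) / (stratum total).

    When no totals are supplied, every rate is returned as 0.0 to mirror
    the 'negligible sampling rate' assumption (an illustrative choice).
    """
    if stratum_totals is None:
        return {h: 0.0 for h in psu_counts}
    rates = {}
    for h, n_h in psu_counts.items():
        total = stratum_totals[h]
        if total <= 0 or n_h > total:
            raise ValueError(f"invalid total for stratum {h!r}")
        rates[h] = n_h / total
    return rates


# Hypothetical design: PSUs sampled per stratum and PSUs in the population.
psu_counts = {"urban": 5, "rural": 4}
stratum_totals = {"urban": 50, "rural": 16}

rates = sampling_rates(psu_counts, stratum_totals)
negligible = sampling_rates(psu_counts)
print(rates)
print(negligible)
```

Returning zero rates when no totals are given is a convenience of this sketch; it simply makes any later factor that depends on $f_h$ drop out.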

This notation is also applicable to other sample designs. For example, for a design without stratification, you can let $H = 1$; for a sample design without clustering, you can let $m_{hi} = 1$ for every $h$ and $i$, which replaces clusters with individual sampling units.
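To make the design notation concrete, the following Python sketch tallies $H$, $n_h$, $m_{hi}$, and $n$ from a list of sampled units. The data and the helper name `design_counts` are invented for illustration; each record is assumed to carry only a stratum number and a cluster number.

```python
from collections import defaultdict


def design_counts(observations):
    """Tally H, n_h, m_hi, and n from (stratum, cluster) pairs, one per unit."""
    m = defaultdict(int)  # m[(h, i)]: sample units in cluster i of stratum h
    for h, i in observations:
        m[(h, i)] += 1
    n_h = defaultdict(int)  # n_h[h]: sample clusters in stratum h
    for h, _ in m:
        n_h[h] += 1
    H = len(n_h)  # number of strata
    n = sum(m.values())  # total observations: sum of m_hi over h and i
    return H, dict(n_h), dict(m), n


# Hypothetical stratified clustered sample.
clustered = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 1), (2, 2), (2, 3)]
H, n_h, m, n = design_counts(clustered)
print(H, n_h, n)

# Unstratified, unclustered sample: one stratum, each unit its own cluster.
simple = [(1, k) for k in range(1, 5)]
H1, n_h1, m1, n1 = design_counts(simple)
print(H1, n_h1, n1)
```

In the second call every $m_{hi}$ equals 1, so the number of clusters in the single stratum coincides with the number of observations.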

For a two-way table representing the crosstabulation of two variables, define the following, where there are $R$ levels of the row variable and $C$ levels of the column variable:

\[
\begin{array}{rcll}
r & = & 1, 2, \ldots, R & \text{is the row number, with a total of } R \text{ rows} \\
c & = & 1, 2, \ldots, C & \text{is the column number, with a total of } C \text{ columns} \\
N_{rc} & & & \text{is the population total in row } r \text{ and column } c \\
N_{r\cdot} & = & \sum_{c=1}^{C} N_{rc} & \text{is the total in row } r \\
N_{\cdot c} & = & \sum_{r=1}^{R} N_{rc} & \text{is the total in column } c \\
N & = & \sum_{r=1}^{R} \sum_{c=1}^{C} N_{rc} & \text{is the overall total}
\end{array}
\]

\[
\begin{array}{rcll}
P_{rc} & = & N_{rc} / N & \text{is the population proportion in row } r \text{ and column } c \\
P_{r\cdot} & = & N_{r\cdot} / N & \text{is the proportion in row } r \\
P_{\cdot c} & = & N_{\cdot c} / N & \text{is the proportion in column } c \\
P_{rc}^{\,r} & = & N_{rc} / N_{r\cdot} & \text{is the row proportion for table cell } (r, c) \\
P_{rc}^{\,c} & = & N_{rc} / N_{\cdot c} & \text{is the column proportion for table cell } (r, c)
\end{array}
\]
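The table quantities defined above are simple sums and ratios, as the following Python sketch shows for a small table of population totals. The 2 x 2 table and the function name `table_proportions` are hypothetical, and the sketch assumes every row and column total is nonzero.

```python
def table_proportions(N):
    """Compute margins and proportions from a table of cell totals N[r][c]."""
    row_totals = [sum(row) for row in N]        # N_{r.}
    col_totals = [sum(col) for col in zip(*N)]  # N_{.c}
    total = sum(row_totals)                     # N
    n_rows, n_cols = len(N), len(col_totals)
    return {
        "row_totals": row_totals,
        "col_totals": col_totals,
        "total": total,
        # P_rc = N_rc / N
        "P": [[N[r][c] / total for c in range(n_cols)] for r in range(n_rows)],
        # P_{r.} = N_{r.} / N and P_{.c} = N_{.c} / N
        "P_row_margin": [t / total for t in row_totals],
        "P_col_margin": [t / total for t in col_totals],
        # Row proportion N_rc / N_{r.} and column proportion N_rc / N_{.c}
        "P_within_row": [[N[r][c] / row_totals[r] for c in range(n_cols)]
                         for r in range(n_rows)],
        "P_within_col": [[N[r][c] / col_totals[c] for c in range(n_cols)]
                         for r in range(n_rows)],
    }


# Hypothetical population totals for R = 2 rows and C = 2 columns.
N = [[20, 30],
     [10, 40]]
result = table_proportions(N)
print(result["row_totals"], result["col_totals"], result["total"])
print(result["P_within_row"])
```

Within each row the row proportions sum to 1, and within each column the column proportions sum to 1, which is a quick check on any implementation of these definitions.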

For a specified observation (identified by stratum, cluster, and unit number within the cluster), define the following to indicate whether or not that observation belongs to cell $(r, c)$ (row $r$ and column $c$) of the two-way table, for $r = 1, 2, \ldots, R$ and $c = 1, 2, \ldots, C$:

\[
\delta_{hij}(r, c) =
\begin{cases}
1 & \text{if observation } (hij) \text{ is in cell } (r, c) \\
0 & \text{otherwise}
\end{cases}
\]

Similarly, define the following functions to indicate the observation’s row and column classification:

\[
\delta_{hij}(r\,\cdot) =
\begin{cases}
1 & \text{if observation } (hij) \text{ is in row } r \\
0 & \text{otherwise}
\end{cases}
\]

\[
\delta_{hij}(\cdot\, c) =
\begin{cases}
1 & \text{if observation } (hij) \text{ is in column } c \\
0 & \text{otherwise}
\end{cases}
\]
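
The indicator functions translate directly into code. The Python sketch below uses invented records of the form (h, i, j, row level, column level, weight) and hypothetical function names. The weighted sum at the end is shown only as one common way such indicators are combined with the sampling weights $W_{hij}$; this section itself does not define an estimator.

```python
def delta_cell(obs, r, c):
    """1 if the observation falls in cell (r, c), else 0."""
    return 1 if (obs["row"] == r and obs["col"] == c) else 0


def delta_row(obs, r):
    """1 if the observation falls in row r, else 0."""
    return 1 if obs["row"] == r else 0


def delta_col(obs, c):
    """1 if the observation falls in column c, else 0."""
    return 1 if obs["col"] == c else 0


# Hypothetical sample: stratum h, cluster i, unit j, table levels, weight.
raw = [
    (1, 1, 1, 1, 2, 10.0),
    (1, 1, 2, 1, 1, 10.0),
    (1, 2, 1, 2, 2, 12.5),
    (2, 1, 1, 1, 2, 8.0),
]
sample = [dict(h=h, i=i, j=j, row=r, col=c, weight=w) for h, i, j, r, c, w in raw]

R, C = 2, 2

# An observation is in row r exactly when it is in one of that row's cells.
rows_consistent = all(
    delta_row(obs, r) == sum(delta_cell(obs, r, c) for c in range(1, C + 1))
    for obs in sample
    for r in range(1, R + 1)
)

# Illustrative weighted sum of the cell indicator for cell (1, 2).
weighted_cell_12 = sum(obs["weight"] * delta_cell(obs, 1, 2) for obs in sample)
print(rows_consistent, weighted_cell_12)
```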
Last updated: December 09, 2022