PROC SURVEYFREQ provides design effects for the overall proportion estimates when you specify the DEFF option in the TABLES statement. The procedure also uses design effects to compute Rao-Scott chi-square tests (CHISQ) and modified confidence limits for proportions (CL). For more information, see the sections Rao-Scott Chi-Square Test and Modified Confidence Limits.
The design effect for an estimate is the ratio of the actual variance (estimated based on the sample design) to the variance that the estimate would have under simple random sampling (SRS) with the same number of observations. For more information, see Lohr (2010) and Kish (1965).
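PROC SURVEYFREQ computes the design effect for the proportion in table cell (r, c) as
\[
\mathrm{Deff}(\hat{p}_{rc})
  = \frac{\widehat{\mathrm{Var}}(\hat{p}_{rc})}{\widehat{\mathrm{Var}}_{\mathrm{srs}}(\hat{p}_{rc})}
  = \frac{\widehat{\mathrm{Var}}(\hat{p}_{rc})}{(1-f)\,\hat{p}_{rc}\,(1-\hat{p}_{rc})\,/\,(n-1)}
\]
where \(\hat{p}_{rc}\) is the estimate of the proportion in table cell (r, c); \(\widehat{\mathrm{Var}}(\hat{p}_{rc})\) is the variance of the estimate; n is the sample size (unweighted frequency) for the two-way table; and f is the overall sampling fraction, which is described in the section Sampling Fraction.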
If you specify the DEFF(VARDEF=N) option in the PROC SURVEYFREQ statement, the procedure uses n instead of (n – 1) as the divisor in the SRS variance.
For Taylor series variance estimation, by default, PROC SURVEYFREQ includes the finite population correction (1 – f) in the SRS variance when you specify sampling rates or population totals in the RATE= or TOTAL= option, respectively. To exclude the finite population correction, you can specify the DEFF(FPC=NO) option in the PROC SURVEYFREQ statement.
For replication variance estimation, by default, PROC SURVEYFREQ does not include the finite population correction (1 – f) in the SRS variance. To include the finite population correction, you can specify the DEFF(FPC=YES) option in the PROC SURVEYFREQ statement and provide sampling rates or population totals in the RATE= or TOTAL= option, respectively.
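The effect of the divisor and finite population correction choices can be illustrated numerically. The following is a minimal Python sketch, not SAS code and not the procedure's implementation. It assumes that the SRS variance of a proportion p has the form (1 – f) p (1 – p) / (n – 1); the function names `srs_variance` and `design_effect` and their parameters are hypothetical and chosen only for illustration.

```python
def srs_variance(p, n, f=0.0, use_fpc=True, vardef_n=False):
    """SRS variance of a proportion estimate p based on n observations.

    Assumed form: (1 - f) * p * (1 - p) / (n - 1).
    vardef_n=True uses n as the divisor (mimics DEFF(VARDEF=N)).
    use_fpc=False drops the (1 - f) factor (mimics DEFF(FPC=NO)).
    """
    divisor = n if vardef_n else n - 1
    fpc = (1.0 - f) if use_fpc else 1.0
    return fpc * p * (1.0 - p) / divisor


def design_effect(p, var_p, n, f=0.0, use_fpc=True, vardef_n=False):
    """Ratio of the design-based variance var_p to the SRS variance."""
    return var_p / srs_variance(p, n, f, use_fpc, vardef_n)
```

For example, with p = 0.5, a design-based variance of 0.005, and n = 101, `design_effect(0.5, 0.005, 101)` is approximately 2.0 when the sampling fraction is negligible. Passing `f=0.2` with the correction included shrinks the SRS variance and raises the design effect to approximately 2.5.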
PROC SURVEYFREQ computes design effects in the same way for proportions in one-way frequency tables and for row and column proportions in two-way tables. In these design effect computations, the value of n is the sample size (unweighted frequency) that corresponds to the total domain of the proportion estimate. For table cell proportions of a two-way table, the domain is the two-way table and the sample size n is the frequency of the two-way table. For row proportions, which are based on a two-way table row, the domain is the row and the sample size n is the row frequency. For column proportions, the sample size n is the column frequency.
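The choice of n for each kind of proportion can be made concrete with a small table of unweighted frequencies. The following Python sketch is illustrative only; the function name `domain_sample_sizes` and the sample table are hypothetical.

```python
def domain_sample_sizes(freq):
    """Return the sample sizes n used for each kind of proportion.

    freq is a two-way table of unweighted cell frequencies,
    given as a list of rows.
    Returns (table_n, row_n, col_n):
      table_n: n for table cell proportions (whole two-way table)
      row_n:   n for row proportions, one value per row
      col_n:   n for column proportions, one value per column
    """
    row_n = [sum(row) for row in freq]
    col_n = [sum(col) for col in zip(*freq)]
    table_n = sum(row_n)
    return table_n, row_n, col_n


# Hypothetical 2 x 2 table of unweighted frequencies
table_n, row_n, col_n = domain_sample_sizes([[30, 20], [10, 40]])
```

Here `table_n` is 100, `row_n` is `[50, 50]`, and `col_n` is `[40, 60]`. The design effect for the row proportion in cell (1, 1) would therefore use n = 50, whereas the design effect for the table cell proportion in the same cell would use n = 100.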
PROC SURVEYFREQ determines the sampling fraction f for the design effect by using the sampling rates or population totals that you provide in the RATE= or TOTAL= option, respectively. If you omit both of these options, PROC SURVEYFREQ assumes that the sampling fraction f is negligible and does not include a finite population correction in the analysis, as described in the section Population Totals and Sampling Rates.
If you specify RATE=value, PROC SURVEYFREQ uses this value as the overall sampling fraction f. If you specify TOTAL=value, PROC SURVEYFREQ computes f as the ratio of the number of PSUs in the sample to the total value.
If you provide stratum sampling rates by using the RATE=SAS-data-set option, PROC SURVEYFREQ computes each stratum total as the number of sample PSUs in the stratum divided by the stratum sampling rate. Alternatively, you can provide the stratum totals directly by using the TOTAL=SAS-data-set option. In either case, the overall total is computed as the sum of the stratum totals, and the overall sampling fraction f is computed as the ratio of the number of sample PSUs to the overall total.
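The stratum-level computation of f can be sketched as follows. This Python illustration is an assumption-laden sketch, not the procedure's code; the function name `overall_sampling_fraction`, its dictionary arguments, and the stratum labels are hypothetical.

```python
def overall_sampling_fraction(sample_psus, rates=None, totals=None):
    """Overall sampling fraction f from stratum-level information.

    sample_psus: dict mapping stratum -> number of sample PSUs
    rates:       dict mapping stratum -> stratum sampling rate
    totals:      dict mapping stratum -> stratum population total
    Exactly one of rates or totals must be given.
    """
    if (rates is None) == (totals is None):
        raise ValueError("provide exactly one of rates or totals")
    if totals is None:
        # Stratum total = sample PSUs in the stratum / stratum sampling rate
        totals = {h: sample_psus[h] / rates[h] for h in sample_psus}
    overall_total = sum(totals[h] for h in sample_psus)
    # f = number of sample PSUs / overall total
    return sum(sample_psus.values()) / overall_total


# Hypothetical design: two strata with 10 and 20 sample PSUs
psus = {"A": 10, "B": 20}
f_from_rates = overall_sampling_fraction(psus, rates={"A": 0.1, "B": 0.05})
f_from_totals = overall_sampling_fraction(psus, totals={"A": 100, "B": 400})
```

In this example the stratum rates imply stratum totals of 100 and 400, so both calls give an overall sampling fraction of about 0.06 (30 sample PSUs out of an overall total of 500).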