If you specify the CL option in the TABLES statement, PROC SURVEYFREQ computes confidence limits for the proportions in the frequency and crosstabulation tables. The procedure provides Wald-type confidence limits, logit confidence limits, and the following modified confidence limits: Agresti-Coull, Clopper-Pearson (exact), Jeffreys, uniform, and Wilson (score). By default, PROC SURVEYFREQ computes Wald-type ("linear") confidence limits by using the variance estimates that are based on the sample design.
For information about design-based confidence limits for proportions (including comparisons of their performance), see Korn and Graubard (1999), Korn and Graubard (1998), Franco et al. (2019), Curtin et al. (2006), and Sukasih and Jang (2005). For more information about binomial confidence limits, see Brown, Cai, and DasGupta (2001) and Agresti and Coull (1998), in addition to the references cited in the following sections.
PROC SURVEYFREQ provides the option to compute an alternative confidence limit type for extreme (small and large) proportions or for small-frequency table cells. For more information, see Curtin et al. (2006). If you specify the PSMALL=p cl-option, PROC SURVEYFREQ computes the alternative confidence limit type when the proportion estimate is less than or equal to p or greater than or equal to (1 – p). When the proportion estimate is between p and (1 – p), PROC SURVEYFREQ computes Wald confidence limits. If you specify the NSMALL=n cl-option, PROC SURVEYFREQ computes the alternative confidence limit type when the table cell frequency is less than or equal to n.
For each table request, PROC SURVEYFREQ produces a nondisplayed ODS table, "Table Summary," which contains the number of observations, strata, and clusters that are included in the analysis of the requested table. When you request confidence limits, the "Table Summary" data set also contains the degrees of freedom df and the corresponding t percentile $t_{df,\alpha/2}$ that is used to compute the confidence limits. For more information about this output data set, see Example 118.3.
By default, PROC SURVEYFREQ computes Wald-type ("linear") confidence limits for proportions. These confidence limits use variance estimates that are based on the sample design. For the proportion in table cell (r, c), the Wald confidence limits are computed as

$$\hat{p}_{rc} \pm \left( t_{df,\alpha/2} \times \mathrm{StdErr}(\hat{p}_{rc}) \right)$$

where $\hat{p}_{rc}$ is the estimate of the proportion in table cell (r, c), $\mathrm{StdErr}(\hat{p}_{rc})$ is the standard error of the estimate, and $t_{df,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution with df degrees of freedom. (For more information, see the section Degrees of Freedom.) The confidence level $\alpha$ is determined by the value of the ALPHA= option; by default, ALPHA=0.05, which produces 95% confidence limits.
The confidence limits for row proportions and column proportions are computed similarly to the confidence limits for table cell proportions.
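The Wald computation can be sketched in a few lines of Python. This is only an illustration, not PROC SURVEYFREQ's implementation: the function name `wald_limits` is hypothetical, and the t percentile is passed in as an argument because the Python standard library has no t distribution.

```python
def wald_limits(p_hat, std_err, t_crit):
    """Wald-type limits: p_hat +/- t_crit * std_err.

    p_hat   -- design-based proportion estimate
    std_err -- design-based standard error of p_hat
    t_crit  -- 100(1 - alpha/2)th percentile of the t distribution
               with df degrees of freedom (supplied by the caller)
    """
    half_width = t_crit * std_err
    return p_hat - half_width, p_hat + half_width


# Illustrative values only: estimate 0.30, standard error 0.02, t percentile 2.0
lower, upper = wald_limits(0.30, 0.02, 2.0)
print(lower, upper)
```

With these illustrative inputs the limits are roughly 0.26 and 0.34. Note that nothing in this form keeps the limits inside [0, 1], which is one reason alternative types are offered for extreme proportions.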
If you specify the CL(TYPE=LOGIT) option, PROC SURVEYFREQ computes logit confidence limits for proportions. For more information, see Agresti (2013) and Korn and Graubard (1998).
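Logit confidence limits for proportions are based on the logit transformation $Y = \log\left(\hat{p} / (1 - \hat{p})\right)$. The logit confidence limits $P_L$ and $P_U$ are computed as

$$P_L = \frac{\exp(Y_L)}{1 + \exp(Y_L)}, \qquad P_U = \frac{\exp(Y_U)}{1 + \exp(Y_U)}$$

where

$$(Y_L, Y_U) = \log\left( \frac{\hat{p}}{1 - \hat{p}} \right) \pm \left( t_{df,\alpha/2} \times \frac{\mathrm{StdErr}(\hat{p})}{\hat{p}\,(1 - \hat{p})} \right)$$

and $\hat{p}$ is the estimate of the proportion, $\mathrm{StdErr}(\hat{p})$ is the standard error of the estimate, and $t_{df,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution with df degrees of freedom. (For more information, see the section Degrees of Freedom.) The confidence level $\alpha$ is determined by the value of the ALPHA= option; by default, ALPHA=0.05, which produces 95% confidence limits.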
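As a rough sketch of the logit computation, the following Python function builds the interval on the logit scale and transforms the endpoints back to the proportion scale. The name `logit_limits` is hypothetical, and the t percentile is again supplied by the caller because the standard library does not provide one.

```python
import math


def logit_limits(p_hat, std_err, t_crit):
    """Logit limits for a proportion estimate strictly between 0 and 1."""
    # Interval on the logit scale; the standard error of the logit is
    # approximated by std_err / (p_hat * (1 - p_hat)).
    y = math.log(p_hat / (1.0 - p_hat))
    half_width = t_crit * std_err / (p_hat * (1.0 - p_hat))
    y_lower, y_upper = y - half_width, y + half_width

    # Back-transform each endpoint to the proportion scale.
    def inverse_logit(v):
        return math.exp(v) / (1.0 + math.exp(v))

    return inverse_logit(y_lower), inverse_logit(y_upper)


# Illustrative values only
print(logit_limits(0.10, 0.03, 2.0))
```

Because the endpoints are back-transformed, the limits always fall strictly between 0 and 1 and are generally not symmetric about the estimate.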
PROC SURVEYFREQ uses the Korn and Graubard (1998) method to compute the following design-based modified confidence limits: Agresti-Coull, Clopper-Pearson (exact), Jeffreys, uniform, and Wilson (score). This method substitutes the degrees-of-freedom-adjusted effective sample size for the original sample size in the confidence limit computations. For more information, see Franco et al. (2019) and Dean and Pagano (2015).
The effective sample size $n_e$ is computed as

$$n_e = n \,/\, \mathrm{Deff}$$

where n is the original sample size (unweighted frequency) that corresponds to the total domain of the proportion estimate, and Deff is the design effect.
If the proportion is computed for a table cell of a two-way table, then the domain is the two-way table, and the sample size n is the frequency of the two-way table. If the proportion is a row proportion, which is based on a two-way table row, then the domain is the row, and the sample size n is the frequency of the row.
The design effect for an estimate is the ratio of the actual variance (estimated based on the sample design) to the variance of a simple random sample with the same number of observations. For more information, see the section Design Effect.
By default, PROC SURVEYFREQ uses (n – 1) as the divisor in the SRS component of the design effect. To use n as the divisor for the design effects in the confidence limit computations, you can specify the CL(VARDEF=N) option in the TABLES statement. To use n as the divisor for all design effects that PROC SURVEYFREQ computes, you can specify the DEFF(VARDEF=N) option in the PROC SURVEYFREQ statement.
The adjusted effective sample size $n^*_{df}$ is computed by applying a degrees-of-freedom adjustment to the effective sample size $n_e$. By default, PROC SURVEYFREQ uses the Korn and Graubard (1998) adjustment factor and computes the adjusted sample size as

$$n^*_{df} = \left( \frac{t_{(n-1),\alpha/2}}{t_{df,\alpha/2}} \right)^2 n_e$$

where df is the degrees of freedom and $t_{df,\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the t distribution with df degrees of freedom. The degrees of freedom depend on the sample design and the variance estimation method. For more information, see the section Degrees of Freedom. The confidence level $\alpha$ is determined by the value of the ALPHA= option; by default, ALPHA=0.05, which produces 95% confidence limits.
If you specify the CL(ADJUST=DP) option, PROC SURVEYFREQ uses the Dean and Pagano (2015) adjustment factor, which replaces the t quantile in the numerator with a normal quantile and computes the adjusted effective sample size as

$$n^*_{df} = \left( \frac{z_{\alpha/2}}{t_{df,\alpha/2}} \right)^2 n_e$$

where $z_{\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution.
If you specify the CL(ADJUST=NO) option, PROC SURVEYFREQ omits the adjustment and uses the (unadjusted) effective sample size $n_e$ instead of $n^*_{df}$ in the confidence limit computations.
The design effect is usually greater than 1 for complex survey designs, and in this case the effective sample size is less than the actual sample size. If the adjusted effective sample size $n^*_{df}$ is greater than the actual sample size n, then by default, PROC SURVEYFREQ truncates the value of $n^*_{df}$ to n, as recommended by Korn and Graubard (1998). If you specify the CL(TRUNCATE=NO) option, PROC SURVEYFREQ does not truncate the value of $n^*_{df}$ if it exceeds n.
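The sample size adjustments described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure's code: the function names are hypothetical, the design effect is taken as given, and the t percentiles are passed in by the caller because the Python standard library has no t distribution (the normal percentile comes from `statistics.NormalDist`).

```python
from statistics import NormalDist


def adjusted_n_kg(n, deff, t_n_minus_1, t_df, truncate=True):
    """Korn-Graubard style adjustment: (t_{n-1} / t_df)^2 * (n / deff)."""
    n_eff = n / deff                          # effective sample size
    n_adj = (t_n_minus_1 / t_df) ** 2 * n_eff
    # Default behavior described in the text: cap at the actual sample size.
    return min(n_adj, n) if truncate else n_adj


def adjusted_n_dp(n, deff, t_df, alpha=0.05, truncate=True):
    """Dean-Pagano style adjustment: normal quantile replaces t_{n-1}."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n_adj = (z / t_df) ** 2 * (n / deff)
    return min(n_adj, n) if truncate else n_adj


# Illustrative values only: n = 200, design effect 2, t percentiles supplied
print(adjusted_n_kg(200, 2.0, t_n_minus_1=1.972, t_df=2.045))
print(adjusted_n_dp(200, 2.0, t_df=2.045))
```

When the design effect is less than 1, the untruncated value can exceed n; passing `truncate=False` mirrors the effect that the text attributes to CL(TRUNCATE=NO).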
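The modified Agresti-Coull confidence limits are constructed by applying the Korn and Graubard (1998) method to the Agresti-Coull form (Agresti and Coull 1998; Brown, Cai, and DasGupta 2001). The adjusted effective sample size $n^*_{df}$ is substituted for the sample size, and the adjusted effective sample size times the proportion estimate ($\hat{p}\,n^*_{df}$) is substituted for the number of positive responses. For more information, see Franco et al. (2019).
PROC SURVEYFREQ computes modified Agresti-Coull confidence limits for the proportion as

$$\tilde{p} \pm \left( z_{\alpha/2} \times \sqrt{ \tilde{p}\,(1 - \tilde{p}) \,/\, \tilde{n} } \right)$$

where

$$\tilde{n} = n^*_{df} + z^2_{\alpha/2}, \qquad \tilde{p} = \left( \hat{p}\,n^*_{df} + z^2_{\alpha/2}/2 \right) / \, \tilde{n}$$

and $z_{\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution.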
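A small Python sketch of the Agresti-Coull form with the substitutions described above follows. It assumes the standard Agresti-Coull expression with a standard normal percentile; the function name `agresti_coull_limits` is hypothetical, and the adjusted effective sample size is taken as an input.

```python
import math
from statistics import NormalDist


def agresti_coull_limits(p_hat, n_adj, alpha=0.05):
    """Agresti-Coull form with n_adj in place of n and
    p_hat * n_adj in place of the number of positive responses."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n_tilde = n_adj + z * z
    p_tilde = (p_hat * n_adj + z * z / 2.0) / n_tilde
    half_width = z * math.sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
    return p_tilde - half_width, p_tilde + half_width


# Illustrative values only
print(agresti_coull_limits(0.10, 50.0))
```

The interval is centered at the shrunken estimate rather than at the original estimate, so for small proportions the center moves toward 0.5.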
Clopper-Pearson (exact) confidence limits for the binomial proportion are constructed by inverting the exact equal-tailed test that is based on the binomial distribution. This method is attributed to Clopper and Pearson (1934). For a derivation of the F distribution expression for the confidence limits, see Leemis and Trivedi (1996).
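Modified Clopper-Pearson confidence limits are constructed by using the Korn and Graubard (1998) method. The adjusted effective sample size $n^*_{df}$ is substituted for the sample size in the standard Clopper-Pearson expression, and the proportion estimate times the adjusted effective sample size ($\hat{p}\,n^*_{df}$) is substituted for the number of positive responses.
The modified Clopper-Pearson confidence limits for a proportion ($P_L$ and $P_U$) are computed as

$$P_L = \left( 1 + \frac{ n^*_{df} - \hat{p}\,n^*_{df} + 1 }{ \hat{p}\,n^*_{df} \; F\left( \alpha/2,\; 2\hat{p}\,n^*_{df},\; 2(n^*_{df} - \hat{p}\,n^*_{df} + 1) \right) } \right)^{-1}$$

$$P_U = \left( 1 + \frac{ n^*_{df} - \hat{p}\,n^*_{df} }{ (\hat{p}\,n^*_{df} + 1) \; F\left( 1 - \alpha/2,\; 2(\hat{p}\,n^*_{df} + 1),\; 2(n^*_{df} - \hat{p}\,n^*_{df}) \right) } \right)^{-1}$$

where $F(\alpha/2, b, c)$ is the $100(\alpha/2)$th percentile of the F distribution with b and c degrees of freedom, $n^*_{df}$ is the adjusted effective sample size, and $\hat{p}$ is the design-based proportion estimate.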
The Jeffreys confidence interval is an equal-tailed interval that is based on the noninformative Jeffreys prior for a binomial proportion. For more information, see Brown, Cai, and DasGupta (2001) and Berger (1985).
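PROC SURVEYFREQ computes modified Jeffreys confidence limits for a proportion as

$$\left( \beta\left( \alpha/2,\; x^* + 1/2,\; n^*_{df} - x^* + 1/2 \right),\;\; \beta\left( 1 - \alpha/2,\; x^* + 1/2,\; n^*_{df} - x^* + 1/2 \right) \right)$$

where $x^* = \hat{p}\,n^*_{df}$ and $\beta(q, b, c)$ is the $100q$th percentile of the beta distribution with shape parameters b and c. The lower confidence limit is set to 0 when $\hat{p} = 0$, and the upper confidence limit is set to 1 when $\hat{p} = 1$. For more information, see Franco et al. (2019) and Carlin and Louis (2009).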
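PROC SURVEYFREQ computes modified uniform confidence limits for a proportion as

$$\left( \beta\left( \alpha/2,\; x^* + 1,\; n^*_{df} - x^* + 1 \right),\;\; \beta\left( 1 - \alpha/2,\; x^* + 1,\; n^*_{df} - x^* + 1 \right) \right)$$

where $x^* = \hat{p}\,n^*_{df}$ and $\beta(q, b, c)$ is the $100q$th percentile of the beta distribution with shape parameters b and c. The lower confidence limit is set to 0 when $\hat{p} = 0$, and the upper confidence limit is set to 1 when $\hat{p} = 1$. For more information, see Franco et al. (2019) and Carlin and Louis (2009).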
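The Jeffreys and uniform limits differ only in the constant added to the beta shape parameters (1/2 for the Jeffreys prior, 1 for the uniform prior), so both can be illustrated with one sketch. The code below is an assumption-laden illustration, not the procedure's algorithm: all function names are hypothetical, and because the Python standard library has no beta percentile function, the sketch evaluates the regularized incomplete beta function with a textbook continued fraction and inverts it by bisection.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction recurrence.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_quantile(q, a, b):
    """Invert beta_cdf by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def beta_type_limits(p_hat, n_adj, alpha=0.05, prior=0.5):
    """prior=0.5 gives Jeffreys-type limits; prior=1.0 gives uniform-type limits."""
    x = p_hat * n_adj
    a, b = x + prior, n_adj - x + prior
    lower = 0.0 if p_hat == 0 else beta_quantile(alpha / 2.0, a, b)
    upper = 1.0 if p_hat == 1 else beta_quantile(1.0 - alpha / 2.0, a, b)
    return lower, upper


# Illustrative values only
print(beta_type_limits(0.10, 50.0, prior=0.5))   # Jeffreys-type
print(beta_type_limits(0.10, 50.0, prior=1.0))   # uniform-type
```

In practice a statistics library's beta percentile function would replace the hand-rolled `beta_quantile`; it is written out here only to keep the sketch self-contained.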
Wilson confidence limits for the binomial proportion are also known as score confidence limits and are attributed to Wilson (1927). The confidence limits are based on inverting the normal test that uses the null proportion in the variance (the score test). For more information, see Agresti and Coull (1998), Newcombe (1998), and Brown, Cai, and DasGupta (2001).
PROC SURVEYFREQ computes modified Wilson confidence limits by substituting the adjusted effective sample size for the original sample size in the standard Wilson computation. For more information, see Korn and Graubard (1999).
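The modified Wilson confidence limits for a proportion are computed as

$$\left( \hat{p} + \frac{\kappa^2}{2 n^*_{df}} \pm \kappa \sqrt{ \left( \hat{p}\,(1 - \hat{p}) + \frac{\kappa^2}{4 n^*_{df}} \right) \Big/ \, n^*_{df} } \right) \Big/ \left( 1 + \frac{\kappa^2}{n^*_{df}} \right)$$

where $n^*_{df}$ is the adjusted effective sample size and $\hat{p}$ is the design-based estimate of the proportion. By default, $\kappa = z_{\alpha/2}$. If you specify the CL(ADJUST=NO) option to use the unadjusted effective sample size $n_e$ instead of $n^*_{df}$, then $\kappa = t_{df,\alpha/2}$. For more information, see Curtin et al. (2006).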
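The Wilson (score) form with a substituted sample size can be sketched as below. The function name `wilson_limits` is hypothetical. The sketch assumes the standard Wilson expression and defaults the multiplier to a standard normal percentile; a caller who wants the t-based multiplier for the unadjusted case passes it explicitly as `kappa`, since the Python standard library has no t distribution.

```python
import math
from statistics import NormalDist


def wilson_limits(p_hat, n_sub, alpha=0.05, kappa=None):
    """Wilson (score) limits with n_sub substituted for the sample size.

    n_sub -- adjusted (or unadjusted) effective sample size
    kappa -- multiplier; defaults to the normal percentile z_{alpha/2}
    """
    if kappa is None:
        kappa = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    k2 = kappa * kappa
    center = p_hat + k2 / (2.0 * n_sub)
    half_width = kappa * math.sqrt(
        (p_hat * (1.0 - p_hat) + k2 / (4.0 * n_sub)) / n_sub)
    denom = 1.0 + k2 / n_sub
    return (center - half_width) / denom, (center + half_width) / denom


# Illustrative values only
print(wilson_limits(0.10, 50.0))               # normal multiplier
print(wilson_limits(0.10, 55.0, kappa=2.045))  # caller-supplied t multiplier
```

Unlike the Wald form, these limits stay within [0, 1], and the lower limit is 0 when the estimate is 0.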