The SURVEYFREQ Procedure

Confidence Limits for Proportions

If you specify the CL option in the TABLES statement, PROC SURVEYFREQ computes confidence limits for the proportions in the frequency and crosstabulation tables. The procedure provides Wald-type confidence limits, logit confidence limits, and the following modified confidence limits: Agresti-Coull, Clopper-Pearson (exact), Jeffreys, uniform, and Wilson (score). By default, PROC SURVEYFREQ computes Wald-type ("linear") confidence limits by using the variance estimates that are based on the sample design.

For information about design-based confidence limits for proportions (including comparisons of their performance), see Korn and Graubard (1999), Korn and Graubard (1998), Franco et al. (2019), Curtin et al. (2006), and Sukasih and Jang (2005). For more information about binomial confidence limits, see Brown, Cai, and DasGupta (2001) and Agresti and Coull (1998), in addition to the references cited in the following sections.

PROC SURVEYFREQ provides the option to compute an alternative confidence limit type for extreme (small and large) proportions or for small-frequency table cells. For more information, see Curtin et al. (2006). If you specify the PSMALL=p cl-option, PROC SURVEYFREQ computes the alternative confidence limit type when the proportion estimate is less than or equal to p or greater than or equal to (1 – p). When the proportion estimate is between p and (1 – p), PROC SURVEYFREQ computes Wald confidence limits. If you specify the NSMALL=n cl-option, PROC SURVEYFREQ computes the alternative confidence limit type when the table cell frequency is less than or equal to n.
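
The PSMALL= and NSMALL= rules described above can be sketched as two small predicates. This is only an illustration of the stated thresholds (both comparisons are inclusive); the function names are hypothetical, and the sketch does not model how the procedure combines the two options if both are specified.

```python
def psmall_triggers_alternative(p_hat, p):
    """True when the estimate is extreme: p_hat <= p or p_hat >= 1 - p."""
    return p_hat <= p or p_hat >= 1.0 - p


def nsmall_triggers_alternative(cell_frequency, n):
    """True when the table cell frequency is less than or equal to n."""
    return cell_frequency <= n


# Hypothetical threshold p = 0.25: estimates strictly between 0.25 and 0.75
# would keep Wald limits; the rest would use the alternative type.
for estimate in (0.03, 0.25, 0.50, 0.97):
    print(estimate, psmall_triggers_alternative(estimate, 0.25))
```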

For each table request, PROC SURVEYFREQ produces a nondisplayed ODS table, "Table Summary," which contains the number of observations, strata, and clusters that are included in the analysis of the requested table. When you request confidence limits, the "Table Summary" data set also contains the degrees of freedom df and the corresponding value of $t_{df,\alpha/2}$ that is used to compute the confidence limits. For more information about this output data set, see Example 118.3.

Wald Confidence Limits

By default, PROC SURVEYFREQ computes Wald-type ("linear") confidence limits for proportions. These confidence limits use variance estimates that are based on the sample design. For the proportion in table cell (r, c), the Wald confidence limits are computed as

$$
\hat{P}_{rc} \pm \left( t_{df,\alpha/2} \times \mathrm{StdErr}(\hat{P}_{rc}) \right)
$$

where $\hat{P}_{rc}$ is the estimate of the proportion in table cell (r, c), $\mathrm{StdErr}(\hat{P}_{rc})$ is the standard error of the estimate, and $t_{df,\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the t distribution with df degrees of freedom. (For more information, see the section Degrees of Freedom.) The value of $\alpha$ is determined by the ALPHA= option; by default, ALPHA=0.05, which produces 95% confidence limits.

The confidence limits for row proportions and column proportions are computed similarly to the confidence limits for table cell proportions.
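
As a minimal sketch of the Wald computation above, the following Python function takes the proportion estimate, its design-based standard error, and the t critical value as inputs. The function name and the numeric inputs are hypothetical; the critical value is supplied by the caller because the Python standard library does not provide t quantiles.

```python
def wald_limits(p_hat, std_err, t_crit):
    """Wald ("linear") limits: p_hat +/- t_crit * std_err."""
    half_width = t_crit * std_err
    return p_hat - half_width, p_hat + half_width


# Hypothetical inputs: estimate 0.30, standard error 0.04, and a t critical
# value of 2.0 (assumed to have been obtained elsewhere for the design's df).
lower, upper = wald_limits(0.30, 0.04, 2.0)
print(lower, upper)
```

Note that nothing in this form keeps the limits inside the interval from 0 to 1, which is one motivation for the alternative types described in the following sections.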

Logit Confidence Limits

If you specify the CL(TYPE=LOGIT) option, PROC SURVEYFREQ computes logit confidence limits for proportions. For more information, see Agresti (2013) and Korn and Graubard (1998).

Logit confidence limits for proportions are based on the logit transformation $Y = \log(\hat{p} / (1 - \hat{p}))$. The logit confidence limits $P_L$ and $P_U$ are computed as

$$
\begin{aligned}
P_L &= \exp(Y_L) / (1 + \exp(Y_L)) \\
P_U &= \exp(Y_U) / (1 + \exp(Y_U))
\end{aligned}
$$

where

$$
(Y_L, Y_U) = \log(\hat{p} / (1 - \hat{p})) \pm \left( t_{df,\alpha/2} \times \mathrm{StdErr}(\hat{p}) / (\hat{p}(1 - \hat{p})) \right)
$$

Here, $\hat{p}$ is the estimate of the proportion, $\mathrm{StdErr}(\hat{p})$ is the standard error of the estimate, and $t_{df,\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the t distribution with df degrees of freedom. (For more information, see the section Degrees of Freedom.) The value of $\alpha$ is determined by the ALPHA= option; by default, ALPHA=0.05, which produces 95% confidence limits.
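
The logit computation can be sketched in Python as follows. The function name and inputs are hypothetical, and the t critical value is again passed in by the caller rather than computed. The sketch assumes that the estimate is strictly between 0 and 1, because the logit is undefined at the boundaries.

```python
import math


def logit_limits(p_hat, std_err, t_crit):
    """Logit limits: build an interval on the logit scale, then back-transform."""
    y = math.log(p_hat / (1.0 - p_hat))
    half_width = t_crit * std_err / (p_hat * (1.0 - p_hat))
    y_lower, y_upper = y - half_width, y + half_width
    p_lower = math.exp(y_lower) / (1.0 + math.exp(y_lower))
    p_upper = math.exp(y_upper) / (1.0 + math.exp(y_upper))
    return p_lower, p_upper


# Hypothetical inputs: a small proportion, where the interval is asymmetric
# about the estimate and stays inside (0, 1).
print(logit_limits(0.05, 0.02, 2.0))
```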

Modified Confidence Limits

PROC SURVEYFREQ uses the Korn and Graubard (1998) method to compute the following design-based modified confidence limits: Agresti-Coull, Clopper-Pearson (exact), Jeffreys, uniform, and Wilson (score). This method substitutes the degrees-of-freedom-adjusted effective sample size for the original sample size in the confidence limit computations. For more information, see Franco et al. (2019) and Dean and Pagano (2015).

Effective Sample Size

The effective sample size $n_e$ is computed as

$$
n_e = n / \mathrm{Deff}
$$

where $n$ is the original sample size (unweighted frequency) that corresponds to the total domain of the proportion estimate, and $\mathrm{Deff}$ is the design effect.

If the proportion is computed for a table cell of a two-way table, then the domain is the two-way table, and the sample size n is the frequency of the two-way table. If the proportion is a row proportion, which is based on a two-way table row, then the domain is the row, and the sample size n is the frequency of the row.

The design effect for an estimate is the ratio of the actual variance (estimated based on the sample design) to the variance of a simple random sample with the same number of observations. For more information, see the section Design Effect.

By default, PROC SURVEYFREQ uses (n – 1) as the divisor in the SRS component of the design effect. To use n as the divisor for the design effects in the confidence limit computations, you can specify the CL(VARDEF=N) option in the TABLES statement. To use n as the divisor for all design effects that PROC SURVEYFREQ computes, you can specify the DEFF(VARDEF=N) option in the PROC SURVEYFREQ statement.

Degrees-of-Freedom Adjustment

The adjusted effective sample size $n_e^*$ is computed by applying a degrees-of-freedom adjustment to the effective sample size $n_e$. By default, PROC SURVEYFREQ uses the Korn and Graubard (1998) adjustment factor and computes the adjusted sample size as

$$
n_e^* = n_e \left( \frac{t_{(n-1),\alpha/2}}{t_{df,\alpha/2}} \right)^2
$$

where df is the degrees of freedom, $t_{df,\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the t distribution with df degrees of freedom, and $t_{(n-1),\alpha/2}$ is the same percentile of the t distribution with $(n - 1)$ degrees of freedom. The degrees of freedom depend on the sample design and the variance estimation method. For more information, see the section Degrees of Freedom. The value of $\alpha$ is determined by the ALPHA= option; by default, ALPHA=0.05, which produces 95% confidence limits.

If you specify the CL(ADJUST=DP) option, PROC SURVEYFREQ uses the Dean and Pagano (2015) adjustment factor, which replaces the t quantile in the numerator with a normal quantile and computes the adjusted effective sample size as

$$
n_e^* = n_e \left( \frac{z_{\alpha/2}}{t_{df,\alpha/2}} \right)^2
$$

where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the standard normal distribution.

If you specify the CL(ADJUST=NO) option, PROC SURVEYFREQ omits the adjustment and uses the (unadjusted) effective sample size $n_e$ instead of $n_e^*$ in the confidence limit computations.

Truncation

The design effect is usually greater than 1 for complex survey designs, and in this case the effective sample size is less than the actual sample size. If the adjusted effective sample size $n_e^*$ is greater than the actual sample size $n$, then by default, PROC SURVEYFREQ truncates the value of $n_e^*$ to $n$, as recommended by Korn and Graubard (1998). If you specify the CL(TRUNCATE=NO) option, PROC SURVEYFREQ does not truncate the value of $n_e^*$ if it exceeds $n$.
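
The three steps described in the preceding subsections (effective sample size, degrees-of-freedom adjustment, and truncation) can be combined in one short Python sketch. The function name, the argument names, and the labels "kg", "dp", and "no" are hypothetical stand-ins for the default adjustment, ADJUST=DP, and ADJUST=NO. The t critical values are supplied by the caller, and the sketch assumes that truncation applies to whichever value is used, which the text above does not state explicitly for the unadjusted case.

```python
from statistics import NormalDist


def adjusted_effective_n(n, deff, t_df, t_n_minus_1=None,
                         alpha=0.05, adjust="kg", truncate=True):
    """Return the (adjusted) effective sample size for the modified limits.

    n            -- unweighted frequency of the domain
    deff         -- design effect of the proportion estimate
    t_df         -- t critical value with df degrees of freedom
    t_n_minus_1  -- t critical value with n - 1 degrees of freedom ("kg" only)
    adjust       -- hypothetical labels: "kg", "dp", or "no"
    """
    n_e = n / deff  # effective sample size
    if adjust == "kg":
        if t_n_minus_1 is None:
            raise ValueError("t_n_minus_1 is required when adjust='kg'")
        n_star = n_e * (t_n_minus_1 / t_df) ** 2
    elif adjust == "dp":
        z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
        n_star = n_e * (z / t_df) ** 2
    elif adjust == "no":
        n_star = n_e
    else:
        raise ValueError(f"unknown adjust value: {adjust!r}")
    # Assumption: truncate whichever value is used, including the "no" case.
    if truncate and n_star > n:
        n_star = n
    return n_star


# Hypothetical inputs: n = 1000, Deff = 2, and illustrative critical values.
print(adjusted_effective_n(1000, 2.0, t_df=2.0, t_n_minus_1=1.96))
```

Because the t critical value with df degrees of freedom is typically at least as large as the one with $(n - 1)$ degrees of freedom or the normal quantile, both adjustment factors usually shrink the effective sample size further.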

Modified Agresti-Coull Confidence Limits

The modified Agresti-Coull confidence limits are constructed by applying the Korn and Graubard (1998) method to the Agresti-Coull form (Agresti and Coull 1998; Brown, Cai, and DasGupta 2001). The adjusted effective sample size $n_e^*$ is substituted for the sample size, and the adjusted effective sample size times the proportion estimate ($n_e^* \times \hat{p}$) is substituted for the number of positive responses. For more information, see Franco et al. (2019).

PROC SURVEYFREQ computes modified Agresti-Coull confidence limits for the proportion as

$$
\tilde{p} \pm \left( z_{\alpha/2} \times \sqrt{\tilde{p}(1 - \tilde{p}) / \tilde{n}} \right)
$$

where

$$
\begin{aligned}
\tilde{n}_1 &= (n_e^* \times \hat{p}) + z_{\alpha/2}^2 / 2 \\
\tilde{n} &= n_e^* + z_{\alpha/2}^2 \\
\tilde{p} &= \tilde{n}_1 / \tilde{n}
\end{aligned}
$$

and $\hat{p}$ is the design-based proportion estimate.
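
A minimal Python sketch of the modified Agresti-Coull computation follows. The function name is hypothetical, and the adjusted effective sample size is taken as an input. The sketch assumes that the quantity under the square root is divided by the adjusted count $\tilde{n}$ defined above, which is the standard Agresti-Coull form.

```python
from math import sqrt
from statistics import NormalDist


def modified_agresti_coull(p_hat, n_star, alpha=0.05):
    """Agresti-Coull limits with n_star in place of the sample size."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n1_tilde = n_star * p_hat + z ** 2 / 2.0  # adjusted count of positives
    n_tilde = n_star + z ** 2                 # adjusted sample size
    p_tilde = n1_tilde / n_tilde              # adjusted proportion
    half_width = z * sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
    return p_tilde - half_width, p_tilde + half_width


# Hypothetical inputs: estimate 0.10 with adjusted effective sample size 480.2.
print(modified_agresti_coull(0.10, 480.2))
```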

Modified Clopper-Pearson Confidence Limits

Clopper-Pearson (exact) confidence limits for the binomial proportion are constructed by inverting the exact equal-tailed test that is based on the binomial distribution. This method is attributed to Clopper and Pearson (1934). For a derivation of the F distribution expression for the confidence limits, see Leemis and Trivedi (1996).

Modified Clopper-Pearson confidence limits are constructed by using the Korn and Graubard (1998) method. The adjusted effective sample size $n_e^*$ is substituted for the sample size in the standard Clopper-Pearson expression, and the proportion estimate times the adjusted effective sample size ($\hat{p} \times n_e^*$) is substituted for the number of positive responses.

The modified Clopper-Pearson confidence limits for a proportion ($P_L$ and $P_U$) are computed as

$$
\begin{aligned}
P_L &= \left( 1 + \frac{n_e^* - \hat{p} n_e^* + 1}{\hat{p} n_e^* \, F\left(\alpha/2,\; 2 \hat{p} n_e^*,\; 2(n_e^* - \hat{p} n_e^* + 1)\right)} \right)^{-1} \\
P_U &= \left( 1 + \frac{n_e^* - \hat{p} n_e^*}{(\hat{p} n_e^* + 1) \, F\left(1 - \alpha/2,\; 2(\hat{p} n_e^* + 1),\; 2(n_e^* - \hat{p} n_e^*)\right)} \right)^{-1}
\end{aligned}
$$

where $F(\alpha/2, b, c)$ is the $(\alpha/2)$th percentile of the F distribution with $b$ and $c$ degrees of freedom, $n_e^*$ is the adjusted effective sample size, and $\hat{p}$ is the design-based proportion estimate.

Modified Jeffreys Confidence Limits

The Jeffreys confidence interval is an equal-tailed interval that is based on the noninformative Jeffreys prior for a binomial proportion. For more information, see Brown, Cai, and DasGupta (2001) and Berger (1985).

PROC SURVEYFREQ computes modified Jeffreys confidence limits for a proportion as

$$
\left( \beta(\alpha/2,\; n_1 + 1/2,\; n_e^* - n_1 + 1/2),\;\; \beta(1 - \alpha/2,\; n_1 + 1/2,\; n_e^* - n_1 + 1/2) \right)
$$

where $n_1 = n_e^* \times \hat{p}$ and $\beta(\alpha, b, c)$ is the $\alpha$th percentile of the beta distribution with shape parameters $b$ and $c$. The lower confidence limit is set to 0 when $n_1 = 0$, and the upper confidence limit is set to 1 when $n_1 = n$. For more information, see Franco et al. (2019) and Carlin and Louis (2009).

Modified Uniform Confidence Limits

PROC SURVEYFREQ computes modified uniform confidence limits for a proportion as

$$
\left( \beta(\alpha/2,\; n_1 + 1,\; n_e^* - n_1 + 1),\;\; \beta(1 - \alpha/2,\; n_1 + 1,\; n_e^* - n_1 + 1) \right)
$$

where $n_1 = n_e^* \times \hat{p}$ and $\beta(\alpha, b, c)$ is the $\alpha$th percentile of the beta distribution with shape parameters $b$ and $c$. The lower confidence limit is set to 0 when $n_1 = 0$, and the upper confidence limit is set to 1 when $n_1 = n$. For more information, see Franco et al. (2019) and Carlin and Louis (2009).

Modified Wilson Confidence Limits

Wilson confidence limits for the binomial proportion are also known as score confidence limits and are attributed to Wilson (1927). The confidence limits are based on inverting the normal test that uses the null proportion in the variance (the score test). For more information, see Agresti and Coull (1998), Newcombe (1998), and Brown, Cai, and DasGupta (2001).

PROC SURVEYFREQ computes modified Wilson confidence limits by substituting the adjusted effective sample size $n_e^*$ for the original sample size in the standard Wilson computation. For more information, see Korn and Graubard (1999).

The modified Wilson confidence limits for a proportion are computed as

$$
\left( \hat{p} + \frac{\kappa^2}{2 n_e^*} \pm \kappa \sqrt{\left( \hat{p}(1 - \hat{p}) + \frac{\kappa^2}{4 n_e^*} \right) / n_e^*} \right) \Big/ \left( 1 + \frac{\kappa^2}{n_e^*} \right)
$$

where $n_e^*$ is the adjusted effective sample size and $\hat{p}$ is the design-based estimate of the proportion. By default, $\kappa = z_{\alpha/2}$. If you specify the CL(ADJUST=NO) option to use the unadjusted effective sample size $n_e$ instead of $n_e^*$, then $\kappa = t_{df,\alpha/2}$. For more information, see Curtin et al. (2006).
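
The modified Wilson expression can be sketched in Python as follows. The function name is hypothetical. The multiplier `kappa` is passed in so that the caller can supply either the normal quantile (the default case) or a t critical value (the ADJUST=NO case); when it is omitted, the sketch falls back to the normal quantile for the given alpha.

```python
from math import sqrt
from statistics import NormalDist


def modified_wilson(p_hat, n_star, kappa=None, alpha=0.05):
    """Wilson (score) limits with n_star in place of the sample size."""
    if kappa is None:
        kappa = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    center = p_hat + kappa ** 2 / (2.0 * n_star)
    half_width = kappa * sqrt(
        (p_hat * (1.0 - p_hat) + kappa ** 2 / (4.0 * n_star)) / n_star
    )
    denominator = 1.0 + kappa ** 2 / n_star
    return (center - half_width) / denominator, (center + half_width) / denominator


# Hypothetical inputs: estimate 0.10 with adjusted effective sample size 480.2.
print(modified_wilson(0.10, 480.2))
```

Unlike the Wald form, these limits remain between 0 and 1; for example, the lower limit reduces to 0 when the estimate is 0.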

Last updated: December 09, 2022