The SURVEYFREQ Procedure

Wald Chi-Square Test

PROC SURVEYFREQ provides two Wald chi-square tests for independence of the row and column variables in a two-way table: a Wald chi-square test based on the differences between observed and expected weighted cell frequencies, and a Wald log-linear chi-square test based on the log odds ratios. Both tests take the complex survey design into account. For information about Wald statistics and their applications to categorical data analysis, see Bedrick (1983), Koch, Freeman, and Freeman (1975), and Wald (1943).

For these two tests, PROC SURVEYFREQ computes the generalized Wald chi-square statistic, the corresponding F statistic, and also an adjusted F statistic for tables larger than 2 × 2. Under the null hypothesis of independence, the Wald chi-square statistic approximately follows a chi-square distribution with (R – 1)(C – 1) degrees of freedom for large samples. However, it has been shown that this test can perform poorly in terms of actual significance level and power, especially for tables with a large number of cells or for samples with a relatively small number of clusters. For more information, see Thomas and Rao (1984), Thomas and Rao (1985), and Lohr (2010). For information about the adjusted F statistic, see Fellegi (1980) and Hidiroglou, Fuller, and Hickman (1980). Thomas and Rao (1984) found that the adjusted F statistic provides a more stable test than the chi-square statistic, although its power can be low when the number of sample clusters is not large. See also Korn and Graubard (1990) and Thomas, Singh, and Roberts (1996).

If you specify the WCHISQ option in the TABLES statement, PROC SURVEYFREQ computes a Wald test for independence in the two-way table based on the differences between the observed (weighted) cell frequencies and the expected frequencies.

Under the null hypothesis of independence of the row and column variables, the expected cell frequencies are computed as

$$E_{rc} = \hat{N}_{r\cdot} \, \hat{N}_{\cdot c} / \hat{N}$$

where $\hat{N}_{r\cdot}$ is the estimated total for row r, $\hat{N}_{\cdot c}$ is the estimated total for column c, and $\hat{N}$ is the estimated overall total, as described in the section Expected Weighted Frequency. The null hypothesis that the population weighted frequencies equal the expected frequencies can be expressed as

$$H_0\colon \; Y_{rc} = N_{rc} - E_{rc} = 0$$

for all $r = 1, \ldots, (R - 1)$ and $c = 1, \ldots, (C - 1)$. This null hypothesis can be stated equivalently in terms of cell proportions, with the expected cell proportions computed as the products of the marginal row and column proportions.
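The expected frequencies and the differences that enter the test can be sketched in a few lines of Python. This is only an illustration of the formulas above, not the procedure's implementation; the function name `expected_and_differences` and the example table are hypothetical, and the input is assumed to be an R × C table of estimated weighted frequencies.

```python
def expected_and_differences(n_hat):
    """Return (expected, y) for an R x C table of estimated weighted frequencies.

    expected[r][c] = N_hat[r.] * N_hat[.c] / N_hat
    y holds the (R - 1)(C - 1) differences N_hat[rc] - E[rc], in row-major
    order (the ordering is an assumption made for this sketch).
    """
    n_rows, n_cols = len(n_hat), len(n_hat[0])
    row_tot = [sum(row) for row in n_hat]
    col_tot = [sum(n_hat[r][c] for r in range(n_rows)) for c in range(n_cols)]
    total = sum(row_tot)

    expected = [[row_tot[r] * col_tot[c] / total for c in range(n_cols)]
                for r in range(n_rows)]
    # Only the first R - 1 rows and C - 1 columns are used in the hypothesis.
    y = [n_hat[r][c] - expected[r][c]
         for r in range(n_rows - 1) for c in range(n_cols - 1)]
    return expected, y


# Hypothetical 2 x 2 table of estimated weighted frequencies.
n_hat = [[30.0, 20.0], [10.0, 40.0]]
expected, y = expected_and_differences(n_hat)
print(expected, y)
```

For a 2 × 2 table the vector of differences has a single element, because (R – 1)(C – 1) = 1.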

The generalized Wald chi-square statistic $Q_W$ is computed as

$$Q_W = \hat{\mathbf{Y}}' \left( \mathbf{H} \, \hat{\mathbf{V}}(\hat{\mathbf{N}}) \, \mathbf{H}' \right)^{-1} \hat{\mathbf{Y}}$$

where $\hat{\mathbf{Y}}$ is an array of (R – 1)(C – 1) differences between the observed and expected weighted frequencies $(\hat{N}_{rc} - E_{rc})$, and $\left( \mathbf{H} \, \hat{\mathbf{V}}(\hat{\mathbf{N}}) \, \mathbf{H}' \right)$ estimates the variance of $\hat{\mathbf{Y}}$.

$\hat{\mathbf{V}}(\hat{\mathbf{N}})$ is the covariance matrix of the estimates $\hat{N}_{rc}$, and its computation is described in the section Covariances of Frequency Estimates.

$\mathbf{H}$ is an (R – 1)(C – 1) by RC matrix that contains the partial derivatives of the elements of $\hat{\mathbf{Y}}$ with respect to the elements of $\hat{\mathbf{N}}$. The elements of $\mathbf{H}$ are computed as follows, where a denotes a row different from row r, and b denotes a column different from column c:

$$
\begin{aligned}
\partial \hat{Y}_{rc} / \partial \hat{N}_{rc} &= 1 - \left( \hat{N}_{r\cdot} + \hat{N}_{\cdot c} - \hat{N}_{\cdot c} \hat{N}_{r\cdot} / \hat{N} \right) / \hat{N} \\
\partial \hat{Y}_{rc} / \partial \hat{N}_{ac} &= - \left( \hat{N}_{r\cdot} - \hat{N}_{r\cdot} \hat{N}_{\cdot c} / \hat{N} \right) / \hat{N} \\
\partial \hat{Y}_{rc} / \partial \hat{N}_{rb} &= - \left( \hat{N}_{\cdot c} - \hat{N}_{r\cdot} \hat{N}_{\cdot c} / \hat{N} \right) / \hat{N} \\
\partial \hat{Y}_{rc} / \partial \hat{N}_{ab} &= \hat{N}_{r\cdot} \hat{N}_{\cdot c} / \hat{N}^2
\end{aligned}
$$
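As an illustration, the four derivative cases can be turned into a small routine that fills the matrix of partial derivatives one row at a time. This is a sketch under stated assumptions rather than the procedure's own code: the function name `h_matrix` is hypothetical, and both the rows (the cells r, c of the differences) and the columns (the cells a, b of the frequency estimates) are assumed to be in row-major order. Any covariance matrix used with this matrix would need the same cell ordering.

```python
def h_matrix(n_hat):
    """Build the (R - 1)(C - 1) by RC matrix of partial derivatives.

    n_hat is an R x C table of estimated weighted frequencies.
    Row-major ordering of cells is an assumption of this sketch.
    """
    n_rows, n_cols = len(n_hat), len(n_hat[0])
    row_tot = [sum(row) for row in n_hat]
    col_tot = [sum(n_hat[r][c] for r in range(n_rows)) for c in range(n_cols)]
    total = sum(row_tot)

    h = []
    for r in range(n_rows - 1):
        for c in range(n_cols - 1):
            nr, nc = row_tot[r], col_tot[c]
            row = []
            for a in range(n_rows):
                for b in range(n_cols):
                    if a == r and b == c:      # same cell
                        d = 1 - (nr + nc - nc * nr / total) / total
                    elif b == c:               # different row, same column
                        d = -(nr - nr * nc / total) / total
                    elif a == r:               # same row, different column
                        d = -(nc - nr * nc / total) / total
                    else:                      # different row and column
                        d = nr * nc / total ** 2
                    row.append(d)
            h.append(row)
    return h


# Hypothetical 2 x 2 table: H has one row and four columns.
h = h_matrix([[30.0, 20.0], [10.0, 40.0]])
print(h)
```

One way to check such a routine is to compare each entry with a finite-difference approximation of the corresponding difference between observed and expected frequencies.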

Under the null hypothesis of independence, the statistic $Q_W$ approximately follows a chi-square distribution with (R – 1)(C – 1) degrees of freedom for large samples.
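The quadratic form that defines the Wald chi-square statistic can be sketched as follows, assuming the vector of differences, the matrix of partial derivatives, and the covariance matrix of the frequency estimates are already available. The names `solve` and `wald_chi_square` are hypothetical, the covariance values in the example are made up for illustration, and the linear system is solved with plain Gaussian elimination to keep the sketch free of external libraries.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        if m[piv][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def wald_chi_square(y, h, v):
    """Compute y' (H V H')^{-1} y for lists y (k), h (k x p), v (p x p)."""
    k, p = len(y), len(v)
    hv = [[sum(h[i][l] * v[l][j] for l in range(p)) for j in range(p)]
          for i in range(k)]
    hvh = [[sum(hv[i][l] * h[j][l] for l in range(p)) for j in range(k)]
           for i in range(k)]
    z = solve(hvh, y)                 # z = (H V H')^{-1} y
    return sum(yi * zi for yi, zi in zip(y, z))


# Hypothetical inputs for a 2 x 2 table (k = 1, p = 4).
y = [10.0]
h = [[0.3, -0.2, -0.3, 0.2]]
v = [[25.0, 0.0, 0.0, 0.0],
     [0.0, 16.0, 0.0, 0.0],
     [0.0, 0.0, 9.0, 0.0],
     [0.0, 0.0, 0.0, 36.0]]          # made-up covariance matrix
q_w = wald_chi_square(y, h, v)
print(q_w)
```

Solving the linear system instead of forming the inverse explicitly is a design choice of this sketch; the two are algebraically equivalent.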

PROC SURVEYFREQ computes the Wald F statistic as

$$F_W = Q_W / \left( (R - 1)(C - 1) \right)$$

Under the null hypothesis of independence, $F_W$ approximately follows an F distribution with (R – 1)(C – 1) numerator degrees of freedom. The denominator degrees of freedom is the degrees of freedom for the variance estimator and depends on the sample design and the variance estimation method. For more information, see the section Degrees of Freedom. Alternatively, you can use the DF= option in the TABLES statement to specify the denominator degrees of freedom.

For tables larger than 2 × 2, PROC SURVEYFREQ also computes the adjusted Wald F statistic as

$$F_{\mathrm{Adj\_W}} = Q_W \, (s - k + 1) / (k s)$$

where k = (R – 1)(C – 1), and s is the degrees of freedom. (For more information, see the section Degrees of Freedom.) Alternatively, you can use the DF= option in the TABLES statement to specify the value of s. For 2 × 2 tables, k = (R – 1)(C – 1) = 1, and therefore the adjusted Wald F statistic equals the (unadjusted) Wald F statistic and has the same numerator and denominator degrees of freedom.

Under the null hypothesis, $F_{\mathrm{Adj\_W}}$ approximately follows an F distribution with k numerator degrees of freedom and (s – k + 1) denominator degrees of freedom.
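The two F statistics and their degrees of freedom follow directly from the Wald chi-square statistic. The sketch below only restates the formulas in this section; the function name `wald_f_statistics` and the input values are hypothetical, and s is assumed to be the degrees of freedom for the variance estimator (or the value that would be given in the DF= option).

```python
def wald_f_statistics(q_w, n_rows, n_cols, s):
    """Return the Wald F and adjusted Wald F statistics with their df.

    Each entry is (statistic, numerator df, denominator df).
    """
    k = (n_rows - 1) * (n_cols - 1)
    f_w = q_w / k
    f_adj = q_w * (s - k + 1) / (k * s)
    return {
        "F_W": (f_w, k, s),
        "F_Adj_W": (f_adj, k, s - k + 1),
    }


# Hypothetical values: a 3 x 3 table with 20 degrees of freedom.
stats = wald_f_statistics(q_w=12.0, n_rows=3, n_cols=3, s=20)
print(stats)
```

With k = 1 (a 2 × 2 table), the factor (s – k + 1) / (k s) equals 1, so the function returns the same statistic and degrees of freedom for both entries, consistent with the text above.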

Last updated: December 09, 2022