The SURVEYFREQ Procedure

Wald Log-Linear Chi-Square Test

If you specify the WLLCHISQ option in the TABLES statement, PROC SURVEYFREQ computes a Wald test for independence based on the log odds ratios. For more information about Wald tests, see the section Wald Chi-Square Test.

For a two-way table of $R$ rows and $C$ columns, the Wald log-linear test is based on the $(R-1)(C-1)$-dimensional array of elements $\hat{Y}_{rc}$,

\[ \hat{Y}_{rc} = \log \hat{N}_{rc} - \log \hat{N}_{rC} - \log \hat{N}_{Rc} + \log \hat{N}_{RC} \]

where $\hat{N}_{rc}$ is the estimated total for table cell $(r, c)$. The null hypothesis of independence between the row and column variables can be expressed as $H_0\colon Y_{rc} = 0$ for all $r = 1, \ldots, (R-1)$ and $c = 1, \ldots, (C-1)$. This null hypothesis can be stated equivalently in terms of cell proportions.
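The contrast defined above can be sketched in a few lines of Python. This is only an illustration of the formula, not PROC SURVEYFREQ's implementation; the function name `loglinear_contrasts` and the example tables are hypothetical, and all estimated totals are assumed to be strictly positive so that the logarithms exist.

```python
import math

def loglinear_contrasts(totals):
    """Return the (R-1) x (C-1) array of log-linear contrasts.

    `totals` is an R x C nested list of estimated cell totals, all > 0.
    Each contrast compares cell (r, c) with the last row and last column.
    """
    n_rows, n_cols = len(totals), len(totals[0])
    last_r, last_c = n_rows - 1, n_cols - 1
    return [
        [
            math.log(totals[r][c])
            - math.log(totals[r][last_c])
            - math.log(totals[last_r][c])
            + math.log(totals[last_r][last_c])
            for c in range(n_cols - 1)
        ]
        for r in range(n_rows - 1)
    ]

# Hypothetical 3 x 3 table with proportional rows: independence holds
# exactly, so every contrast is zero up to rounding error.
independent = [[10.0, 20.0, 30.0], [20.0, 40.0, 60.0], [5.0, 10.0, 15.0]]
y_independent = loglinear_contrasts(independent)

# Hypothetical 2 x 2 table with association: the single contrast is the
# log odds ratio, log((20 * 20) / (10 * 10)) = log(4).
y_associated = loglinear_contrasts([[20.0, 10.0], [10.0, 20.0]])
```

For a 2 × 2 table the array has a single element, which is the log odds ratio of the estimated totals.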

The generalized Wald log-linear chi-square statistic is computed as

\[ Q_L = \hat{\mathbf{Y}}' \, \hat{\mathbf{V}}(\hat{\mathbf{Y}})^{-1} \, \hat{\mathbf{Y}} \]

where $\hat{\mathbf{Y}}$ is the $(R-1)(C-1)$-dimensional array of the $\hat{Y}_{rc}$, and $\hat{\mathbf{V}}(\hat{\mathbf{Y}})$ estimates the variance of $\hat{\mathbf{Y}}$,

\[ \hat{\mathbf{V}}(\hat{\mathbf{Y}}) = \mathbf{A} \, \mathbf{D}^{-1} \, \hat{\mathbf{V}}(\hat{\mathbf{N}}) \, \mathbf{D}^{-1} \, \mathbf{A}' \]

where $\hat{\mathbf{V}}(\hat{\mathbf{N}})$ is the $RC \times RC$ covariance matrix of the estimates $\hat{N}_{rc}$, which is computed as described in the section Covariances of Frequency Estimates. $\mathbf{D}$ is an $RC \times RC$ diagonal matrix with the estimated totals $\hat{N}_{rc}$ on the diagonal, and $\mathbf{A}$ is the $(R-1)(C-1) \times RC$ linear contrast matrix.
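The variance formula above can be illustrated with a short standard-library Python sketch. It rests on several assumptions that the text does not state: cells are stored in row-major order, the contrast matrix places +1 and −1 entries so that multiplying it by the vector of log totals reproduces the log-linear contrasts, and the covariance values in the example are invented. The function names are hypothetical.

```python
def contrast_matrix(n_rows, n_cols):
    """Build the (R-1)(C-1) x RC contrast matrix, cells in row-major order.

    Row (r, c) has +1 at cells (r, c) and (R, C) and -1 at cells (r, C)
    and (R, c), matching the definition of the log-linear contrast.
    """
    rows = []
    for r in range(n_rows - 1):
        for c in range(n_cols - 1):
            row = [0.0] * (n_rows * n_cols)
            row[r * n_cols + c] += 1.0
            row[r * n_cols + (n_cols - 1)] -= 1.0
            row[(n_rows - 1) * n_cols + c] -= 1.0
            row[(n_rows - 1) * n_cols + (n_cols - 1)] += 1.0
            rows.append(row)
    return rows

def variance_of_contrasts(totals_flat, cov_totals, n_rows, n_cols):
    """Return A D^-1 V(N) D^-1 A' as a nested list.

    `totals_flat` holds the RC estimated totals in row-major order and
    `cov_totals` is their RC x RC covariance matrix.
    """
    a = contrast_matrix(n_rows, n_cols)
    n = n_rows * n_cols
    # B = A D^-1: divide column j of A by the j-th estimated total.
    b = [[a[i][j] / totals_flat[j] for j in range(n)] for i in range(len(a))]
    k = len(b)
    return [
        [
            sum(b[i][p] * cov_totals[p][q] * b[j][q]
                for p in range(n) for q in range(n))
            for j in range(k)
        ]
        for i in range(k)
    ]

# Hypothetical 2 x 2 example with made-up, uncorrelated variances.
totals_flat = [20.0, 10.0, 10.0, 20.0]
cov_totals = [
    [4.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 4.0],
]
v_y = variance_of_contrasts(totals_flat, cov_totals, 2, 2)
```

With uncorrelated cells, as in this example, the variance of the single contrast reduces to the sum of the squared relative standard errors of the four totals: 4/400 + 1/100 + 1/100 + 4/400 = 0.04.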

Under the null hypothesis of independence, the statistic $Q_L$ approximately follows a chi-square distribution with $(R-1)(C-1)$ degrees of freedom for large samples.
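As a rough sketch of the quadratic form, the statistic can be evaluated by solving a linear system instead of forming the inverse explicitly. The code below uses only the Python standard library; the helper names `solve` and `wald_loglinear_q` are hypothetical, the input values are invented, and this is not a description of how PROC SURVEYFREQ performs the computation internally.

```python
import math

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("variance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def wald_loglinear_q(y, v):
    """Return y' V^-1 y for a contrast vector y and its variance matrix v."""
    z = solve(v, y)  # z = V^-1 y
    return sum(yi * zi for yi, zi in zip(y, z))

# Hypothetical 2 x 2 case: one contrast, log(4), with variance 0.04.
q_2x2 = wald_loglinear_q([math.log(4)], [[0.04]])

# Hypothetical two-contrast case with correlated estimates.
q_two = wald_loglinear_q([1.0, 1.0], [[2.0, 1.0], [1.0, 2.0]])
```

When there is a single contrast, the statistic is simply the squared contrast divided by its estimated variance.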

PROC SURVEYFREQ computes the Wald log-linear F statistic as

\[ F_L = \frac{Q_L}{(R-1)(C-1)} \]

Under the null hypothesis of independence, $F_L$ approximately follows an F distribution with $(R-1)(C-1)$ numerator degrees of freedom. PROC SURVEYFREQ computes the denominator degrees of freedom as described in the section Degrees of Freedom. Alternatively, you can use the DF= option in the TABLES statement to specify the denominator degrees of freedom.

For tables larger than 2 × 2, PROC SURVEYFREQ also computes the adjusted Wald log-linear F statistic as

\[ F_{\mathrm{Adj\_L}} = Q_L \, \frac{s - k + 1}{k s} \]

where $k = (R-1)(C-1)$, and $s$ is the denominator degrees of freedom, which is computed as described in the section Degrees of Freedom. Alternatively, you can use the DF= option in the TABLES statement to specify the value of $s$. For 2 × 2 tables, $k = (R-1)(C-1) = 1$, so the factor $(s - k + 1)/(ks)$ equals 1. The adjusted Wald F statistic therefore equals the (unadjusted) Wald F statistic and has the same numerator and denominator degrees of freedom.

Under the null hypothesis, $F_{\mathrm{Adj\_L}}$ approximately follows an F distribution with $k$ numerator degrees of freedom and $(s - k + 1)$ denominator degrees of freedom.
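The arithmetic of the two F statistics can be illustrated with a minimal Python sketch. The function name `wald_loglinear_f` and the input values are hypothetical. In practice the denominator degrees of freedom come from the survey design or from the DF= option, and the Wald chi-square value comes from the quadratic form described earlier in this section.

```python
def wald_loglinear_f(q_l, n_rows, n_cols, denom_df):
    """Return the unadjusted and adjusted Wald log-linear F statistics.

    q_l      : Wald log-linear chi-square statistic
    denom_df : denominator degrees of freedom, s
    Returns (f_l, f_adj, k, adjusted denominator df), where
    k = (R-1)(C-1) is the numerator degrees of freedom for both statistics.
    """
    k = (n_rows - 1) * (n_cols - 1)
    s = denom_df
    adj_df = s - k + 1
    if k < 1 or adj_df <= 0:
        raise ValueError("need k >= 1 and s - k + 1 > 0")
    f_l = q_l / k
    f_adj = q_l * adj_df / (k * s)
    return f_l, f_adj, k, adj_df

# Hypothetical 3 x 3 table: k = 4, s = 20, so the adjusted statistic
# uses 20 - 4 + 1 = 17 denominator degrees of freedom.
f_l, f_adj, k, adj_df = wald_loglinear_f(12.0, 3, 3, 20)

# Hypothetical 2 x 2 table: k = 1, so both statistics coincide.
f_l_2x2, f_adj_2x2, k_2x2, adj_df_2x2 = wald_loglinear_f(5.0, 2, 2, 20)
```

In the 3 × 3 example, the unadjusted statistic is 12/4 = 3.0 and the adjusted statistic is 12 · 17/(4 · 20) = 2.55. The adjustment shrinks the statistic slightly and reduces the denominator degrees of freedom from 20 to 17.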

Last updated: December 09, 2022