The FREQ Procedure

Cochran-Mantel-Haenszel Statistics

The CMH option in the TABLES statement gives a stratified statistical analysis of the relationship between the row and column variables after controlling for the strata variables in a multiway table. For example, for the table request A*B*C*D, the CMH option provides an analysis of the relationship between C and D, after controlling for A and B. The stratified analysis provides a way to adjust for the possible confounding effects of A and B without being forced to estimate parameters for them.

The CMH analysis produces Cochran-Mantel-Haenszel statistics, which include the correlation statistic, the ANOVA (row mean scores) statistic, and the general association statistic. For $2 \times 2$ tables, the CMH option also provides Mantel-Haenszel and logit estimates of the common odds ratio and the common relative risks, in addition to the Breslow-Day test for homogeneity of the odds ratios.

Exact statistics are also available for stratified $2 \times 2$ tables. If you specify the EQOR option in the EXACT statement, PROC FREQ provides Zelen’s exact test for equal odds ratios. If you specify the COMOR option in the EXACT statement, PROC FREQ provides exact confidence limits for the common odds ratio and an exact test that the common odds ratio equals one.

Let the number of strata be denoted by $q$, indexing the strata by $h = 1, 2, \ldots, q$. Each stratum contains a contingency table with $X$ representing the row variable and $Y$ representing the column variable. For table $h$, denote the cell frequency in row $i$ and column $j$ by $n_{hij}$, with corresponding row and column marginal totals denoted by $n_{hi\cdot}$ and $n_{h\cdot j}$, and the overall stratum total by $n_h$.

Because the formulas for the Cochran-Mantel-Haenszel statistics are more easily defined in terms of matrices, the following notation is used. Vectors are presumed to be column vectors unless they are transposed ($'$).

$$
\begin{array}{rcll}
\mathbf{n}'_{hi} & = & (n_{hi1}, n_{hi2}, \ldots, n_{hiC}) & (1 \times C) \\
\mathbf{n}'_{h} & = & (\mathbf{n}'_{h1}, \mathbf{n}'_{h2}, \ldots, \mathbf{n}'_{hR}) & (1 \times RC) \\
p_{hi\cdot} & = & n_{hi\cdot} / n_h & (1 \times 1) \\
p_{h\cdot j} & = & n_{h\cdot j} / n_h & (1 \times 1) \\
\mathbf{P}'_{h*\cdot} & = & (p_{h1\cdot}, p_{h2\cdot}, \ldots, p_{hR\cdot}) & (1 \times R) \\
\mathbf{P}'_{h\cdot *} & = & (p_{h\cdot 1}, p_{h\cdot 2}, \ldots, p_{h\cdot C}) & (1 \times C)
\end{array}
$$

Assume that the strata are independent and that the marginal totals of each stratum are fixed. The null hypothesis, $H_0$, is that there is no association between $X$ and $Y$ in any of the strata. The corresponding model is the multiple hypergeometric; this implies that, under $H_0$, the expected value and covariance matrix of the frequencies are, respectively,

$$
\mathbf{m}_h = \mathbf{E}[\mathbf{n}_h \mid H_0] = n_h \left( \mathbf{P}_{h\cdot *} \otimes \mathbf{P}_{h*\cdot} \right)
$$

$$
\mathbf{Var}[\mathbf{n}_h \mid H_0] = c \left( \left( \mathbf{D}_{\mathbf{P}_{h\cdot *}} - \mathbf{P}_{h\cdot *} \mathbf{P}'_{h\cdot *} \right) \otimes \left( \mathbf{D}_{\mathbf{P}_{h*\cdot}} - \mathbf{P}_{h*\cdot} \mathbf{P}'_{h*\cdot} \right) \right)
$$

where

$$
c = n_h^2 / (n_h - 1)
$$

and where $\otimes$ denotes Kronecker product multiplication and $\mathbf{D}_{\mathbf{a}}$ is a diagonal matrix with the elements of $\mathbf{a}$ on the main diagonal.

The generalized CMH statistic (Landis, Heyman, and Koch 1978) is defined as

$$
Q_{\mathrm{CMH}} = \mathbf{G}' \, \mathbf{V}_{\mathbf{G}}^{-1} \, \mathbf{G}
$$

where

$$
\begin{array}{rcl}
\mathbf{G} & = & \sum_h \mathbf{B}_h (\mathbf{n}_h - \mathbf{m}_h) \\
\mathbf{V}_{\mathbf{G}} & = & \sum_h \mathbf{B}_h \left( \mathbf{Var}[\mathbf{n}_h \mid H_0] \right) \mathbf{B}'_h
\end{array}
$$

and where

$$
\mathbf{B}_h = \mathbf{C}_h \otimes \mathbf{R}_h
$$

is a matrix of fixed constants based on column scores $\mathbf{C}_h$ and row scores $\mathbf{R}_h$. When the null hypothesis is true, the CMH statistic has an asymptotic chi-square distribution with degrees of freedom equal to the rank of $\mathbf{B}_h$. If $\mathbf{V}_{\mathbf{G}}$ is found to be singular, PROC FREQ prints a message and sets the value of the CMH statistic to missing.

PROC FREQ computes three CMH statistics by using this formula for the generalized CMH statistic, with different row and column score definitions for each statistic. The CMH statistics that PROC FREQ computes are the correlation statistic, the ANOVA (row mean scores) statistic, and the general association statistic. These statistics test the null hypothesis of no association against different alternative hypotheses. The following sections describe the computation of these CMH statistics.

Caution: The CMH statistics have low power for detecting an association in which the patterns of association for some of the strata are in the opposite direction of the patterns displayed by other strata. Thus, a nonsignificant CMH statistic suggests either that there is no association or that no pattern of association has enough strength or consistency to dominate any other pattern.

Correlation Statistic

The correlation statistic, popularized by Mantel and Haenszel, has 1 degree of freedom and is known as the Mantel-Haenszel statistic (Mantel and Haenszel 1959; Mantel 1963).

The alternative hypothesis for the correlation statistic is that there is a linear association between X and Y in at least one stratum. If either X or Y does not lie on an ordinal (or interval) scale, this statistic is not meaningful.

To compute the correlation statistic, PROC FREQ uses the formula for the generalized CMH statistic with the row and column scores determined by the SCORES= option in the TABLES statement. See the section Scores for more information about the available score types. The matrix of row scores $\mathbf{R}_h$ has dimension $1 \times R$, and the matrix of column scores $\mathbf{C}_h$ has dimension $1 \times C$.

When there is only one stratum, this CMH statistic reduces to $(n - 1) r^2$, where $r$ is the Pearson correlation coefficient between $X$ and $Y$. When nonparametric (RANK or RIDIT) scores are specified, the statistic reduces to $(n - 1) r_s^2$, where $r_s$ is the Spearman rank correlation coefficient between $X$ and $Y$. When there is more than one stratum, this CMH statistic becomes a stratum-adjusted correlation statistic.
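The scalar form of the correlation statistic can be sketched in a few lines of Python. This is an illustration only, not PROC FREQ's implementation: the function name `cmh_correlation` and the list-of-lists table layout are hypothetical, and the scores are passed in directly rather than derived from a SCORES= option. Because the row and column score matrices each have a single row, the quantities G and V_G in the generalized formula are scalars, and the null variance for each stratum simplifies to the product of the corrected sums of squares of the row and column scores divided by the stratum total minus 1.

```python
def cmh_correlation(strata, row_scores, col_scores):
    """Correlation CMH statistic (1 df) for a list of R x C count tables."""
    g = 0.0      # sum over strata of observed minus expected score cross-product
    var_g = 0.0  # sum over strata of the null variance of that cross-product
    for table in strata:
        row_tot = [sum(row) for row in table]
        col_tot = [sum(col) for col in zip(*table)]
        n = sum(row_tot)
        if n < 2:
            continue  # the variance term divides by n - 1
        sum_r = sum(r * t for r, t in zip(row_scores, row_tot))
        sum_c = sum(c * t for c, t in zip(col_scores, col_tot))
        cross = sum(r * c * table[i][j]
                    for i, r in enumerate(row_scores)
                    for j, c in enumerate(col_scores))
        # Corrected sums of squares of the row and column scores.
        s_rr = sum(r * r * t for r, t in zip(row_scores, row_tot)) - sum_r ** 2 / n
        s_cc = sum(c * c * t for c, t in zip(col_scores, col_tot)) - sum_c ** 2 / n
        g += cross - sum_r * sum_c / n
        var_g += s_rr * s_cc / (n - 1)
    # Return None when the variance is zero, mirroring a missing statistic.
    return g * g / var_g if var_g > 0 else None
```

With a single table in `strata`, the returned value equals (n - 1) times the squared Pearson correlation of the scores, which is the reduction described above.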

ANOVA (Row Mean Scores) Statistic

The ANOVA statistic can be used only when the column variable Y lies on an ordinal (or interval) scale so that the mean score of Y is meaningful. For the ANOVA statistic, the mean score is computed for each row of the table, and the alternative hypothesis is that, for at least one stratum, the mean scores of the R rows are unequal. In other words, the statistic is sensitive to location differences among the R distributions of Y.

The matrix of column scores $\mathbf{C}_h$ has dimension $1 \times C$, and the column scores are determined by the SCORES= option.

The matrix of row scores $\mathbf{R}_h$ has dimension $(R - 1) \times R$ and is created internally by PROC FREQ as

$$
\mathbf{R}_h = \left[ \mathbf{I}_{R-1}, \; -\mathbf{J}_{R-1} \right]
$$

where $\mathbf{I}_{R-1}$ is an identity matrix of rank $R - 1$ and $\mathbf{J}_{R-1}$ is an $(R - 1) \times 1$ vector of ones. This matrix has the effect of forming $R - 1$ independent contrasts of the $R$ mean scores.

When there is only one stratum, this CMH statistic is essentially an analysis of variance (ANOVA) statistic in the sense that it is a function of the variance ratio F statistic that would be obtained from a one-way ANOVA on the dependent variable Y. If nonparametric scores are specified in this case, the ANOVA statistic is a Kruskal-Wallis test.

When there is more than one stratum, this CMH statistic corresponds to a stratum-adjusted ANOVA or Kruskal-Wallis test. In the special case where there is one subject per row and one subject per column in the contingency table of each stratum, this CMH statistic is identical to Friedman’s chi-square. See Example 47.9 for an illustration.

General Association Statistic

The alternative hypothesis for the general association statistic is that, for at least one stratum, there is some kind of association between X and Y. This statistic is always interpretable because it does not require an ordinal scale for either X or Y.

For the general association statistic, the matrix $\mathbf{R}_h$ is the same as the one used for the ANOVA statistic. The matrix $\mathbf{C}_h$ is defined similarly as

$$
\mathbf{C}_h = \left[ \mathbf{I}_{C-1}, \; -\mathbf{J}_{C-1} \right]
$$

PROC FREQ generates both score matrices internally. When there is only one stratum, the general association CMH statistic reduces to $Q_P (n - 1) / n$, where $Q_P$ is the Pearson chi-square statistic. When there is more than one stratum, the CMH statistic becomes a stratum-adjusted Pearson chi-square statistic. Note that a similar adjustment can be made by summing the Pearson chi-squares across the strata. However, the latter statistic requires a large sample size in each stratum to support the resulting chi-square distribution with $q(R - 1)(C - 1)$ degrees of freedom. The CMH statistic requires only a large overall sample size because it has only $(R - 1)(C - 1)$ degrees of freedom.
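For stratified 2 x 2 tables the general association statistic has 1 degree of freedom and the matrix formula collapses to a ratio of scalars, which makes the single-stratum reduction easy to check numerically. The following Python sketch is an illustration under that 2 x 2 restriction, not PROC FREQ's code; the names `cmh_2x2` and `pearson_2x2` and the tuple layout `(n11, n12, n21, n22)` are hypothetical.

```python
def cmh_2x2(strata):
    """CMH statistic for stratified 2x2 tables given as (n11, n12, n21, n22)."""
    diff = 0.0  # sum over strata of n11 minus its null expectation
    var = 0.0   # sum over strata of the hypergeometric variance of n11
    for n11, n12, n21, n22 in strata:
        r1, r2 = n11 + n12, n21 + n22
        c1, c2 = n11 + n21, n12 + n22
        n = r1 + r2
        if n < 2:
            continue  # the variance term divides by n - 1
        diff += n11 - r1 * c1 / n
        var += r1 * r2 * c1 * c2 / (n * n * (n - 1))
    return diff * diff / var if var > 0 else None


def pearson_2x2(n11, n12, n21, n22):
    """Pearson chi-square for one 2x2 table with nonzero margins."""
    r1, r2 = n11 + n12, n21 + n22
    c1, c2 = n11 + n21, n12 + n22
    n = r1 + r2
    return n * (n11 * n22 - n12 * n21) ** 2 / (r1 * r2 * c1 * c2)
```

For one stratum, `cmh_2x2([t])` agrees with `pearson_2x2(*t) * (n - 1) / n`, where `n` is the table total.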

See Cochran (1954); Mantel and Haenszel (1959); Mantel (1963); Birch (1965); Landis, Heyman, and Koch (1978).

Mantel-Fleiss Criterion

If you specify the CMH(MANTELFLEISS) option in the TABLES statement, PROC FREQ computes the Mantel-Fleiss criterion for stratified $2 \times 2$ tables. The Mantel-Fleiss criterion can be used to assess the validity of the chi-square approximation for the distribution of the Mantel-Haenszel statistic for $2 \times 2$ tables. For more information, see Mantel and Fleiss (1980); Mantel and Haenszel (1959); Stokes, Davis, and Koch (2012); Dmitrienko et al. (2005).

The Mantel-Fleiss criterion is computed as

$$
\mathrm{MF} = \min \left( \left[ \sum_h m_{h11} - \sum_h (n_{h11})_L \right], \; \left[ \sum_h (n_{h11})_U - \sum_h m_{h11} \right] \right)
$$

where $m_{h11}$ is the expected value of $n_{h11}$ under the hypothesis of no association between the row and column variables in table $h$, $(n_{h11})_L$ is the minimum possible value of the table cell frequency, and $(n_{h11})_U$ is the maximum possible value,

$$
\begin{array}{rcl}
m_{h11} & = & n_{h1\cdot} \, n_{h\cdot 1} / n_h \\
(n_{h11})_L & = & \max \left( 0, \; n_{h1\cdot} - n_{h\cdot 2} \right) \\
(n_{h11})_U & = & \min \left( n_{h\cdot 1}, \; n_{h1\cdot} \right)
\end{array}
$$

The Mantel-Fleiss guideline accepts the validity of the Mantel-Haenszel approximation when the value of the criterion is at least 5. When the criterion is less than 5, PROC FREQ displays a warning.
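As a rough illustration of the criterion, the following Python sketch sums the expected value and the lower and upper bounds of the (1,1) cell over the strata and returns the smaller of the two distances. The function name `mantel_fleiss` and the tuple layout `(n11, n12, n21, n22)` are hypothetical; this is not PROC FREQ's code.

```python
def mantel_fleiss(strata):
    """Mantel-Fleiss criterion for 2x2 strata given as (n11, n12, n21, n22)."""
    sum_m = sum_lo = sum_hi = 0.0
    for n11, n12, n21, n22 in strata:
        r1 = n11 + n12           # row 1 total
        c1 = n11 + n21           # column 1 total
        c2 = n12 + n22           # column 2 total
        n = c1 + c2
        if n == 0:
            continue             # an empty stratum contributes nothing
        sum_m += r1 * c1 / n     # expected n11 under no association
        sum_lo += max(0, r1 - c2)  # smallest n11 allowed by the margins
        sum_hi += min(c1, r1)      # largest n11 allowed by the margins
    return min(sum_m - sum_lo, sum_hi - sum_m)
```

A returned value of at least 5 corresponds to the guideline stated above; for the two strata `(3, 2, 1, 4)` and `(2, 3, 2, 3)` the sketch returns 4.0, which falls short of it.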

Adjusted Odds Ratio and Relative Risk Estimates

The CMH option provides adjusted odds ratio and relative risk estimates for stratified $2 \times 2$ tables. For each of these measures, PROC FREQ computes a Mantel-Haenszel estimate and a logit estimate. These estimates apply to n-way table requests in the TABLES statement, when the row and column variables both have two levels.

For example, for the table request A*B*C*D, if the row and column variables C and D both have two levels, PROC FREQ provides odds ratio and relative risk estimates, adjusting for the confounding variables A and B.

The choice of an appropriate measure depends on the study design. For case-control (retrospective) studies, the odds ratio is appropriate. For cohort (prospective) or cross-sectional studies, the relative risk is appropriate. See the section Odds Ratio and Relative Risks for more information on these measures.

Throughout this section, $z$ denotes the $100(1 - \alpha/2)$th percentile of the standard normal distribution.

Odds Ratio, Case-Control Studies

PROC FREQ provides Mantel-Haenszel and logit estimates for the common odds ratio for stratified $2 \times 2$ tables.

Mantel-Haenszel Estimator

The Mantel-Haenszel estimate of the common odds ratio is computed as

$$
\mathrm{OR}_{\mathrm{MH}} = \left( \sum_h n_{h11} \, n_{h22} / n_h \right) \Big/ \left( \sum_h n_{h12} \, n_{h21} / n_h \right)
$$

It is always computed unless the denominator is 0. For more information, see Mantel and Haenszel (1959) and Agresti (2002).

To compute confidence limits for the common odds ratio, PROC FREQ uses the Robins, Breslow, and Greenland (1986) variance estimate for $\log(\mathrm{OR}_{\mathrm{MH}})$. The $100(1 - \alpha)\%$ confidence limits for the common odds ratio are

$$
\left( \mathrm{OR}_{\mathrm{MH}} \times \exp(-z \hat{\sigma}), \;\; \mathrm{OR}_{\mathrm{MH}} \times \exp(z \hat{\sigma}) \right)
$$

where

$$
\begin{aligned}
\hat{\sigma}^2 &= \widehat{\mathrm{Var}} \left( \log(\mathrm{OR}_{\mathrm{MH}}) \right) \\
&= \frac{\sum_h (n_{h11} + n_{h22})(n_{h11} \, n_{h22}) / n_h^2}{2 \left( \sum_h n_{h11} \, n_{h22} / n_h \right)^2} \\
&\quad + \frac{\sum_h \left[ (n_{h11} + n_{h22})(n_{h12} \, n_{h21}) + (n_{h12} + n_{h21})(n_{h11} \, n_{h22}) \right] / n_h^2}{2 \left( \sum_h n_{h11} \, n_{h22} / n_h \right) \left( \sum_h n_{h12} \, n_{h21} / n_h \right)} \\
&\quad + \frac{\sum_h (n_{h12} + n_{h21})(n_{h12} \, n_{h21}) / n_h^2}{2 \left( \sum_h n_{h12} \, n_{h21} / n_h \right)^2}
\end{aligned}
$$

Note that the Mantel-Haenszel odds ratio estimator is less sensitive to small $n_h$ than the logit estimator.
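The estimate and its confidence limits can be sketched in Python as follows. This is a minimal illustration of the formulas in this section, not PROC FREQ's implementation; the function name `mh_odds_ratio`, the tuple layout `(n11, n12, n21, n22)`, and the choice to return `None` whenever either sum is zero (so that the log-scale variance is defined) are assumptions of the sketch.

```python
import math
from statistics import NormalDist


def mh_odds_ratio(strata, alpha=0.05):
    """Return (OR_MH, lower, upper) for 2x2 strata given as (n11, n12, n21, n22)."""
    sum_r = sum_s = 0.0                 # sums of n11*n22/n and n12*n21/n
    sum_pr = sum_ps_qr = sum_qs = 0.0   # numerators of the three variance terms
    for n11, n12, n21, n22 in strata:
        n = n11 + n12 + n21 + n22
        if n == 0:
            continue
        r = n11 * n22 / n
        s = n12 * n21 / n
        p = (n11 + n22) / n
        q = (n12 + n21) / n
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    if sum_r == 0 or sum_s == 0:
        return None  # sketch-only choice: no limits when a sum is zero
    or_mh = sum_r / sum_s
    # Robins-Breslow-Greenland variance estimate of log(OR_MH).
    var_log = (sum_pr / (2 * sum_r ** 2)
               + sum_ps_qr / (2 * sum_r * sum_s)
               + sum_qs / (2 * sum_s ** 2))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(var_log)
    return or_mh, or_mh * math.exp(-half), or_mh * math.exp(half)
```

For a single stratum, the variance estimate reduces to the sum of the reciprocals of the four cell counts, which gives a convenient hand check.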

Logit Estimator

The adjusted logit estimate of the common odds ratio (Woolf 1955) is computed as

$$
\mathrm{OR}_{\mathrm{L}} = \exp \left( \sum_h w_h \log(\mathrm{OR}_h) \Big/ \sum_h w_h \right)
$$

and the corresponding $100(1 - \alpha)\%$ confidence limits are

$$
\left( \mathrm{OR}_{\mathrm{L}} \times \exp \left( -z \Big/ \sqrt{\textstyle\sum_h w_h} \right), \;\; \mathrm{OR}_{\mathrm{L}} \times \exp \left( z \Big/ \sqrt{\textstyle\sum_h w_h} \right) \right)
$$

where $\mathrm{OR}_h$ is the odds ratio for stratum $h$, and

$$
w_h = 1 / \mathrm{Var} \left( \log(\mathrm{OR}_h) \right)
$$

If any table cell frequency in a stratum $h$ is 0, PROC FREQ adds 0.5 to each cell frequency in the stratum before computing $\mathrm{OR}_h$ and $w_h$ (Haldane 1956) for the logit estimate. The procedure provides a warning when this occurs.
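A minimal Python sketch of the logit estimator, including the 0.5 correction for strata that contain a zero cell, might look like the following. It is an illustration rather than PROC FREQ's code: the function name `logit_odds_ratio` and the tuple layout `(n11, n12, n21, n22)` are hypothetical, and the stratum variance of the log odds ratio is taken to be the sum of the reciprocals of the four cell frequencies, the same form this chapter uses for the Q test for homogeneity.

```python
import math
from statistics import NormalDist


def logit_odds_ratio(strata, alpha=0.05):
    """Return (OR_L, lower, upper) for 2x2 strata given as (n11, n12, n21, n22)."""
    sum_w = sum_w_log = 0.0
    for cells in strata:
        if 0 in cells:
            # Add 0.5 to every cell of a stratum that has a zero cell.
            cells = [c + 0.5 for c in cells]
        n11, n12, n21, n22 = cells
        log_or = math.log(n11 * n22 / (n12 * n21))
        # Weight is the inverse of the estimated variance of the log odds ratio.
        w = 1 / (1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
        sum_w += w
        sum_w_log += w * log_or
    or_l = math.exp(sum_w_log / sum_w)
    half = NormalDist().inv_cdf(1 - alpha / 2) / math.sqrt(sum_w)
    return or_l, or_l * math.exp(-half), or_l * math.exp(half)
```

When every stratum has the same odds ratio, the weighted average of the log odds ratios is that common value, so the estimate reproduces it regardless of the weights.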

Relative Risks, Cohort Studies

PROC FREQ provides Mantel-Haenszel and logit estimates of the common relative risks for stratified $2 \times 2$ tables.

Mantel-Haenszel Estimator

The Mantel-Haenszel estimate of the common relative risk for column 1 is computed as

$$
\mathrm{RR}_{\mathrm{MH}} = \left( \sum_h n_{h11} \, n_{h2\cdot} / n_h \right) \Big/ \left( \sum_h n_{h21} \, n_{h1\cdot} / n_h \right)
$$

It is always computed unless the denominator is 0. See Mantel and Haenszel (1959) and Agresti (2002) for more information.

To compute confidence limits for the common relative risk, PROC FREQ uses the Greenland and Robins (1985) variance estimate for $\log(\mathrm{RR}_{\mathrm{MH}})$. The $100(1 - \alpha)\%$ confidence limits for the common relative risk are

$$
\left( \mathrm{RR}_{\mathrm{MH}} \times \exp(-z \hat{\sigma}), \;\; \mathrm{RR}_{\mathrm{MH}} \times \exp(z \hat{\sigma}) \right)
$$

where

$$
\hat{\sigma}^2 = \widehat{\mathrm{Var}} \left( \log(\mathrm{RR}_{\mathrm{MH}}) \right) = \frac{\sum_h \left( n_{h1\cdot} \, n_{h2\cdot} \, n_{h\cdot 1} - n_{h11} \, n_{h21} \, n_h \right) / n_h^2}{\left( \sum_h n_{h11} \, n_{h2\cdot} / n_h \right) \left( \sum_h n_{h21} \, n_{h1\cdot} / n_h \right)}
$$
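The relative risk estimate and its limits follow the same pattern as the odds ratio. The Python sketch below is an illustration of the formulas in this section only; the function name `mh_relative_risk`, the tuple layout `(n11, n12, n21, n22)`, and returning `None` when either sum is zero are assumptions of the sketch, not documented PROC FREQ behavior.

```python
import math
from statistics import NormalDist


def mh_relative_risk(strata, alpha=0.05):
    """Return (RR_MH, lower, upper) for 2x2 strata given as (n11, n12, n21, n22)."""
    num = den = var_num = 0.0
    for n11, n12, n21, n22 in strata:
        r1, r2 = n11 + n12, n21 + n22   # row totals
        c1 = n11 + n21                  # column 1 total
        n = r1 + r2
        if n == 0:
            continue
        num += n11 * r2 / n
        den += n21 * r1 / n
        var_num += (r1 * r2 * c1 - n11 * n21 * n) / n ** 2
    if num == 0 or den == 0:
        return None  # sketch-only choice: no limits when a sum is zero
    rr = num / den
    # Greenland-Robins variance estimate of log(RR_MH).
    var_log = var_num / (num * den)
    half = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(var_log)
    return rr, rr * math.exp(-half), rr * math.exp(half)
```

For the single table `(10, 20, 30, 40)` the sketch gives a relative risk of 7/9, which is the row 1 risk 10/30 divided by the row 2 risk 30/70.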

Logit Estimator

The adjusted logit estimate of the common relative risk for column 1 is computed as

$$
\mathrm{RR}_{\mathrm{L}} = \exp \left( \sum_h w_h \log(\mathrm{RR}_h) \Big/ \sum_h w_h \right)
$$

and the corresponding $100(1 - \alpha)\%$ confidence limits are

$$
\left( \mathrm{RR}_{\mathrm{L}} \times \exp \left( -z \Big/ \sqrt{\textstyle\sum_h w_h} \right), \;\; \mathrm{RR}_{\mathrm{L}} \times \exp \left( z \Big/ \sqrt{\textstyle\sum_h w_h} \right) \right)
$$

where $\mathrm{RR}_h$ is the column 1 relative risk estimate for stratum $h$ and

$$
w_h = 1 / \mathrm{Var} \left( \log(\mathrm{RR}_h) \right)
$$

If $n_{h11}$ or $n_{h21}$ is 0, PROC FREQ adds 0.5 to each cell frequency in the stratum before computing $\mathrm{RR}_h$ and $w_h$ for the logit estimate. The procedure prints a warning when this occurs. For more information, see Kleinbaum, Kupper, and Morgenstern (1982, Sections 17.4 and 17.5).

Breslow-Day Test for Homogeneity of the Odds Ratios

When you specify the CMH option, PROC FREQ computes the Breslow-Day test for stratified $2 \times 2$ tables. It tests the null hypothesis that the odds ratios for the $q$ strata are equal. When the null hypothesis is true, the statistic has approximately a chi-square distribution with $q - 1$ degrees of freedom. See Breslow and Day (1980) and Agresti (2007) for more information.

The Breslow-Day statistic is computed as

$$
Q_{\mathrm{BD}} = \sum_h \frac{\left( n_{h11} - \mathrm{E}(n_{h11} \mid \mathrm{OR}_{\mathrm{MH}}) \right)^2}{\mathrm{Var}(n_{h11} \mid \mathrm{OR}_{\mathrm{MH}})}
$$

where E and Var denote expected value and variance, respectively. The summation does not include any table that contains a row or column that has a total frequency of 0. If $\mathrm{OR}_{\mathrm{MH}}$ is 0 or undefined, PROC FREQ does not compute the statistic and prints a warning message.

For the Breslow-Day test to be valid, the sample size should be relatively large in each stratum, and at least 80% of the expected cell counts should be greater than 5. Note that this is a stricter sample size requirement than the requirement for the Cochran-Mantel-Haenszel test for $q \times 2 \times 2$ tables, in that each stratum sample size (not just the overall sample size) must be relatively large. Even when the Breslow-Day test is valid, it might not be very powerful against certain alternatives, as discussed in Breslow and Day (1980).

If you specify the BDT option, PROC FREQ computes the Breslow-Day test with Tarone’s adjustment, which subtracts an adjustment factor from $Q_{\mathrm{BD}}$ to make the resulting statistic asymptotically chi-square. The Breslow-Day-Tarone statistic is computed as

$$
Q_{\mathrm{BDT}} = Q_{\mathrm{BD}} - \frac{\left( \sum_h \left( n_{h11} - \mathrm{E}(n_{h11} \mid \mathrm{OR}_{\mathrm{MH}}) \right) \right)^2}{\sum_h \mathrm{Var}(n_{h11} \mid \mathrm{OR}_{\mathrm{MH}})}
$$

See Tarone (1985); Jones et al. (1989); Breslow (1996) for more information.
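The following Python sketch shows one way to evaluate both statistics. This section does not spell out how the conditional expectation and variance are obtained, so the sketch assumes the form commonly given for the Breslow-Day test: the expected (1,1) count is the value within the range allowed by the margins whose fitted table has odds ratio equal to the Mantel-Haenszel estimate (the root of a quadratic), and the variance is the reciprocal of the sum of reciprocals of the four fitted cells. The function names and the tuple layout `(n11, n12, n21, n22)` are hypothetical, and this is not PROC FREQ's code.

```python
import math


def expected_n11(r1, c1, n, psi):
    """Fitted n11 and its variance for fixed margins and odds ratio psi > 0."""
    d = n - r1 - c1                      # fitted n22 equals d + a
    if psi == 1:
        a = r1 * c1 / n
    else:
        # Solve a*(d + a) = psi*(r1 - a)*(c1 - a) for a.
        qa = 1 - psi
        qb = d + psi * (r1 + c1)
        qc = -psi * r1 * c1
        disc = math.sqrt(qb * qb - 4 * qa * qc)
        lo, hi = max(0, -d), min(r1, c1)
        roots = [(-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)]
        a = next(x for x in roots if lo - 1e-9 <= x <= hi + 1e-9)
    var = 1 / (1 / a + 1 / (r1 - a) + 1 / (c1 - a) + 1 / (d + a))
    return a, var


def breslow_day(strata):
    """Return (Q_BD, Q_BDT) for 2x2 strata given as (n11, n12, n21, n22)."""
    # Skip tables that have a row or column total of 0.
    usable = [t for t in strata
              if min(t[0] + t[1], t[2] + t[3], t[0] + t[2], t[1] + t[3]) > 0]
    num = sum(a * d / (a + b + c + d) for a, b, c, d in usable)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in usable)
    if num == 0 or den == 0:
        return None                      # OR_MH is 0 or undefined
    psi = num / den
    q_bd = sum_diff = sum_var = 0.0
    for a, b, c, d in usable:
        e, v = expected_n11(a + b, a + c, a + b + c + d, psi)
        q_bd += (a - e) ** 2 / v
        sum_diff += a - e
        sum_var += v
    return q_bd, q_bd - sum_diff ** 2 / sum_var
```

When every stratum has exactly the same observed odds ratio, the fitted tables coincide with the observed ones and both statistics are 0 up to rounding.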

Q Test for Homogeneity of Odds Ratios

PROC FREQ computes a Q test for homogeneity of odds ratios as

$$
Q = \sum_h w_h \left( \theta_h - \bar{\theta} \right)^2
$$

where $\theta_h$ is the log odds ratio in stratum $h$ and $\bar{\theta}$ is the logit estimate of the common log odds ratio. The stratum weights $w_h$ are

$$
w_h = 1 / \mathrm{Var}(\theta_h)
$$

where

$$
\mathrm{Var}(\theta_h) = 1/n_{h11} + 1/n_{h12} + 1/n_{h21} + 1/n_{h22}
$$

If any table cell frequency in a stratum is 0, PROC FREQ adds 0.5 to each cell frequency in the stratum before computing $\theta_h$ and $w_h$. For more information, see the sections Odds Ratio and Adjusted Odds Ratio and Relative Risk Estimates.

Under the null hypothesis of homogeneity, the Q statistic has approximately a chi-square distribution with $k - 1$ degrees of freedom, where $k$ is the number of strata (denoted by $q$ earlier in this section).
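A short Python sketch of the Q statistic, using the weights and the 0.5 correction described above, is given here as an illustration. The function name `q_homogeneity` and the tuple layout `(n11, n12, n21, n22)` are hypothetical, and the sketch returns the degrees of freedom alongside Q rather than a p-value.

```python
import math


def q_homogeneity(strata):
    """Return (Q, degrees of freedom) for 2x2 strata given as (n11, n12, n21, n22)."""
    thetas, weights = [], []
    for cells in strata:
        if 0 in cells:
            # Add 0.5 to every cell of a stratum that has a zero cell.
            cells = [c + 0.5 for c in cells]
        n11, n12, n21, n22 = cells
        thetas.append(math.log(n11 * n22 / (n12 * n21)))
        weights.append(1 / (1 / n11 + 1 / n12 + 1 / n21 + 1 / n22))
    # Logit estimate of the common log odds ratio.
    theta_bar = sum(w * t for w, t in zip(weights, thetas)) / sum(weights)
    q = sum(w * (t - theta_bar) ** 2 for w, t in zip(weights, thetas))
    return q, len(strata) - 1
```

Strata that share one odds ratio give a Q of 0 up to rounding, whereas the strata `(10, 20, 30, 40)` and `(30, 10, 10, 30)`, with odds ratios 2/3 and 9, give a Q of roughly 14.26 on 1 degree of freedom.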

I-Square Measure of Heterogeneity

The I-square statistic (Higgins and Thompson 2002) is a measure of heterogeneity among strata for stratified $2 \times 2$ tables. I-square is expressed in percentage form and can be interpreted as the proportion of total variability that is due to between-strata variability. For more information, see Higgins et al. (2003) and Thorlund et al. (2012).

PROC FREQ computes I-square for the Q test for odds ratios as

$$
I^2 = \max \left( 100\% \times \frac{Q - (k - 1)}{Q}, \; 0 \right)
$$

where k is the number of strata and Q is described in the section Q Test for Homogeneity of Odds Ratios.

PROC FREQ computes uncertainty limits for I-square by using the test-based method of Higgins and Thompson (2002). This method constructs confidence limits for $H$, where $H^2 = Q / (k - 1)$. When $Q > k$ or $k = 2$, the standard error of $\log(H)$ is computed as

\[ \mathrm{SE}_1(\log(H)) = \frac{\log(Q) - \log(k - 1)}{2 \left( \sqrt{2Q} - \sqrt{2k - 3} \right)} \]

When $Q \le k$ and $k > 2$, the standard error of $\log(H)$ is computed as

\[ \mathrm{SE}_0(\log(H)) = \sqrt{ \frac{1 - \dfrac{1}{3 (k - 2)^2}}{2 (k - 2)} } \]

The $100(1 - \alpha)\%$ confidence limits for $H$ are

\[ \left( H \times \exp\left( -z_{\alpha/2} \times \mathrm{SE}(\log(H)) \right),\; H \times \exp\left( z_{\alpha/2} \times \mathrm{SE}(\log(H)) \right) \right) \]

The uncertainty limits for $I^2$ are computed by transforming the confidence limits for $H$, where $I^2 = 1 - (1 / H^2)$.

When $I^2$ is 0, PROC FREQ sets the lower confidence limit to 0 and determines the upper limit by using the level $\alpha$ (instead of $\alpha / 2$).
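
The test-based limits described above can be sketched in Python as follows. This is an illustration under stated assumptions, not PROC FREQ's implementation: the function name `i_squared_limits` is hypothetical, results are returned as proportions rather than percentages, transformed limits below 0 are truncated to 0, and degenerate inputs (for example, k = 2 with Q = 0.5, where the denominator of the first standard error is zero) are not guarded.

```python
import math
from statistics import NormalDist


def i_squared_limits(q_stat, k, alpha=0.05):
    """Return (I2, lower, upper) as proportions, using the test-based method."""
    if k < 2 or q_stat <= 0:
        raise ValueError("this sketch needs k >= 2 and Q > 0")
    h = math.sqrt(q_stat / (k - 1))
    if q_stat > k or k == 2:
        # SE_1 of log(H), used when Q > k or k = 2.
        se = (math.log(q_stat) - math.log(k - 1)) / (
            2.0 * (math.sqrt(2.0 * q_stat) - math.sqrt(2.0 * k - 3.0)))
    else:
        # SE_0 of log(H), used when Q <= k and k > 2.
        se = math.sqrt((1.0 - 1.0 / (3.0 * (k - 2) ** 2)) / (2.0 * (k - 2)))

    def to_i2(h_value):
        # I^2 = 1 - 1/H^2; truncating at 0 is an assumption of this sketch.
        return max(1.0 - 1.0 / h_value ** 2, 0.0)

    i2 = to_i2(h)
    if i2 == 0.0:
        # Lower limit fixed at 0; upper limit uses level alpha, not alpha/2.
        z = NormalDist().inv_cdf(1.0 - alpha)
        return i2, 0.0, to_i2(h * math.exp(z * se))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return i2, to_i2(h * math.exp(-z * se)), to_i2(h * math.exp(z * se))


print(i_squared_limits(12.0, 5))  # Q > k, so SE_1 applies
print(i_squared_limits(3.0, 5))   # I^2 is 0, so the one-sided level applies
```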

Zelen’s Exact Test for Equal Odds Ratios

If you specify the EQOR option in the EXACT statement, PROC FREQ computes Zelen’s exact test for equal odds ratios for stratified $2 \times 2$ tables. Zelen’s test is an exact counterpart to the Breslow-Day asymptotic test for equal odds ratios. The reference set for Zelen’s test includes all possible $q \times 2 \times 2$ tables with the same row, column, and stratum totals as the observed multiway table and with the same sum of cell (1,1) frequencies as the observed table. The test statistic is the probability of the observed $q \times 2 \times 2$ table conditional on the fixed margins, which is a product of hypergeometric probabilities.

The p-value for Zelen’s test is the sum of all table probabilities that are less than or equal to the observed table probability, where the sum is computed over all tables in the reference set determined by the fixed margins and the observed sum of cell (1,1) frequencies. This test is similar to Fisher’s exact test for two-way tables. For more information, see Zelen (1971); Hirji (2006); Agresti (1992). PROC FREQ computes Zelen’s exact test by using the polynomial multiplication algorithm of Hirji et al. (1996).
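
For small tables, the definition of the p-value can be illustrated by brute-force enumeration of the reference set. The Python sketch below is not the polynomial multiplication algorithm of Hirji et al. (1996); it simply lists every combination of cell (1,1) counts with the observed sum. The function name `zelen_p_value` and the input layout (one `(n11, n12, n21, n22)` tuple per stratum) are assumptions of this example, as is the normalization of the table probabilities by their total over the reference set. Because the hypergeometric denominators are constant within the reference set, integer weights suffice and the comparison with the observed table is exact.

```python
from itertools import product
from math import comb, prod


def zelen_p_value(tables):
    """tables: list of (n11, n12, n21, n22) tuples, one per stratum."""
    per_stratum = []
    for n11, n12, n21, n22 in tables:
        row1, col1, col2 = n11 + n12, n11 + n21, n12 + n22
        lo, hi = max(0, row1 - col2), min(row1, col1)
        # Map each feasible cell (1,1) count to its hypergeometric weight.
        per_stratum.append({s: comb(col1, s) * comb(col2, row1 - s)
                            for s in range(lo, hi + 1)})
    observed = [t[0] for t in tables]
    s0 = sum(observed)
    obs_weight = prod(w[s] for w, s in zip(per_stratum, observed))

    total = tail = 0
    for counts in product(*(w.keys() for w in per_stratum)):
        if sum(counts) != s0:
            continue  # outside the reference set
        weight = prod(w[s] for w, s in zip(per_stratum, counts))
        total += weight
        if weight <= obs_weight:
            tail += weight
    return tail / total


print(zelen_p_value([(3, 1, 1, 3), (1, 3, 3, 1)]))
```

The enumeration grows exponentially with the number of strata, so this form is only suitable for checking small examples.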

Exact Confidence Limits for the Common Odds Ratio

If you specify the COMOR option in the EXACT statement, PROC FREQ computes exact confidence limits for the common odds ratio for stratified $2 \times 2$ tables. This computation assumes that the odds ratio is constant over all the $2 \times 2$ tables. Exact confidence limits are constructed from the distribution of $S = \sum_h n_{h11}$, conditional on the marginal totals of the $2 \times 2$ tables.

Because this is a discrete problem, the confidence coefficient for these exact confidence limits is not exactly $(1 - \alpha)$ but is at least $(1 - \alpha)$. Thus, these confidence limits are conservative. See Agresti (1992) for more information.

PROC FREQ computes exact confidence limits for the common odds ratio by using an algorithm based on Vollset, Hirji, and Elashoff (1991). See also Mehta, Patel, and Gray (1985).

Conditional on the marginal totals of $2 \times 2$ table $h$, let the random variable $S_h$ denote the frequency of table cell (1,1). Given the row totals $n_{h1\cdot}$ and $n_{h2\cdot}$ and column totals $n_{h\cdot 1}$ and $n_{h\cdot 2}$, the lower and upper bounds for $S_h$ are $l_h$ and $u_h$,

\[
\begin{aligned}
l_h &= \max\left( 0,\; n_{h1\cdot} - n_{h\cdot 2} \right) \\
u_h &= \min\left( n_{h1\cdot},\; n_{h\cdot 1} \right)
\end{aligned}
\]

Let $C_{s_h}$ denote the hypergeometric coefficient,

\[ C_{s_h} = \binom{n_{h\cdot 1}}{s_h} \binom{n_{h\cdot 2}}{n_{h1\cdot} - s_h} \]

and let $\phi$ denote the common odds ratio. Then the conditional distribution of $S_h$ is

\[ P\left( S_h = s_h \mid n_{h1\cdot}, n_{h\cdot 1}, n_{h\cdot 2} \right) = \frac{C_{s_h} \phi^{s_h}}{\sum_{x = l_h}^{u_h} C_x \phi^x} \]
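
The bounds, the hypergeometric coefficient, and the conditional distribution for a single stratum can be sketched in Python as follows. The function name `stratum_distribution` and the `(n11, n12, n21, n22)` argument layout are assumptions of this example.

```python
from math import comb


def stratum_distribution(n11, n12, n21, n22, phi=1.0):
    """Return {s: P(S_h = s)} for one 2 x 2 table, given odds ratio phi."""
    row1 = n11 + n12          # first row total
    col1 = n11 + n21          # first column total
    col2 = n12 + n22          # second column total
    lo = max(0, row1 - col2)  # lower bound l_h
    hi = min(row1, col1)      # upper bound u_h
    # Numerator of the conditional distribution: C_s * phi**s.
    weights = {s: comb(col1, s) * comb(col2, row1 - s) * phi ** s
               for s in range(lo, hi + 1)}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


dist = stratum_distribution(3, 1, 1, 3)
print(dist)  # with phi = 1 this is the ordinary hypergeometric distribution
```

Raising `phi` above 1 shifts probability toward larger cell (1,1) counts, which is the property that the confidence limit equations later in this section rely on.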

Summing over all the $2 \times 2$ tables, $S = \sum_h S_h$, and the lower and upper bounds of $S$ are $l$ and $u$,

\[ l = \sum_h l_h \quad \text{and} \quad u = \sum_h u_h \]

The conditional distribution of the sum $S$ is

\[ P\left( S = s \mid n_{h1\cdot}, n_{h\cdot 1}, n_{h\cdot 2};\; h = 1, \ldots, q \right) = \frac{C_s \phi^s}{\sum_{x = l}^{u} C_x \phi^x} \]

where

\[ C_s = \sum_{s_1 + \cdots + s_q = s} \left( \prod_h C_{s_h} \right) \]
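
The coefficients $C_s$ are the coefficients of the product of the per-stratum polynomials $\sum_{s_h} C_{s_h} x^{s_h}$, so they can be obtained by repeated convolution. The Python sketch below shows this with a plain double loop; it illustrates the definition and is not necessarily the algorithm that PROC FREQ uses. The function names and the `(n11, n12, n21, n22)` input layout are assumptions of this example.

```python
from math import comb


def stratum_coefficients(table):
    """Return (l_h, [C_s for s = l_h, ..., u_h]) for one 2 x 2 table."""
    n11, n12, n21, n22 = table
    row1, col1, col2 = n11 + n12, n11 + n21, n12 + n22
    lo, hi = max(0, row1 - col2), min(row1, col1)
    return lo, [comb(col1, s) * comb(col2, row1 - s) for s in range(lo, hi + 1)]


def sum_coefficients(tables):
    """Return (l, coefs), where coefs[i] is C_s for s = l + i."""
    l_total, coefs = 0, [1]
    for table in tables:
        lo, c_h = stratum_coefficients(table)
        # Multiply the running polynomial by this stratum's polynomial.
        new = [0] * (len(coefs) + len(c_h) - 1)
        for i, a in enumerate(coefs):
            for j, b in enumerate(c_h):
                new[i + j] += a * b
        l_total, coefs = l_total + lo, new
    return l_total, coefs


l, coefs = sum_coefficients([(3, 1, 1, 3), (1, 3, 3, 1)])
print(l, coefs)
```

Python integers have arbitrary precision, so the coefficients stay exact even when the binomial products become large.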

Let $s_0$ denote the observed sum of cell (1,1) frequencies over the $q$ tables. The following two equations are solved iteratively for lower and upper confidence limits for the common odds ratio, $\phi_1$ and $\phi_2$:

\[
\begin{aligned}
\frac{\sum_{x = s_0}^{u} C_x \phi_1^x}{\sum_{x = l}^{u} C_x \phi_1^x} &= \alpha / 2 \\
\frac{\sum_{x = l}^{s_0} C_x \phi_2^x}{\sum_{x = l}^{u} C_x \phi_2^x} &= \alpha / 2
\end{aligned}
\]

When the observed sum $s_0$ equals the lower bound $l$, PROC FREQ sets the lower confidence limit to 0 and determines the upper limit with level $\alpha$. Similarly, when the observed sum $s_0$ equals the upper bound $u$, PROC FREQ sets the upper confidence limit to infinity and determines the lower limit with level $\alpha$.
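
One way to solve the two equations is bisection on $\log \phi$, because the upper tail probability increases with $\phi$ and the lower tail probability decreases with it. The Python sketch below takes that approach as an assumption; the source says only that the equations are solved iteratively. The function names, the search bracket of $[-50, 50]$ on the log scale, and the fixed iteration count are all choices made for this illustration.

```python
import math
from math import comb


def sum_coefficients(tables):
    """Return (l, coefs), where coefs[i] is C_s for s = l + i."""
    l, coefs = 0, [1]
    for n11, n12, n21, n22 in tables:
        row1, col1, col2 = n11 + n12, n11 + n21, n12 + n22
        lo, hi = max(0, row1 - col2), min(row1, col1)
        c_h = [comb(col1, s) * comb(col2, row1 - s) for s in range(lo, hi + 1)]
        new = [0] * (len(coefs) + len(c_h) - 1)
        for i, a in enumerate(coefs):
            for j, b in enumerate(c_h):
                new[i + j] += a * b
        l, coefs = l + lo, new
    return l, coefs


def tail_prob(l, coefs, log_phi, s0, upper):
    """P(S >= s0) if upper, else P(S <= s0), at odds ratio exp(log_phi)."""
    logs = [math.log(c) + (l + i) * log_phi for i, c in enumerate(coefs)]
    top = max(logs)  # subtract the maximum to avoid overflow
    w = [math.exp(v - top) for v in logs]
    k = s0 - l
    return (sum(w[k:]) if upper else sum(w[:k + 1])) / sum(w)


def bisect_log_phi(f, target, increasing, lo=-50.0, hi=50.0):
    """Bisection on log(phi); the bracket [-50, 50] is an assumption."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_common_or_limits(tables, alpha=0.05):
    l, coefs = sum_coefficients(tables)
    u = l + len(coefs) - 1
    s0 = sum(t[0] for t in tables)
    lower, upper = 0.0, math.inf
    if s0 > l:  # otherwise the lower limit stays at 0
        level = alpha if s0 == u else alpha / 2.0
        lower = math.exp(bisect_log_phi(
            lambda t: tail_prob(l, coefs, t, s0, True), level, True))
    if s0 < u:  # otherwise the upper limit stays at infinity
        level = alpha if s0 == l else alpha / 2.0
        upper = math.exp(bisect_log_phi(
            lambda t: tail_prob(l, coefs, t, s0, False), level, False))
    return lower, upper


print(exact_common_or_limits([(3, 1, 1, 3), (1, 3, 3, 1)]))
print(exact_common_or_limits([(2, 0, 0, 2)]))  # s0 equals u here
```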

When you specify the COMOR option in the EXACT statement, PROC FREQ also computes the exact test that the common odds ratio equals one. Setting $\phi = 1$, the conditional distribution of the sum $S$ under the null hypothesis becomes

\[ P_0\left( S = s \mid n_{h1\cdot}, n_{h\cdot 1}, n_{h\cdot 2};\; h = 1, \ldots, q \right) = \frac{C_s}{\sum_{x = l}^{u} C_x} \]

The point probability for this exact test is the probability of the observed sum $s_0$ under the null hypothesis, conditional on the marginals of the stratified $2 \times 2$ tables, and is denoted by $P_0(s_0)$. The expected value of $S$ under the null hypothesis is

\[ \mathrm{E}_0(S) = \frac{\sum_{x = l}^{u} x \, C_x}{\sum_{x = l}^{u} C_x} \]

The one-sided exact p-value is computed from the conditional distribution as $P_0(S \ge s_0)$ or $P_0(S \le s_0)$, depending on whether the observed sum $s_0$ is greater or less than $\mathrm{E}_0(S)$,

\[
\begin{aligned}
P_1 &= P_0(S \ge s_0) = \frac{\sum_{x = s_0}^{u} C_x}{\sum_{x = l}^{u} C_x} \quad \text{if } s_0 > \mathrm{E}_0(S) \\
P_1 &= P_0(S \le s_0) = \frac{\sum_{x = l}^{s_0} C_x}{\sum_{x = l}^{u} C_x} \quad \text{if } s_0 \le \mathrm{E}_0(S)
\end{aligned}
\]

PROC FREQ computes two-sided p-values for this test according to three different definitions. A two-sided p-value is computed as twice the one-sided p-value, setting the result equal to one if it exceeds one,

\[ P_2^a = 2 \times P_1 \]

In addition, a two-sided p-value is computed as the sum of all probabilities less than or equal to the point probability of the observed sum $s_0$, summing over all possible values of $s$, $l \le s \le u$,

\[ P_2^b = \sum_{l \le s \le u \,:\, P_0(s) \le P_0(s_0)} P_0(s) \]

Also, a two-sided p-value is computed as the sum of the one-sided p-value and the corresponding area in the opposite tail of the distribution, equidistant from the expected value,

\[ P_2^c = P_0\left( \left| S - \mathrm{E}_0(S) \right| \ge \left| s_0 - \mathrm{E}_0(S) \right| \right) \]
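
The one-sided p-value and the three two-sided definitions can be compared on a small example with the following Python sketch. It uses exact rational arithmetic so that the comparisons of probabilities and distances are not affected by rounding. The function names, the dictionary keys, and the `(n11, n12, n21, n22)` input layout are assumptions of this illustration and do not correspond to PROC FREQ output names.

```python
from fractions import Fraction
from math import comb


def null_distribution(tables):
    """Return {s: P0(S = s)} as exact fractions, for s = l, ..., u."""
    l, coefs = 0, [1]
    for n11, n12, n21, n22 in tables:
        row1, col1, col2 = n11 + n12, n11 + n21, n12 + n22
        lo, hi = max(0, row1 - col2), min(row1, col1)
        c_h = [comb(col1, s) * comb(col2, row1 - s) for s in range(lo, hi + 1)]
        new = [0] * (len(coefs) + len(c_h) - 1)
        for i, a in enumerate(coefs):
            for j, b in enumerate(c_h):
                new[i + j] += a * b
        l, coefs = l + lo, new
    total = sum(coefs)
    return {l + i: Fraction(c, total) for i, c in enumerate(coefs)}


def exact_p_values(tables):
    p0 = null_distribution(tables)
    s0 = sum(t[0] for t in tables)
    mean = sum(s * p for s, p in p0.items())  # E0(S)
    if s0 > mean:
        p1 = sum(p for s, p in p0.items() if s >= s0)
    else:
        p1 = sum(p for s, p in p0.items() if s <= s0)
    p2a = min(2 * p1, 1)  # twice the one-sided value, capped at one
    p2b = sum(p for p in p0.values() if p <= p0[s0])
    p2c = sum(p for s, p in p0.items() if abs(s - mean) >= abs(s0 - mean))
    return {"one_sided": float(p1), "two_sided_a": float(p2a),
            "two_sided_b": float(p2b), "two_sided_c": float(p2c)}


print(exact_p_values([(3, 1, 1, 3), (3, 1, 1, 3)]))
```

For a null distribution that is symmetric about its mean, as in this example, the three two-sided values coincide; they can differ when the distribution is skewed.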
Last updated: December 09, 2022