The FREQ Procedure

Definitions and Notation

A two-way table represents the crosstabulation of row variable X and column variable Y. Let the table row values or levels be denoted by $X_i$, $i = 1, 2, \ldots, R$, and the column values by $Y_j$, $j = 1, 2, \ldots, C$. Let $n_{ij}$ denote the frequency of the table cell in the $i$th row and $j$th column and define the following notation:

\[
\begin{aligned}
n_{i\cdot} &= \sum_{j} n_{ij} && \text{(row totals)} \\
n_{\cdot j} &= \sum_{i} n_{ij} && \text{(column totals)} \\
n &= \sum_{i} \sum_{j} n_{ij} && \text{(overall total)} \\
p_{ij} &= n_{ij} / n && \text{(cell percentages)} \\
p_{i\cdot} &= n_{i\cdot} / n && \text{(row percentages of total)} \\
p_{\cdot j} &= n_{\cdot j} / n && \text{(column percentages of total)}
\end{aligned}
\]

\[
\begin{aligned}
R_i &= \text{score for row } i \\
C_j &= \text{score for column } j
\end{aligned}
\]

\[
\begin{aligned}
\bar{R} &= \sum_{i} n_{i\cdot} R_i / n && \text{(average row score)} \\
\bar{C} &= \sum_{j} n_{\cdot j} C_j / n && \text{(average column score)}
\end{aligned}
\]

\[
\begin{aligned}
A_{ij} &= \sum_{k>i} \sum_{l>j} n_{kl} + \sum_{k<i} \sum_{l<j} n_{kl} \\
D_{ij} &= \sum_{k>i} \sum_{l<j} n_{kl} + \sum_{k<i} \sum_{l>j} n_{kl} \\
P &= \sum_{i} \sum_{j} n_{ij} A_{ij} && \text{(twice the number of concordances)} \\
Q &= \sum_{i} \sum_{j} n_{ij} D_{ij} && \text{(twice the number of discordances)}
\end{aligned}
\]

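As an illustration of the notation above, the following Python sketch computes the margins and the quantities P and Q directly from their definitions. It is not PROC FREQ's implementation; the function name `table_summaries` and the list-of-rows input are assumptions made for this example.

```python
def table_summaries(n):
    """Margins, P, and Q for a two-way table given as a list of rows.

    n[i][j] is the cell frequency in row i and column j (0-based here,
    whereas the text indexes rows and columns from 1).
    """
    R, C = len(n), len(n[0])
    row_totals = [sum(n[i]) for i in range(R)]
    col_totals = [sum(n[i][j] for i in range(R)) for j in range(C)]
    total = sum(row_totals)

    def block(rows, cols):
        # Sum of the cell frequencies over a rectangular block of the table.
        return sum(n[k][l] for k in rows for l in cols)

    P = Q = 0
    for i in range(R):
        for j in range(C):
            # A_ij: cells below-right plus cells above-left of (i, j).
            A = block(range(i + 1, R), range(j + 1, C)) + block(range(i), range(j))
            # D_ij: cells below-left plus cells above-right of (i, j).
            D = block(range(i + 1, R), range(j)) + block(range(i), range(j + 1, C))
            P += n[i][j] * A
            Q += n[i][j] * D
    return row_totals, col_totals, total, P, Q


rows, cols, total, P, Q = table_summaries([[10, 5], [3, 12]])
```

For this 2x2 table there are 10 x 12 = 120 concordant pairs and 5 x 3 = 15 discordant pairs, so P = 240 and Q = 30, which matches the "twice the number" description.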
Scores

PROC FREQ uses scores of the variable values to compute the Mantel-Haenszel chi-square, Pearson correlation, Cochran-Armitage test for trend, weighted kappa coefficient, and Cochran-Mantel-Haenszel statistics. The SCORES= option in the TABLES statement specifies the score type that PROC FREQ uses. The available score types are TABLE, RANK, RIDIT, and MODRIDIT scores. The default score type is TABLE. Using MODRIDIT, RANK, or RIDIT scores yields nonparametric analyses.

For numeric variables, table scores are the values of the row and column levels. If the row or column variable is formatted, then the table score is the internal numeric value corresponding to that level. If two or more numeric values are classified into the same formatted level, then the internal numeric value for that level is the smallest of these values. For character variables, table scores are defined as the row numbers and column numbers (that is, 1 for the first row, 2 for the second row, and so on).
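The table-score rules above can be sketched in Python as follows. This is only an illustration of the stated rules, not PROC FREQ code: `fmt` is a hypothetical callable standing in for a SAS format, `age_group` is an invented example format, and the character levels are assumed to be supplied already in the order in which they appear in the table.

```python
def table_scores_numeric(values, fmt=None):
    """Table score per level of a numeric variable.

    Without a format, each value is its own level and its own score.
    With a format, the score of a formatted level is the smallest
    internal value classified into that level.
    """
    scores = {}
    for v in values:
        level = fmt(v) if fmt else v
        scores[level] = min(v, scores.get(level, v))
    return scores


def table_scores_character(levels):
    """Table scores for a character variable: 1, 2, ... in table order."""
    return {level: k for k, level in enumerate(levels, start=1)}


def age_group(age):
    # Hypothetical format that groups ages into three levels.
    if age < 40:
        return "<40"
    return "40-59" if age < 60 else "60+"


grouped = table_scores_numeric([23, 35, 41, 58, 62], fmt=age_group)
# grouped maps "<40" to 23, "40-59" to 41, and "60+" to 62
```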

Rank scores, which you request with the SCORES=RANK option, are defined as

\[
\begin{aligned}
R_i^{1} &= \sum_{k<i} n_{k\cdot} + (n_{i\cdot} + 1)/2 && i = 1, 2, \ldots, R \\
C_j^{1} &= \sum_{l<j} n_{\cdot l} + (n_{\cdot j} + 1)/2 && j = 1, 2, \ldots, C
\end{aligned}
\]

where $R_i^{1}$ is the rank score of row $i$, and $C_j^{1}$ is the rank score of column $j$. Note that rank scores yield midranks for tied values.
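A minimal Python sketch of the rank-score formula, assuming the marginal totals are already available as a list; the function name `rank_scores` is hypothetical and this is not PROC FREQ's own code. The same function applies to row totals and to column totals.

```python
def rank_scores(totals):
    """Rank scores from marginal totals (row totals or column totals).

    Each level's score is the count of observations in all lower levels
    plus (own total + 1) / 2, that is, the midrank of the level.
    """
    scores = []
    below = 0  # running count of observations in lower levels
    for t in totals:
        scores.append(below + (t + 1) / 2)
        below += t
    return scores


row_rank_scores = rank_scores([10, 20, 30])
# → [5.5, 20.5, 45.5]
```

With totals 10, 20, and 30, the first level occupies ranks 1 through 10 (midrank 5.5), the second ranks 11 through 30 (midrank 20.5), and the third ranks 31 through 60 (midrank 45.5).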

Ridit scores, which you request with the SCORES=RIDIT option, are defined as rank scores standardized by the sample size (Bross 1958; Mack and Skillings 1980). Ridit scores are derived from the rank scores as

\[
\begin{aligned}
R_i^{2} &= R_i^{1} / n && i = 1, 2, \ldots, R \\
C_j^{2} &= C_j^{1} / n && j = 1, 2, \ldots, C
\end{aligned}
\]

Modified ridit scores (SCORES=MODRIDIT) represent the expected values of the order statistics of the uniform distribution on (0,1) (Van Elteren 1960; Lehmann and D’Abrera 2006). Modified ridit scores are derived from rank scores as

\[
\begin{aligned}
R_i^{3} &= R_i^{1} / (n + 1) && i = 1, 2, \ldots, R \\
C_j^{3} &= C_j^{1} / (n + 1) && j = 1, 2, \ldots, C
\end{aligned}
\]

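The two ridit variants differ only in the divisor applied to the rank scores, which the following Python sketch makes explicit. The function name `ridit_scores` and its `modified` flag are assumptions for this illustration, not part of PROC FREQ.

```python
def ridit_scores(totals, modified=False):
    """Ridit (rank / n) or modified ridit (rank / (n + 1)) scores.

    totals holds the marginal totals for the rows or for the columns;
    n is their sum, the overall sample size.
    """
    n = sum(totals)
    divisor = n + 1 if modified else n
    scores = []
    below = 0  # running count of observations in lower levels
    for t in totals:
        rank = below + (t + 1) / 2  # rank score (midrank) of this level
        scores.append(rank / divisor)
        below += t
    return scores


ridit = ridit_scores([10, 20, 30])                     # divides by n = 60
modridit = ridit_scores([10, 20, 30], modified=True)   # divides by n + 1 = 61
```

Because the largest possible rank score is n, dividing by n + 1 keeps every modified ridit score strictly inside the interval (0,1).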
Last updated: December 09, 2022