The FREQ Procedure

Measures of Association

When you specify the MEASURES option in the TABLES statement, PROC FREQ computes several statistics that describe the association between the row and column variables of the contingency table. The following are measures of ordinal association that consider whether the column variable Y tends to increase as the row variable X increases: gamma, Kendall’s tau-b, Stuart’s tau-c, and Somers’ D. These measures are appropriate for ordinal variables, and they classify pairs of observations as concordant or discordant. A pair is concordant if the observation with the larger value of X also has the larger value of Y. A pair is discordant if the observation with the larger value of X has the smaller value of Y. See Agresti (2007) and the other references cited for the individual measures of association.
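
To make the pair classification concrete, the following Python sketch counts concordant and discordant pairs directly from a table of counts. It assumes the cell-level quantities used in the formulas below follow the usual PROC FREQ notation, which this section does not restate: $A_{ij}$ is the number of observations concordant with cell (i, j), $D_{ij}$ is the number discordant with it, $P = \sum_i \sum_j n_{ij} A_{ij}$, and $Q = \sum_i \sum_j n_{ij} D_{ij}$, so P and Q count every concordant or discordant pair twice. The function name `concordance_terms` is hypothetical; the table is a list of rows ordered by increasing X, with columns ordered by increasing Y.

```python
def concordance_terms(table):
    """Return A, D, P, Q for an R x C table of counts (list of rows).

    A[i][j]: observations below-right or above-left of cell (i, j) (concordant).
    D[i][j]: observations below-left or above-right of cell (i, j) (discordant).
    P, Q: sums of n_ij * A_ij and n_ij * D_ij, so each pair is counted twice.
    """
    nrow, ncol = len(table), len(table[0])

    def block(rows, cols):
        # Total count in the sub-table formed by the given rows and columns.
        return sum(table[k][c] for k in rows for c in cols)

    A = [[block(range(i + 1, nrow), range(j + 1, ncol)) + block(range(i), range(j))
          for j in range(ncol)] for i in range(nrow)]
    D = [[block(range(i + 1, nrow), range(j)) + block(range(i), range(j + 1, ncol))
          for j in range(ncol)] for i in range(nrow)]
    P = sum(table[i][j] * A[i][j] for i in range(nrow) for j in range(ncol))
    Q = sum(table[i][j] * D[i][j] for i in range(nrow) for j in range(ncol))
    return A, D, P, Q


A, D, P, Q = concordance_terms([[10, 2], [3, 5]])
print(P // 2, Q // 2)  # numbers of concordant and discordant pairs
```

For the table [[10, 2], [3, 5]] this gives P = 100 and Q = 12, that is, 50 concordant and 6 discordant pairs, each counted twice.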

The Pearson correlation coefficient and the Spearman rank correlation coefficient are also appropriate for ordinal variables. The Pearson correlation describes the strength of the linear association between the row and column variables, and it is computed by using the row and column scores specified by the SCORES= option in the TABLES statement. The Spearman correlation is computed with rank scores. The polychoric correlation (requested by the PLCORR option) also requires ordinal variables and assumes that the variables have an underlying bivariate normal distribution. The following measures of association do not require ordinal variables and are appropriate for nominal variables: lambda asymmetric, lambda symmetric, and the uncertainty coefficients.

PROC FREQ computes estimates of the measures according to the formulas given in the following sections. For each measure, PROC FREQ computes an asymptotic standard error (ASE), which is the square root of the asymptotic variance denoted by Var in the following sections.

Confidence Limits

If you specify the CL option in the TABLES statement, PROC FREQ computes asymptotic confidence limits for all MEASURES statistics. The confidence coefficient is determined according to the value of the ALPHA= option, which, by default, is 0.05 and produces 95% confidence limits.

The confidence limits are computed as

$$
\mathrm{Est} \pm \left( z_{\alpha/2} \times \mathrm{ASE} \right)
$$

where Est is the estimate of the measure, $z_{\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution, and ASE is the asymptotic standard error of the estimate.
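
A minimal Python sketch of these limits, using the standard library's `statistics.NormalDist` for the percentile. The helper name `asymptotic_cl` is hypothetical, and the estimate and its ASE are taken as inputs rather than computed; the `alpha` argument plays the role of the ALPHA= option.

```python
from statistics import NormalDist


def asymptotic_cl(est, ase, alpha=0.05):
    """Return (lower, upper) = Est -/+ z_{alpha/2} * ASE."""
    # z_{alpha/2}: the 100(1 - alpha/2)th percentile of the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return est - z * ase, est + z * ase


print(asymptotic_cl(0.5, 0.1))             # 95% limits (alpha = 0.05)
print(asymptotic_cl(0.5, 0.1, alpha=0.1))  # 90% limits
```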

Asymptotic Tests

For each measure that you specify in the TEST statement, PROC FREQ computes an asymptotic test of the null hypothesis that the measure is 0. Asymptotic tests are available for the following measures of association: gamma, Kendall’s tau-b, Stuart’s tau-c, Somers’ $D(C|R)$, Somers’ $D(R|C)$, the Pearson correlation coefficient, and the Spearman rank correlation coefficient. To compute an asymptotic test, PROC FREQ uses a standardized test statistic z, which has an asymptotic standard normal distribution under the null hypothesis. The test statistic is computed as

$$
z = \mathrm{Est} \big/ \sqrt{\mathrm{Var}_0(\mathrm{Est})}
$$

where Est is the estimate of the measure and $\mathrm{Var}_0(\mathrm{Est})$ is the variance of the estimate under the null hypothesis. Formulas for $\mathrm{Var}_0(\mathrm{Est})$ for the individual measures of association are given in the following sections.

Note that the ratio of Est to $\sqrt{\mathrm{Var}_0(\mathrm{Est})}$ is the same for the following measures: gamma, Kendall’s tau-b, Stuart’s tau-c, Somers’ $D(C|R)$, and Somers’ $D(R|C)$. When the null variances given in the following sections are substituted, each ratio reduces to $(P-Q) \big/ \left( 2\sqrt{\sum_i \sum_j n_{ij}(A_{ij}-D_{ij})^2 - (P-Q)^2/n} \right)$. Therefore, the tests for these measures are identical. For example, the p-values for the test of $H_0$: gamma = 0 equal the p-values for the test of $H_0$: tau-b = 0.

PROC FREQ computes one-sided and two-sided p-values for each of these tests. When the test statistic z is greater than its null hypothesis expected value of 0, PROC FREQ displays the right-sided p-value, which is the probability of a larger value of the statistic occurring under the null hypothesis. A small right-sided p-value supports the alternative hypothesis that the true value of the measure is greater than 0. When the test statistic is less than or equal to 0, PROC FREQ displays the left-sided p-value, which is the probability of a smaller value of the statistic occurring under the null hypothesis. A small left-sided p-value supports the alternative hypothesis that the true value of the measure is less than 0. The one-sided p-value $P_1$ can be expressed as

$$
P_1 = \begin{cases} \Pr(Z > z) & \text{if } z > 0 \\ \Pr(Z < z) & \text{if } z \le 0 \end{cases}
$$

where Z has a standard normal distribution. The two-sided p-value $P_2$ is computed as

$$
P_2 = \Pr\left( |Z| > |z| \right)
$$

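
The test statistic and the displayed p-values can be sketched in Python as follows. This illustrates the formulas above rather than reproducing PROC FREQ's code; `asymptotic_test` is a hypothetical name, and Est and $\mathrm{Var}_0(\mathrm{Est})$ are supplied by the caller.

```python
import math
from statistics import NormalDist


def asymptotic_test(est, var0):
    """Return z, the displayed one-sided p-value P1, and the two-sided P2."""
    z = est / math.sqrt(var0)
    std = NormalDist()  # standard normal: mu = 0, sigma = 1
    if z > 0:
        p1 = 1 - std.cdf(z)  # right-sided: Prob(Z > z)
    else:
        p1 = std.cdf(z)  # left-sided: Prob(Z < z)
    p2 = 2 * (1 - std.cdf(abs(z)))  # Prob(|Z| > |z|)
    return z, p1, p2


print(asymptotic_test(0.3, 0.01))
```

Because the standard normal distribution is symmetric, $P_2$ is twice the one-sided p-value that PROC FREQ displays.
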
Exact Tests

Exact tests are available for the following measures of association: Kendall’s tau-b, Stuart’s tau-c, Somers’ $D(C|R)$ and $D(R|C)$, the Pearson correlation coefficient, and the Spearman rank correlation coefficient. If you request an exact test for a measure of association in the EXACT statement, PROC FREQ computes the exact test of the hypothesis that the measure is 0. For more information, see the section Exact Statistics.

Gamma

The gamma ($\Gamma$) statistic is based only on the number of concordant and discordant pairs of observations. It ignores tied pairs (that is, pairs of observations that have equal values of X or equal values of Y). Gamma is appropriate only when both variables lie on an ordinal scale. The range of gamma is $-1 \le \Gamma \le 1$. If the row and column variables are independent, gamma tends to be close to 0. Gamma is computed as

$$
G = (P - Q) / (P + Q)
$$

and the asymptotic variance is

$$
\mathrm{Var}(G) = \frac{16}{(P+Q)^4} \sum_i \sum_j n_{ij} \left( Q A_{ij} - P D_{ij} \right)^2
$$

For $2 \times 2$ tables, gamma is equivalent to Yule’s Q. See Goodman and Kruskal (1979) and Agresti (2002) for more information.

The variance under the null hypothesis that gamma equals 0 is computed as

$$
\mathrm{Var}_0(G) = \frac{4}{(P+Q)^2} \left( \sum_i \sum_j n_{ij} \left( A_{ij} - D_{ij} \right)^2 - (P - Q)^2 / n \right)
$$

For more information, see Brown and Benedetti (1977b).
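
The gamma formulas translate directly into Python. This is a sketch, not PROC FREQ's implementation: the hypothetical helper `concordance_terms` computes $A_{ij}$, $D_{ij}$, P, and Q under the standard PROC FREQ definitions (assumed here, since this section does not restate them), and `gamma_stats` returns the estimate, its ASE, and $\mathrm{Var}_0(G)$.

```python
import math


def concordance_terms(table):
    """Return A, D (cell-level concordant/discordant counts), P, and Q."""
    nrow, ncol = len(table), len(table[0])

    def block(rows, cols):
        # Total count in the sub-table formed by the given rows and columns.
        return sum(table[k][c] for k in rows for c in cols)

    A = [[block(range(i + 1, nrow), range(j + 1, ncol)) + block(range(i), range(j))
          for j in range(ncol)] for i in range(nrow)]
    D = [[block(range(i + 1, nrow), range(j)) + block(range(i), range(j + 1, ncol))
          for j in range(ncol)] for i in range(nrow)]
    P = sum(table[i][j] * A[i][j] for i in range(nrow) for j in range(ncol))
    Q = sum(table[i][j] * D[i][j] for i in range(nrow) for j in range(ncol))
    return A, D, P, Q


def gamma_stats(table):
    """Return G, its ASE, and Var_0(G)."""
    A, D, P, Q = concordance_terms(table)
    cells = [(i, j) for i in range(len(table)) for j in range(len(table[0]))]
    n = sum(table[i][j] for i, j in cells)
    G = (P - Q) / (P + Q)
    var = 16 / (P + Q) ** 4 * sum(
        table[i][j] * (Q * A[i][j] - P * D[i][j]) ** 2 for i, j in cells)
    var0 = 4 / (P + Q) ** 2 * (
        sum(table[i][j] * (A[i][j] - D[i][j]) ** 2 for i, j in cells)
        - (P - Q) ** 2 / n)
    return G, math.sqrt(var), var0


G, ase, var0 = gamma_stats([[10, 2], [3, 5]])
print(round(G, 4), round(ase, 4), round(G / math.sqrt(var0), 4))
```

For this $2 \times 2$ table, G = 44/56, which is Yule's Q = (10·5 − 2·3)/(10·5 + 2·3), and Var(G) agrees with the familiar Yule's Q variance $\tfrac{1}{4}(1-Q^2)^2(1/10 + 1/2 + 1/3 + 1/5)$. The sketch divides by P + Q, so it assumes the table has at least one untied pair.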

Kendall’s Tau-b

Kendall’s tau-b ($\tau_b$) is similar to gamma except that tau-b uses a correction for ties. Tau-b is appropriate only when both variables lie on an ordinal scale. The range of tau-b is $-1 \le \tau_b \le 1$. Kendall’s tau-b is computed as

$$
t_b = (P - Q) \big/ \sqrt{w_r w_c}
$$

and the asymptotic variance is

$$
\mathrm{Var}(t_b) = \frac{1}{w^4} \left( \sum_i \sum_j n_{ij} \left( 2 w d_{ij} + t_b v_{ij} \right)^2 - n^3 t_b^2 (w_r + w_c)^2 \right)
$$

where

$$
\begin{aligned}
w &= \sqrt{w_r w_c} \\
w_r &= n^2 - \sum_i n_{i\cdot}^2 \\
w_c &= n^2 - \sum_j n_{\cdot j}^2 \\
d_{ij} &= A_{ij} - D_{ij} \\
v_{ij} &= n_{i\cdot} w_c + n_{\cdot j} w_r
\end{aligned}
$$

See Kendall (1955) for more information.

The variance under the null hypothesis that tau-b equals 0 is computed as

$$
\mathrm{Var}_0(t_b) = \frac{4}{w_r w_c} \left( \sum_i \sum_j n_{ij} \left( A_{ij} - D_{ij} \right)^2 - (P - Q)^2 / n \right)
$$

For more information, see Brown and Benedetti (1977b).
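
A Python sketch of tau-b, its ASE, and $\mathrm{Var}_0(t_b)$, following the formulas above term by term. The helper `concordance_terms` and the function `tau_b_stats` are hypothetical names, and $A_{ij}$ and $D_{ij}$ are assumed to follow the standard PROC FREQ definitions (cells below-right or above-left of (i, j) are concordant; below-left or above-right are discordant).

```python
import math


def concordance_terms(table):
    """Return A, D (cell-level concordant/discordant counts), P, and Q."""
    nrow, ncol = len(table), len(table[0])

    def block(rows, cols):
        return sum(table[k][c] for k in rows for c in cols)

    A = [[block(range(i + 1, nrow), range(j + 1, ncol)) + block(range(i), range(j))
          for j in range(ncol)] for i in range(nrow)]
    D = [[block(range(i + 1, nrow), range(j)) + block(range(i), range(j + 1, ncol))
          for j in range(ncol)] for i in range(nrow)]
    P = sum(table[i][j] * A[i][j] for i in range(nrow) for j in range(ncol))
    Q = sum(table[i][j] * D[i][j] for i in range(nrow) for j in range(ncol))
    return A, D, P, Q


def tau_b_stats(table):
    """Return t_b, its ASE, and Var_0(t_b)."""
    A, D, P, Q = concordance_terms(table)
    nrow, ncol = len(table), len(table[0])
    cells = [(i, j) for i in range(nrow) for j in range(ncol)]
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(nrow)) for j in range(ncol)]
    n = sum(row_tot)
    w_r = n ** 2 - sum(t ** 2 for t in row_tot)
    w_c = n ** 2 - sum(t ** 2 for t in col_tot)
    w = math.sqrt(w_r * w_c)
    t_b = (P - Q) / w
    d = [[A[i][j] - D[i][j] for j in range(ncol)] for i in range(nrow)]
    v = [[row_tot[i] * w_c + col_tot[j] * w_r for j in range(ncol)] for i in range(nrow)]
    var = (sum(table[i][j] * (2 * w * d[i][j] + t_b * v[i][j]) ** 2 for i, j in cells)
           - n ** 3 * t_b ** 2 * (w_r + w_c) ** 2) / w ** 4
    var0 = 4 / (w_r * w_c) * (
        sum(table[i][j] * d[i][j] ** 2 for i, j in cells) - (P - Q) ** 2 / n)
    return t_b, math.sqrt(max(var, 0.0)), var0  # max() guards against rounding


t_b, ase, var0 = tau_b_stats([[10, 2], [3, 5]])
print(round(t_b, 4), round(ase, 4), round(t_b / math.sqrt(var0), 4))
```

For the $2 \times 2$ table [[10, 2], [3, 5]], $t_b$ equals the phi coefficient $44/\sqrt{12 \cdot 8 \cdot 13 \cdot 7} \approx 0.4708$, and $t_b/\sqrt{\mathrm{Var}_0(t_b)}$ equals the corresponding ratio for gamma, as stated in the section Asymptotic Tests.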

PROC FREQ also provides an exact test for Kendall’s tau-b. You can request this test by specifying the KENTB option in the EXACT statement. See the section Exact Statistics for more information.

Stuart’s Tau-c

Stuart’s tau-c ($\tau_c$) makes an adjustment for table size in addition to a correction for ties. Tau-c is appropriate only when both variables lie on an ordinal scale. The range of tau-c is $-1 \le \tau_c \le 1$. Stuart’s tau-c is computed as

$$
t_c = m (P - Q) \big/ \left( n^2 (m - 1) \right)
$$

and the asymptotic variance is

$$
\mathrm{Var}(t_c) = \frac{4 m^2}{(m-1)^2 n^4} \left( \sum_i \sum_j n_{ij} d_{ij}^2 - (P - Q)^2 / n \right)
$$

where $m = \min(R, C)$, with R and C the numbers of rows and columns in the table, and $d_{ij} = A_{ij} - D_{ij}$. The variance under the null hypothesis that tau-c equals 0 is the same as the asymptotic variance:

$$
\mathrm{Var}_0(t_c) = \mathrm{Var}(t_c)
$$

For more information, see Brown and Benedetti (1977b).
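
A Python sketch of tau-c and its ASE. The names `concordance_terms` and `tau_c_stats` are hypothetical, the concordance terms assume the standard PROC FREQ definitions of $A_{ij}$ and $D_{ij}$, and $m$ is taken from the table's own dimensions.

```python
import math


def concordance_terms(table):
    """Return A, D (cell-level concordant/discordant counts), P, and Q."""
    nrow, ncol = len(table), len(table[0])

    def block(rows, cols):
        return sum(table[k][c] for k in rows for c in cols)

    A = [[block(range(i + 1, nrow), range(j + 1, ncol)) + block(range(i), range(j))
          for j in range(ncol)] for i in range(nrow)]
    D = [[block(range(i + 1, nrow), range(j)) + block(range(i), range(j + 1, ncol))
          for j in range(ncol)] for i in range(nrow)]
    P = sum(table[i][j] * A[i][j] for i in range(nrow) for j in range(ncol))
    Q = sum(table[i][j] * D[i][j] for i in range(nrow) for j in range(ncol))
    return A, D, P, Q


def tau_c_stats(table):
    """Return t_c and its ASE (Var_0(t_c) equals Var(t_c))."""
    A, D, P, Q = concordance_terms(table)
    nrow, ncol = len(table), len(table[0])
    n = sum(map(sum, table))
    m = min(nrow, ncol)  # m = min(R, C)
    t_c = m * (P - Q) / (n ** 2 * (m - 1))
    s = sum(table[i][j] * (A[i][j] - D[i][j]) ** 2
            for i in range(nrow) for j in range(ncol))
    var = 4 * m ** 2 / ((m - 1) ** 2 * n ** 4) * (s - (P - Q) ** 2 / n)
    return t_c, math.sqrt(var)


t_c, ase = tau_c_stats([[10, 2], [3, 5]])
print(t_c, round(ase, 4), round(t_c / ase, 4))
```

For this $2 \times 2$ table, $t_c = 2 \cdot 88 / 400 = 0.44$, and $t_c / \mathrm{ASE}$ matches the standardized statistics for gamma and tau-b, consistent with the section Asymptotic Tests.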

PROC FREQ also provides an exact test for Stuart’s tau-c. You can request this test by specifying the STUTC option in the EXACT statement. See the section Exact Statistics for more information.

Somers’ D

Somers’ $D(C|R)$ and Somers’ $D(R|C)$ are asymmetric modifications of tau-b. $C|R$ indicates that the row variable X is regarded as the independent variable and the column variable Y is regarded as dependent. Similarly, $R|C$ indicates that the column variable Y is regarded as the independent variable and the row variable X is regarded as dependent. Somers’ D differs from tau-b in that it uses a correction only for pairs that are tied on the independent variable. Somers’ D is appropriate only when both variables lie on an ordinal scale. The range of Somers’ D is $-1 \le D \le 1$. Somers’ $D(C|R)$ is computed as

$$
D(C|R) = (P - Q) / w_r
$$

and its asymptotic variance is

$$
\mathrm{Var}\left( D(C|R) \right) = \frac{4}{w_r^4} \sum_i \sum_j n_{ij} \left( w_r d_{ij} - (P - Q)(n - n_{i\cdot}) \right)^2
$$

where $d_{ij} = A_{ij} - D_{ij}$ and

$$
w_r = n^2 - \sum_i n_{i\cdot}^2
$$

For more information, see Somers (1962), Goodman and Kruskal (1979), and Liebetrau (1983).

The variance under the null hypothesis that $D(C|R)$ equals 0 is computed as

$$
\mathrm{Var}_0\left( D(C|R) \right) = \frac{4}{w_r^2} \left( \sum_i \sum_j n_{ij} \left( A_{ij} - D_{ij} \right)^2 - (P - Q)^2 / n \right)
$$

For more information, see Brown and Benedetti (1977b).

Formulas for Somers’ $D(R|C)$ are obtained by interchanging the row and column indices; for example, $w_c = n^2 - \sum_j n_{\cdot j}^2$ replaces $w_r$.
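
A Python sketch of Somers' D. Transposing the table is one way to interchange the indices: it leaves P and Q unchanged and turns $w_r$ into $w_c$. The names `concordance_terms`, `somers_d_cr`, and `somers_d_rc` are hypothetical, and $A_{ij}$ and $D_{ij}$ are assumed to follow the standard PROC FREQ definitions.

```python
import math


def concordance_terms(table):
    """Return A, D (cell-level concordant/discordant counts), P, and Q."""
    nrow, ncol = len(table), len(table[0])

    def block(rows, cols):
        return sum(table[k][c] for k in rows for c in cols)

    A = [[block(range(i + 1, nrow), range(j + 1, ncol)) + block(range(i), range(j))
          for j in range(ncol)] for i in range(nrow)]
    D = [[block(range(i + 1, nrow), range(j)) + block(range(i), range(j + 1, ncol))
          for j in range(ncol)] for i in range(nrow)]
    P = sum(table[i][j] * A[i][j] for i in range(nrow) for j in range(ncol))
    Q = sum(table[i][j] * D[i][j] for i in range(nrow) for j in range(ncol))
    return A, D, P, Q


def somers_d_cr(table):
    """Return D(C|R), its ASE, and Var_0(D(C|R))."""
    A, D, P, Q = concordance_terms(table)
    nrow, ncol = len(table), len(table[0])
    cells = [(i, j) for i in range(nrow) for j in range(ncol)]
    row_tot = [sum(row) for row in table]
    n = sum(row_tot)
    w_r = n ** 2 - sum(t ** 2 for t in row_tot)
    est = (P - Q) / w_r
    var = 4 / w_r ** 4 * sum(
        table[i][j] * (w_r * (A[i][j] - D[i][j]) - (P - Q) * (n - row_tot[i])) ** 2
        for i, j in cells)
    var0 = 4 / w_r ** 2 * (
        sum(table[i][j] * (A[i][j] - D[i][j]) ** 2 for i, j in cells)
        - (P - Q) ** 2 / n)
    return est, math.sqrt(var), var0


def somers_d_rc(table):
    """D(R|C): transpose the table so that the columns play the role of rows."""
    return somers_d_cr([list(col) for col in zip(*table)])


print(somers_d_cr([[10, 2], [3, 5]])[0], somers_d_rc([[10, 2], [3, 5]])[0])
```

For a $2 \times 2$ table, $D(C|R)$ reduces to the difference between the two rows' proportions in the first column (10/12 − 3/8 here), and $D(R|C)$ to the analogous difference by columns (10/13 − 2/7).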

PROC FREQ also provides exact tests for Somers’ $D(C|R)$ and $D(R|C)$. You can request these tests by specifying the SMDCR and SMDRC options, respectively, in the EXACT statement. See the section Exact Statistics for more information.

Pearson Correlation Coefficient

The Pearson correlation coefficient ($\rho$) is computed by using the scores specified in the SCORES= option. This measure is appropriate only when both variables lie on an ordinal scale. The range of the Pearson correlation is $-1 \le \rho \le 1$. The Pearson correlation coefficient is computed as

$$
r = v / w = s_{rc} \big/ \sqrt{s_r s_c}
$$

and its asymptotic variance is

$$
\mathrm{Var}(r) = \frac{1}{w^4} \sum_i \sum_j n_{ij} \left( w (R_i - \bar{R})(C_j - \bar{C}) - \frac{b_{ij} v}{2w} \right)^2
$$

where $R_i$ and $C_j$ are the row and column scores and

$$
\begin{aligned}
s_r &= \sum_i \sum_j n_{ij} (R_i - \bar{R})^2 \\
s_c &= \sum_i \sum_j n_{ij} (C_j - \bar{C})^2 \\
s_{rc} &= \sum_i \sum_j n_{ij} (R_i - \bar{R})(C_j - \bar{C}) \\
b_{ij} &= (R_i - \bar{R})^2 s_c + (C_j - \bar{C})^2 s_r \\
v &= s_{rc} \\
w &= \sqrt{s_r s_c}
\end{aligned}
$$

For more information, see Snedecor and Cochran (1989).

The SCORES= option in the TABLES statement determines the type of row and column scores used to compute the Pearson correlation (and other score-based statistics). The default is SCORES=TABLE. See the section Scores for details about the available score types and how they are computed.

The variance under the null hypothesis that the correlation equals 0 is computed as

$$
\mathrm{Var}_0(r) = \left( \sum_i \sum_j n_{ij} (R_i - \bar{R})^2 (C_j - \bar{C})^2 - s_{rc}^2 / n \right) \Big/ \left( s_r s_c \right)
$$

This expression for the variance is derived for multinomial sampling in a contingency table framework, and it differs from the form obtained under the assumption that both variables are continuous and normally distributed. For more information, see Brown and Benedetti (1977b).
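
The Pearson formulas can be sketched in Python with the row and column scores passed in explicitly (for example, integer scores 1, 2, ...). The sketch does not implement the SCORES= options themselves, and `pearson_stats` is a hypothetical name.

```python
import math


def pearson_stats(table, row_scores, col_scores):
    """Return r, its ASE, and Var_0(r) for the given row and column scores."""
    nrow, ncol = len(table), len(table[0])
    cells = [(i, j) for i in range(nrow) for j in range(ncol)]
    n = sum(table[i][j] for i, j in cells)
    rbar = sum(table[i][j] * row_scores[i] for i, j in cells) / n
    cbar = sum(table[i][j] * col_scores[j] for i, j in cells) / n
    dr = [s - rbar for s in row_scores]  # R_i - Rbar
    dc = [s - cbar for s in col_scores]  # C_j - Cbar
    s_r = sum(table[i][j] * dr[i] ** 2 for i, j in cells)
    s_c = sum(table[i][j] * dc[j] ** 2 for i, j in cells)
    s_rc = sum(table[i][j] * dr[i] * dc[j] for i, j in cells)
    v, w = s_rc, math.sqrt(s_r * s_c)
    r = v / w

    def b(i, j):
        return dr[i] ** 2 * s_c + dc[j] ** 2 * s_r

    var = sum(table[i][j] * (w * dr[i] * dc[j] - b(i, j) * v / (2 * w)) ** 2
              for i, j in cells) / w ** 4
    var0 = (sum(table[i][j] * dr[i] ** 2 * dc[j] ** 2 for i, j in cells)
            - s_rc ** 2 / n) / (s_r * s_c)
    return r, math.sqrt(var), var0


print(pearson_stats([[10, 2], [3, 5]], [1, 2], [1, 2]))
```

With scores 1 and 2 on a $2 \times 2$ table, r is the phi coefficient. For a perfectly diagonal table, both r = 1 and Var(r) = 0, since every term inside the sum vanishes.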

PROC FREQ also provides an exact test for the Pearson correlation coefficient. You can request this test by specifying the PCORR option in the EXACT statement. See the section Exact Statistics for more information.

Spearman Rank Correlation Coefficient

The Spearman correlation coefficient ($\rho_s$) is computed by using rank scores, which are defined in the section Scores. This measure is appropriate only when both variables lie on an ordinal scale. The range of the Spearman correlation is $-1 \le \rho_s \le 1$. The Spearman correlation coefficient is computed as

$$
r_s = v / w
$$

and its asymptotic variance is

$$
\mathrm{Var}(r_s) = \frac{1}{n^2 w^4} \sum_i \sum_j n_{ij} \left( z_{ij} - \bar{z} \right)^2
$$

where $R_i$ and $C_j$ are the row and column rank scores and

$$
\begin{aligned}
v &= \sum_i \sum_j n_{ij}\, R(i)\, C(j) \\
w &= \tfrac{1}{12} \sqrt{F G} \\
F &= n^3 - \sum_i n_{i\cdot}^3 \\
G &= n^3 - \sum_j n_{\cdot j}^3 \\
R(i) &= R_i - n/2 \\
C(j) &= C_j - n/2 \\
\bar{z} &= \frac{1}{n} \sum_i \sum_j n_{ij} z_{ij} \\
z_{ij} &= w v_{ij} - v w_{ij}
\end{aligned}
$$

$$
\begin{aligned}
v_{ij} &= n \Big( R(i)\, C(j) + \tfrac{1}{2} \sum_l n_{il}\, C(l) + \tfrac{1}{2} \sum_k n_{kj}\, R(k) \\
&\qquad + \sum_l \sum_{k>i} n_{kl}\, C(l) + \sum_k \sum_{l>j} n_{kl}\, R(k) \Big) \\
w_{ij} &= \frac{-n}{96\, w} \left( F n_{\cdot j}^2 + G n_{i\cdot}^2 \right)
\end{aligned}
$$

For more information, see Snedecor and Cochran (1989).

The variance under the null hypothesis that the correlation equals 0 is computed as

$$
\mathrm{Var}_0(r_s) = \frac{1}{n^2 w^2} \sum_i \sum_j n_{ij} \left( v_{ij} - \bar{v} \right)^2
$$

where

$$
\bar{v} = \sum_i \sum_j n_{ij} v_{ij} / n
$$

This expression for the variance is derived for multinomial sampling in a contingency table framework, and it differs from the form obtained under the assumption that both variables are continuous and normally distributed. For more information, see Brown and Benedetti (1977b).
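
A Python sketch of the point estimate $r_s = v/w$ with $w = \tfrac{1}{12}\sqrt{FG}$. It takes midranks as the rank scores and centers them at their mean $(n+1)/2$, so that v is a centered cross-product; that centering convention is an assumption of the sketch, and the asymptotic variances are not reproduced. For centered midranks, $\sum_i n_{i\cdot}(R_i - \bar{R})^2 = F/12$, which is why this w equals $\sqrt{s_r s_c}$ in the Pearson notation. The names `midranks` and `spearman` are hypothetical.

```python
import math


def midranks(totals):
    """Midrank score of each category, given the category totals in order."""
    scores, below = [], 0
    for t in totals:
        scores.append(below + (t + 1) / 2)  # average of ranks below+1 .. below+t
        below += t
    return scores


def spearman(table):
    """Return r_s = v / w for a table of counts (list of rows)."""
    nrow, ncol = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(nrow)) for j in range(ncol)]
    n = sum(row_tot)
    # Assumption: rank scores are midranks, centered at their mean (n + 1) / 2.
    R = [s - (n + 1) / 2 for s in midranks(row_tot)]
    C = [s - (n + 1) / 2 for s in midranks(col_tot)]
    v = sum(table[i][j] * R[i] * C[j] for i in range(nrow) for j in range(ncol))
    F = n ** 3 - sum(t ** 3 for t in row_tot)
    G = n ** 3 - sum(t ** 3 for t in col_tot)
    w = math.sqrt(F * G) / 12  # equals sqrt(s_r * s_c) for centered midranks
    return v / w


print(round(spearman([[10, 2], [3, 5]]), 4))
```

For a $2 \times 2$ table, the ranks are linear in the category codes, so $r_s$ equals the phi coefficient; a table with all counts on the diagonal gives $r_s = 1$.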

PROC FREQ also provides an exact test for the Spearman correlation coefficient. You can request this test by specifying the SCORR option in the EXACT statement. For more information, see the section Exact Statistics.

Polychoric Correlation

When you specify the PLCORR option in the TABLES statement, PROC FREQ computes the polychoric correlation and its standard error. The polychoric correlation is based on the assumption that the two ordinal, categorical variables of the frequency table have an underlying bivariate normal distribution. The polychoric correlation coefficient is the maximum likelihood estimate of the product-moment correlation between the underlying normal variables. The range of the polychoric correlation is from –1 to 1. For $2 \times 2$ tables, the polychoric correlation is also known as the tetrachoric correlation (and it is labeled as such in the displayed output). See Drasgow (1986) for an overview of the polychoric correlation coefficient.

Olsson (1979) gives the likelihood equations and the asymptotic standard errors for estimating the polychoric correlation. The underlying continuous variables relate to the observed crosstabulation table through thresholds, which define a range of numeric values that correspond to each categorical (table) level. PROC FREQ uses Olsson’s maximum likelihood method for simultaneous estimation of the polychoric correlation and the thresholds. (Olsson also presents a two-step method that estimates the thresholds first.)

PROC FREQ iteratively solves the likelihood equations by using a Newton-Raphson algorithm. The initial estimates of the thresholds are computed from the inverse of the normal distribution function at the cumulative marginal proportions of the table. Iterative computation of the polychoric correlation stops when the convergence measure falls below the convergence criterion or when the maximum number of iterations is reached, whichever occurs first. For parameter values that are less than 0.01, the procedure evaluates convergence by using the absolute difference instead of the relative difference. The PLCORR(CONVERGE=) option specifies the convergence criterion, which is 0.0001 by default. The PLCORR(MAXITER=) option specifies the maximum number of iterations, which is 20 by default.
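
The starting values described above can be sketched as follows: each initial threshold is the standard normal quantile of a cumulative marginal proportion. The helper name `initial_thresholds` is hypothetical, the Newton-Raphson iterations themselves are not shown, and an empty first category (cumulative proportion 0) would need special handling because its quantile is infinite.

```python
from itertools import accumulate
from statistics import NormalDist


def initial_thresholds(margin_totals):
    """Normal quantiles of the cumulative marginal proportions.

    For k categories this returns k - 1 thresholds; the final cumulative
    proportion (1) is dropped because its quantile is infinite.
    """
    n = sum(margin_totals)
    cumulative = list(accumulate(margin_totals))[:-1]
    return [NormalDist().inv_cdf(c / n) for c in cumulative]


print(initial_thresholds([12, 8]))   # row thresholds for the table [[10, 2], [3, 5]]
print(initial_thresholds([13, 7]))   # column thresholds for the same table
```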

If you specify the CL option in the TABLES statement, PROC FREQ provides confidence limits for the polychoric correlation. The confidence limits are computed as

$$
\hat{\rho} \pm \left( z_{\alpha/2} \times \mathrm{SE}(\hat{\rho}) \right)
$$

where $\hat{\rho}$ is the estimate of the polychoric correlation, $z_{\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution, and $\mathrm{SE}(\hat{\rho})$ is the standard error of the polychoric correlation estimate.

If you specify the PLCORR option in the TEST statement, PROC FREQ provides Wald and likelihood ratio tests of the null hypothesis that the polychoric correlation is 0. The Wald test statistic is computed as

$$
z = \hat{\rho} / \mathrm{SE}(\hat{\rho})
$$

which has a standard normal distribution under the null hypothesis. PROC FREQ computes one-sided and two-sided p-values for the Wald test. When the test statistic z is greater than its null expected value of 0, PROC FREQ displays the right-sided p-value. When the test statistic is less than or equal to 0, PROC FREQ displays the left-sided p-value.

The likelihood ratio statistic for the polychoric correlation is computed as

$$
G^2 = -2 \log\left( L_0 / L_1 \right)
$$

where $L_0$ is the value of the likelihood function (Olsson 1979) when the polychoric correlation is 0, and $L_1$ is the value of the likelihood function at the maximum (where all parameters are replaced by their maximum likelihood estimates). Under the null hypothesis, the likelihood ratio statistic has an asymptotic chi-square distribution with 1 degree of freedom.
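
Given the estimate, its standard error, and the two maximized log-likelihoods (all treated as inputs here, since fitting the model is not shown), the confidence limits and both tests take only a few lines. The sketch uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability is $\operatorname{erfc}\big(\sqrt{G^2/2}\big)$. The function and argument names are hypothetical.

```python
import math
from statistics import NormalDist


def polychoric_inference(rho_hat, se, loglik0, loglik1, alpha=0.05):
    """Confidence limits, Wald test, and likelihood ratio test.

    loglik0: log-likelihood with the polychoric correlation fixed at 0 (log L0).
    loglik1: log-likelihood at the maximum likelihood estimates (log L1).
    """
    std = NormalDist()
    zcrit = std.inv_cdf(1 - alpha / 2)
    cl = (rho_hat - zcrit * se, rho_hat + zcrit * se)
    z = rho_hat / se                              # Wald statistic
    p1 = 1 - std.cdf(z) if z > 0 else std.cdf(z)  # displayed one-sided p-value
    p2 = 2 * (1 - std.cdf(abs(z)))                # two-sided p-value
    g2 = max(-2 * (loglik0 - loglik1), 0.0)       # G^2 = -2 log(L0 / L1)
    p_lr = math.erfc(math.sqrt(g2 / 2))           # Prob(chi-square(1) > G^2)
    return cl, z, p1, p2, g2, p_lr


print(polychoric_inference(0.4, 0.1, -100.0, -96.0))
```

The `max(..., 0.0)` guard only protects against a tiny negative $G^2$ caused by rounding when $L_0$ and $L_1$ are nearly equal.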

Lambda (Asymmetric)

Asymmetric lambda, $\lambda(C|R)$, is interpreted as the probable improvement in predicting the column variable Y given knowledge of the row variable X. The range of asymmetric lambda is $0 \le \lambda(C|R) \le 1$. Asymmetric lambda $(C|R)$ is computed as

$$
\lambda(C|R) = \frac{\sum_i r_i - r}{n - r}
$$

and its asymptotic variance is

$$
\mathrm{Var}\left( \lambda(C|R) \right) = \frac{n - \sum_i r_i}{(n - r)^3} \left( \sum_i r_i + r - 2 \sum_i \left( r_i \mid l_i = l \right) \right)
$$

where

$$
\begin{aligned}
r_i &= \max_j (n_{ij}) \\
r &= \max_j (n_{\cdot j}) \\
c_j &= \max_i (n_{ij}) \\
c &= \max_i (n_{i\cdot})
\end{aligned}
$$

The values of $l_i$ and $l$ are determined as follows. Denote by $l_i$ the unique value of j such that $r_i = n_{ij}$, and let $l$ be the unique value of j such that $r = n_{\cdot j}$. Because of the uniqueness assumptions, ties in the frequencies or in the marginal totals must be broken in an arbitrary but consistent manner. In case of ties, $l$ is defined as the smallest value of j such that $r = n_{\cdot j}$.

For those columns containing a cell (i, j) for which $n_{ij} = r_i = c_j$, $cs_j$ records the row in which $c_j$ is assumed to occur. Initially $cs_j$ is set equal to –1 for all j. Beginning with i = 1, if there is at least one value j such that $n_{ij} = r_i = c_j$, and if $cs_j = -1$, $l_i$ is defined to be the smallest such value of j, and $cs_j$ is set equal to i. Otherwise, if $n_{il} = r_i$, $l_i$ is defined to be equal to $l$. If neither condition is true, $l_i$ is taken to be the smallest value of j such that $n_{ij} = r_i$.

The formulas for asymmetric lambda $\lambda(R|C)$ can be obtained by interchanging the indices.

For more information, see Goodman and Kruskal (1979).
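
A Python sketch of $\lambda(C|R)$ and its asymptotic variance, including the tie-breaking rule for $l_i$. It reads "the smallest such value of j" as the smallest j that satisfies both $n_{ij} = r_i = c_j$ and $cs_j = -1$; that reading is an interpretation. Indices are 0-based, `lambda_cr` and `lambda_rc` are hypothetical names, and $\lambda(R|C)$ is obtained by transposing the table, which interchanges the indices. The sketch assumes $n > r$, that is, that the observations are not all in one column.

```python
def lambda_cr(table):
    """Return asymmetric lambda(C|R) and its asymptotic variance (0-based indices)."""
    nrow, ncol = len(table), len(table[0])
    n = sum(map(sum, table))
    col_tot = [sum(table[i][j] for i in range(nrow)) for j in range(ncol)]
    r_i = [max(row) for row in table]                    # r_i = max_j n_ij
    r = max(col_tot)                                     # r = max_j n_.j
    c_j = [max(table[i][j] for i in range(nrow)) for j in range(ncol)]  # c_j
    l_col = col_tot.index(r)                             # l: smallest j with n_.j = r
    cs = [-1] * ncol                                     # cs_j, initially -1
    l_i = []
    for i in range(nrow):
        # Interpretation: smallest j with n_ij = r_i = c_j whose cs_j is still -1.
        cand = [j for j in range(ncol)
                if table[i][j] == r_i[i] == c_j[j] and cs[j] == -1]
        if cand:
            l_i.append(cand[0])
            cs[cand[0]] = i
        elif table[i][l_col] == r_i[i]:
            l_i.append(l_col)
        else:
            l_i.append(table[i].index(r_i[i]))           # smallest j with n_ij = r_i
    sum_r = sum(r_i)
    est = (sum_r - r) / (n - r)
    sum_r_at_l = sum(r_i[i] for i in range(nrow) if l_i[i] == l_col)
    var = (n - sum_r) / (n - r) ** 3 * (sum_r + r - 2 * sum_r_at_l)
    return est, var


def lambda_rc(table):
    """lambda(R|C): transpose the table, which interchanges the indices."""
    return lambda_cr([list(col) for col in zip(*table)])


print(lambda_cr([[10, 2], [3, 5]]), lambda_rc([[10, 2], [3, 5]]))
```

For the table [[10, 2], [3, 5]], predicting the column without X errs on 20 − 13 = 7 observations and with X on 20 − 15 = 5, so $\lambda(C|R) = 2/7$; the same reasoning by columns gives $\lambda(R|C) = 3/8$.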

Lambda (Symmetric)

The nondirectional lambda combines the two asymmetric lambdas, $\lambda(C \mid R)$ and $\lambda(R \mid C)$; it is their weighted average, with weights $n - r$ and $n - c$. Its range is $0 \le \lambda \le 1$. Lambda symmetric is computed as

$$
\lambda = \frac{\sum_i r_i + \sum_j c_j - r - c}{2n - r - c} = \frac{w - v}{w}
$$

and its asymptotic variance is computed as

$$
\mathrm{Var}(\lambda) = \frac{1}{w^4} \left( wvy - 2w^2 \Bigl( n - \sum_i \sum_j \bigl( n_{ij} \mid j = l_i,\ i = k_j \bigr) \Bigr) - 2v^2 \bigl( n - n_{kl} \bigr) \right)
$$

where

$$
\begin{aligned}
r_i &= \max_j (n_{ij}) \\
r &= \max_j (n_{\cdot j}) \\
c_j &= \max_i (n_{ij}) \\
c &= \max_i (n_{i\cdot}) \\
w &= 2n - r - c \\
v &= 2n - \sum_i r_i - \sum_j c_j \\
x &= \sum_i (r_i \mid l_i = l) + \sum_j (c_j \mid k_j = k) + r_k + c_l \\
y &= 8n - w - v - 2x
\end{aligned}
$$

The definitions of $l_i$ and $l$ are given in the previous section. The values $k_j$ and $k$ are defined in the same way for lambda asymmetric $\lambda(R \mid C)$, with the roles of rows and columns interchanged.

For more information, see Goodman and Kruskal (1979).

Uncertainty Coefficients (Asymmetric)

The uncertainty coefficient $U(C \mid R)$ measures the proportion of uncertainty (entropy) in the column variable Y that is explained by the row variable X. Its range is $0 \le U(C \mid R) \le 1$. The uncertainty coefficient is computed as

$$
U(C \mid R) = \frac{H(X) + H(Y) - H(XY)}{H(Y)} = \frac{v}{w}
$$

and its asymptotic variance is

$$
\mathrm{Var}\bigl(U(C \mid R)\bigr) = \frac{1}{n^2 w^4} \sum_i \sum_j n_{ij} \left( H(Y) \log\left(\frac{n_{ij}}{n_{i\cdot}}\right) + \bigl(H(X) - H(XY)\bigr) \log\left(\frac{n_{\cdot j}}{n}\right) \right)^2
$$

where

$$
\begin{aligned}
v &= H(X) + H(Y) - H(XY) \\
w &= H(Y) \\
H(X) &= -\sum_i \left(\frac{n_{i\cdot}}{n}\right) \log\left(\frac{n_{i\cdot}}{n}\right) \\
H(Y) &= -\sum_j \left(\frac{n_{\cdot j}}{n}\right) \log\left(\frac{n_{\cdot j}}{n}\right) \\
H(XY) &= -\sum_i \sum_j \left(\frac{n_{ij}}{n}\right) \log\left(\frac{n_{ij}}{n}\right)
\end{aligned}
$$

The formulas for the uncertainty coefficient $U(R \mid C)$ can be obtained by interchanging the indices.
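The following Python sketch computes $U(C \mid R)$ and its ASE from the formulas above, and obtains $U(R \mid C)$ by transposing the table. It uses natural logarithms (neither the coefficient nor its ASE depends on the base, because every entropy and every logarithm scales by the same factor), treats $0 \log 0$ as 0, and skips empty cells in the variance sum, where each term tends to 0. It assumes at least two nonempty columns so that $w = H(Y) > 0$; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def _entropy(counts: List[int], n: int) -> float:
    """-sum (m/n) log(m/n), with 0 log 0 taken as 0 (natural log)."""
    return -sum((m / n) * math.log(m / n) for m in counts if m > 0)

def uncertainty_c_given_r(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (U(C|R), ASE) from the formulas above."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]              # n_i.
    col_totals = [sum(col) for col in zip(*table)]        # n_.j
    hx = _entropy(row_totals, n)
    hy = _entropy(col_totals, n)
    hxy = _entropy([m for row in table for m in row], n)
    v, w = hx + hy - hxy, hy
    total = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij == 0:
                continue  # n_ij * (...)^2 tends to 0 as n_ij -> 0
            term = (hy * math.log(n_ij / row_totals[i])
                    + (hx - hxy) * math.log(col_totals[j] / n))
            total += n_ij * term ** 2
    var = total / (n ** 2 * w ** 4)
    return v / w, math.sqrt(var)

def uncertainty_r_given_c(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """U(R|C): interchange the indices by transposing the table."""
    return uncertainty_c_given_r([list(col) for col in zip(*table)])

print(uncertainty_c_given_r([[2, 0], [0, 2]]))  # perfect association: U = 1, ASE = 0
```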

For more information, see Theil (1972, pp. 115–120) and Goodman and Kruskal (1979).

Uncertainty Coefficient (Symmetric)

The uncertainty coefficient $U$ is the symmetric version of the two asymmetric uncertainty coefficients. Its range is $0 \le U \le 1$. The uncertainty coefficient is computed as

$$
U = \frac{2\bigl(H(X) + H(Y) - H(XY)\bigr)}{H(X) + H(Y)}
$$

and its asymptotic variance is

$$
\mathrm{Var}(U) = 4 \sum_i \sum_j \frac{n_{ij} \left( H(XY) \log\left(\frac{n_{i\cdot}\, n_{\cdot j}}{n^2}\right) - \bigl(H(X) + H(Y)\bigr) \log\left(\frac{n_{ij}}{n}\right) \right)^2}{n^2 \bigl(H(X) + H(Y)\bigr)^4}
$$

where $H(X)$, $H(Y)$, and $H(XY)$ are defined in the previous section. For more information, see Goodman and Kruskal (1979).
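A minimal Python sketch of the symmetric coefficient and its ASE, transcribing the two formulas above. As in the asymmetric case, it assumes natural logarithms, treats $0 \log 0$ as 0, skips empty cells in the variance sum, and requires $H(X) + H(Y) > 0$; the function names are hypothetical and this is not PROC FREQ's implementation.

```python
import math
from typing import List, Sequence, Tuple

def _entropy(counts: List[int], n: int) -> float:
    """-sum (m/n) log(m/n), with 0 log 0 taken as 0 (natural log)."""
    return -sum((m / n) * math.log(m / n) for m in counts if m > 0)

def uncertainty_symmetric(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (U, ASE) for the symmetric uncertainty coefficient."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]              # n_i.
    col_totals = [sum(col) for col in zip(*table)]        # n_.j
    hx = _entropy(row_totals, n)
    hy = _entropy(col_totals, n)
    hxy = _entropy([m for row in table for m in row], n)
    total = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij == 0:
                continue  # empty cells contribute 0 in the limit
            term = (hxy * math.log(row_totals[i] * col_totals[j] / n ** 2)
                    - (hx + hy) * math.log(n_ij / n))
            total += n_ij * term ** 2
    var = 4 * total / (n ** 2 * (hx + hy) ** 4)
    return 2 * (hx + hy - hxy) / (hx + hy), math.sqrt(var)

print(uncertainty_symmetric([[1, 1], [1, 1]]))  # independence: U and ASE are numerically 0
```

When $H(X) = H(Y)$, as for a square table with equal margins, $U$ reduces to $v/H(Y)$ and therefore coincides with both asymmetric coefficients.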

Last updated: December 09, 2022