The POWER Procedure

Analyses in the TWOSAMPLEWILCOXON Statement

Wilcoxon-Mann-Whitney Test for Comparing Two Distributions (TEST=WMW)

The power approximation in this section applies to the Wilcoxon-Mann-Whitney (WMW) test as invoked with the WILCOXON option in the PROC NPAR1WAY statement of the NPAR1WAY procedure. The approximation is based on O’Brien and Castelloe (2006) and an estimator called $\widehat{\mathrm{WMW}}_{\mathrm{odds}}$. See O’Brien and Castelloe (2006) for a definition of $\widehat{\mathrm{WMW}}_{\mathrm{odds}}$, which need not be derived in detail here for purposes of explaining the power formula.

Let $Y_1$ and $Y_2$ be independent observations from any two distributions that you want to compare using the WMW test. For purposes of deriving the asymptotic distribution of $\widehat{\mathrm{WMW}}_{\mathrm{odds}}$ (and consequently the power computation as well), these distributions must be formulated as ordered categorical ("ordinal") distributions.

If a distribution is continuous, it can be discretized using a large number of categories with negligible loss of accuracy. Each nonordinal distribution is divided into $b$ categories, where $b$ is the value of the NBINS parameter, with breakpoints evenly spaced on the probability scale. That is, each bin contains an equal probability $1/b$ for that distribution. Then the breakpoints across both distributions are pooled to form a collection of $C$ bins (hereafter called "categories"), and the probabilities of bin membership for each distribution are recalculated. The motivation for this method of binning is to avoid degenerate representations of the distributions (that is, small handfuls of large probabilities among mostly empty bins), as can be caused by something like an evenly spaced grid across raw values rather than probabilities.
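The binning procedure can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure's actual implementation: the function name `pooled_bins` and the choice of two normal distributions are assumptions made for the example, and only the Python standard library is used.

```python
from statistics import NormalDist

def pooled_bins(dist1, dist2, nbins):
    """Discretize two continuous distributions onto a common set of bins.

    Each distribution contributes nbins - 1 breakpoints placed at its own
    quantiles k/nbins, so each of its own bins has probability 1/nbins.
    The breakpoints are then pooled and the bin probabilities recomputed.
    """
    cuts = set()
    for dist in (dist1, dist2):
        for k in range(1, nbins):
            cuts.add(dist.inv_cdf(k / nbins))
    cuts = sorted(cuts)

    def bin_probs(dist):
        # CDF at each pooled breakpoint, padded with 0 and 1 for the open ends
        cdf = [0.0] + [dist.cdf(x) for x in cuts] + [1.0]
        return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

    return cuts, bin_probs(dist1), bin_probs(dist2)

# Hypothetical example: two unit-variance normals shifted by 0.5, with 4 bins each
cuts, p1, p2 = pooled_bins(NormalDist(0, 1), NormalDist(0.5, 1), nbins=4)
```

With distinct breakpoints, pooling yields $C = 2(b - 1) + 1$ categories; in the example above, 6 breakpoints give 7 categories. Neither distribution ends up with empty bins, which is the point of spacing breakpoints on the probability scale.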

After the discretization process just mentioned, there are now two ordinal distributions, each with a set of probabilities across a common set of $C$ ordered categories. For simplicity of notation, assume (without loss of generality) the response values to be $1, \ldots, C$. Represent the conditional probabilities as

$$
\tilde{p}_{ij} = \mathrm{Prob}(Y_i = j \mid \mathrm{group} = i), \quad i \in \{1, 2\} \text{ and } j \in \{1, \ldots, C\}
$$

and the group allocation weights as

$$
w_i = \frac{n_i}{N} = \mathrm{Prob}(\mathrm{group} = i), \quad i \in \{1, 2\}
$$

The joint probabilities can then be calculated simply as

$$
p_{ij} = \mathrm{Prob}(\mathrm{group} = i, Y_i = j) = w_i \tilde{p}_{ij}, \quad i \in \{1, 2\} \text{ and } j \in \{1, \ldots, C\}
$$
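As a small numeric illustration, the allocation weights and joint probabilities can be computed as follows. The helper name `joint_probabilities`, the group sizes, and the category probabilities are hypothetical values chosen for the example.

```python
def joint_probabilities(n1, n2, p_tilde):
    """Return (w, p) with w[i] = n_i / N and p[i][j] = w[i] * p_tilde[i][j].

    p_tilde is a 2 x C nested list of conditional category probabilities.
    """
    N = n1 + n2
    w = [n1 / N, n2 / N]
    p = [[w[i] * q for q in p_tilde[i]] for i in range(2)]
    return w, p

# Hypothetical example: 3 ordered categories, group sizes 40 and 60
p_tilde = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
w, p = joint_probabilities(40, 60, p_tilde)
```

The joint probabilities sum to 1 over the whole $2 \times C$ table, whereas each row of the conditional probabilities sums to 1 on its own.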

The next step in the power computation is to compute the probabilities that a randomly chosen pair of observations from the two groups is concordant, discordant, or tied. It is useful to define these probabilities as functions of the terms $Rs_{ij}$ and $Rd_{ij}$, defined as follows, where $Y$ is a random observation drawn from the joint distribution across groups and categories:

$$
\begin{aligned}
Rs_{ij} &= \mathrm{Prob}(Y \text{ is concordant with cell } (i,j)) + \tfrac{1}{2}\,\mathrm{Prob}(Y \text{ is tied with cell } (i,j)) \\
&= \mathrm{Prob}((\mathrm{group} < i \text{ and } Y < j) \text{ or } (\mathrm{group} > i \text{ and } Y > j)) \\
&\quad + \tfrac{1}{2}\,\mathrm{Prob}(\mathrm{group} \ne i \text{ and } Y = j) \\
&= \sum_{g=1}^{2} \sum_{c=1}^{C} w_g \tilde{p}_{gc} \left[ \mathrm{I}_{(g-i)(c-j) > 0} + \tfrac{1}{2}\,\mathrm{I}_{g \ne i,\, c = j} \right]
\end{aligned}
$$

and

$$
\begin{aligned}
Rd_{ij} &= \mathrm{Prob}(Y \text{ is discordant with cell } (i,j)) + \tfrac{1}{2}\,\mathrm{Prob}(Y \text{ is tied with cell } (i,j)) \\
&= \mathrm{Prob}((\mathrm{group} < i \text{ and } Y > j) \text{ or } (\mathrm{group} > i \text{ and } Y < j)) \\
&\quad + \tfrac{1}{2}\,\mathrm{Prob}(\mathrm{group} \ne i \text{ and } Y = j) \\
&= \sum_{g=1}^{2} \sum_{c=1}^{C} w_g \tilde{p}_{gc} \left[ \mathrm{I}_{(g-i)(c-j) < 0} + \tfrac{1}{2}\,\mathrm{I}_{g \ne i,\, c = j} \right]
\end{aligned}
$$

For an independent random draw $Y_1, Y_2$ from the two distributions,

$$
\begin{aligned}
P_c &= \mathrm{Prob}(Y_1, Y_2 \text{ concordant}) + \tfrac{1}{2}\,\mathrm{Prob}(Y_1, Y_2 \text{ tied}) \\
&= \sum_{i=1}^{2} \sum_{j=1}^{C} w_i \tilde{p}_{ij} Rs_{ij}
\end{aligned}
$$

and

$$
\begin{aligned}
P_d &= \mathrm{Prob}(Y_1, Y_2 \text{ discordant}) + \tfrac{1}{2}\,\mathrm{Prob}(Y_1, Y_2 \text{ tied}) \\
&= \sum_{i=1}^{2} \sum_{j=1}^{C} w_i \tilde{p}_{ij} Rd_{ij}
\end{aligned}
$$

Then

$$
\mathrm{WMW}_{\mathrm{odds}} = \frac{P_c}{P_d}
$$
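The concordance terms and the resulting odds can be sketched directly from the double sums. This is an illustrative sketch under the definitions in this section; the function names `rs_rd` and `wmw_odds` and the example probabilities are assumptions. The code uses 0-based indices, which leaves the sign of $(g - i)(c - j)$ unchanged.

```python
def rs_rd(w, p_tilde):
    """Return (Rs, Rd) as 2 x C nested lists, following the double-sum forms."""
    C = len(p_tilde[0])
    Rs = [[0.0] * C for _ in range(2)]
    Rd = [[0.0] * C for _ in range(2)]
    for i in range(2):
        for j in range(C):
            for g in range(2):
                for c in range(C):
                    mass = w[g] * p_tilde[g][c]
                    sign = (g - i) * (c - j)
                    # Ties count one-half toward both Rs and Rd
                    tie = 0.5 if (g != i and c == j) else 0.0
                    Rs[i][j] += mass * ((1.0 if sign > 0 else 0.0) + tie)
                    Rd[i][j] += mass * ((1.0 if sign < 0 else 0.0) + tie)
    return Rs, Rd

def wmw_odds(w, p_tilde):
    """Return (P_c, P_d, WMW_odds) for weights w and conditional probs p_tilde."""
    C = len(p_tilde[0])
    Rs, Rd = rs_rd(w, p_tilde)
    P_c = sum(w[i] * p_tilde[i][j] * Rs[i][j] for i in range(2) for j in range(C))
    P_d = sum(w[i] * p_tilde[i][j] * Rd[i][j] for i in range(2) for j in range(C))
    return P_c, P_d, P_c / P_d

# Hypothetical example: group 2 is shifted one category upward, equal allocation
P_c, P_d, odds = wmw_odds([0.5, 0.5], [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]])
```

In this example $\mathrm{Prob}(Y_1 < Y_2) = 0.75$, $\mathrm{Prob}(Y_1 = Y_2) = 0.25$, and $\mathrm{Prob}(Y_1 > Y_2) = 0$, so the odds come out as $(0.75 + 0.125)/(0 + 0.125) = 7$. Both $P_c$ and $P_d$ carry the same factor $2 w_1 w_2$, which cancels in the ratio.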

Proceeding to compute the theoretical standard error associated with $\mathrm{WMW}_{\mathrm{odds}}$ (that is, the population analogue to the sample standard error),

$$
\mathrm{SE}(\mathrm{WMW}_{\mathrm{odds}}) = \frac{2}{P_d} \left[ \sum_{i=1}^{2} \sum_{j=1}^{C} w_i \tilde{p}_{ij} \left( \mathrm{WMW}_{\mathrm{odds}}\, Rd_{ij} - Rs_{ij} \right)^2 / N \right]^{\frac{1}{2}}
$$

Converting to the natural log scale and using the delta method,

$$
\mathrm{SE}(\log(\mathrm{WMW}_{\mathrm{odds}})) = \frac{\mathrm{SE}(\mathrm{WMW}_{\mathrm{odds}})}{\mathrm{WMW}_{\mathrm{odds}}}
$$
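The two standard errors can be sketched in the same way. The function name `wmw_odds_se` and the example inputs are assumptions for illustration; the function recomputes the concordance terms so that it runs on its own.

```python
import math

def wmw_odds_se(w, p_tilde, N):
    """Return (WMW_odds, SE(WMW_odds), SE(log WMW_odds)) for total sample size N."""
    C = len(p_tilde[0])
    cells = [(i, j) for i in range(2) for j in range(C)]

    def r(i, j, concordant):
        # Rs_ij when concordant is True, Rd_ij otherwise
        total = 0.0
        for g, c in cells:
            s = (g - i) * (c - j)
            hit = s > 0 if concordant else s < 0
            tie = g != i and c == j
            total += w[g] * p_tilde[g][c] * (float(hit) + 0.5 * float(tie))
        return total

    Rs = {(i, j): r(i, j, True) for i, j in cells}
    Rd = {(i, j): r(i, j, False) for i, j in cells}
    P_c = sum(w[i] * p_tilde[i][j] * Rs[i, j] for i, j in cells)
    P_d = sum(w[i] * p_tilde[i][j] * Rd[i, j] for i, j in cells)
    odds = P_c / P_d

    weighted_sq = sum(
        w[i] * p_tilde[i][j] * (odds * Rd[i, j] - Rs[i, j]) ** 2 for i, j in cells
    )
    se = (2.0 / P_d) * math.sqrt(weighted_sq / N)
    # Delta method: d/dx log(x) = 1/x, so SE(log odds) = SE(odds) / odds
    return odds, se, se / odds

# Hypothetical example: two identical two-category distributions, N = 100
odds, se, se_log = wmw_odds_se([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], N=100)
```

For this example every cell has $(Rd_{ij} - Rs_{ij})^2 = 0.0625$ and $P_d = 0.25$, so the odds are 1 and both standard errors equal $8 \sqrt{0.0625 / 100} = 0.2$.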

The next step is to produce a "smoothed" version of the $2 \times C$ cell probabilities that conforms to the null hypothesis of the Wilcoxon-Mann-Whitney test (in other words, independence in the $2 \times C$ contingency table of probabilities). Let $\mathrm{SE}_{H_0}(\log(\mathrm{WMW}_{\mathrm{odds}}))$ denote the theoretical standard error of $\log(\mathrm{WMW}_{\mathrm{odds}})$ assuming $H_0$.
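This section does not spell out the smoothing formula. One natural reading of "independence in the table" is to give both groups the pooled category distribution, so that each joint probability is the product of its row and column margins. The sketch below makes that assumption explicit; the function name `smooth_under_h0` is hypothetical, and the procedure itself may compute the smoothed table differently.

```python
def smooth_under_h0(w, p_tilde):
    """Return conditional probabilities under H0 (assumed form).

    Assumption: both groups share the pooled marginal distribution
    sum_g w[g] * p_tilde[g][j], so the joint table w[i] * pooled[j]
    is the product of its margins (independence).
    """
    C = len(p_tilde[0])
    pooled = [w[0] * p_tilde[0][j] + w[1] * p_tilde[1][j] for j in range(C)]
    return [pooled[:], pooled[:]]

# Hypothetical example: 3 categories, allocation weights 0.4 and 0.6
p0 = smooth_under_h0([0.4, 0.6], [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
```

Under this assumption, the null standard error would be obtained by applying the same standard error formulas to the smoothed table, for which the odds equal 1.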

Finally, compute the power using the noncentral chi-square and normal distributions:

$$
\mathrm{power} =
\begin{cases}
P\left( Z \ge \dfrac{\mathrm{SE}_{H_0}(\log(\mathrm{WMW}_{\mathrm{odds}}))}{\mathrm{SE}(\log(\mathrm{WMW}_{\mathrm{odds}}))}\, z_{1-\alpha} - \delta^{\star} N^{\frac{1}{2}} \right), & \text{upper one-sided} \\[2ex]
P\left( Z \le \dfrac{\mathrm{SE}_{H_0}(\log(\mathrm{WMW}_{\mathrm{odds}}))}{\mathrm{SE}(\log(\mathrm{WMW}_{\mathrm{odds}}))}\, z_{\alpha} - \delta^{\star} N^{\frac{1}{2}} \right), & \text{lower one-sided} \\[2ex]
P\left( \chi^2\!\left(1, (\delta^{\star})^2 N\right) \ge \left[ \dfrac{\mathrm{SE}_{H_0}(\log(\mathrm{WMW}_{\mathrm{odds}}))}{\mathrm{SE}(\log(\mathrm{WMW}_{\mathrm{odds}}))} \right]^2 \chi^2_{1-\alpha}(1) \right), & \text{two-sided}
\end{cases}
$$

where

$$
\delta^{\star} = \frac{\log(\mathrm{WMW}_{\mathrm{odds}})}{N^{\frac{1}{2}}\, \mathrm{SE}(\log(\mathrm{WMW}_{\mathrm{odds}}))}
$$

is the primary noncentrality, that is, the "effect size" that quantifies how much the two conjectured distributions differ. $Z$ is a standard normal random variable, $\chi^2(df, nc)$ is a noncentral $\chi^2$ random variable with degrees of freedom $df$ and noncentrality $nc$, and $N$ is the total sample size.
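The three power cases can be evaluated with the standard library alone. The sketch below takes the log odds and the two standard errors as inputs; the function name `wmw_power` and the `sides` labels are hypothetical. The two-sided case uses the fact that a noncentral $\chi^2$ variable with 1 degree of freedom and noncentrality $\lambda$ is distributed as $(Z + \sqrt{\lambda})^2$, and that $\chi^2_{1-\alpha}(1) = z_{1-\alpha/2}^2$.

```python
import math
from statistics import NormalDist

def wmw_power(log_odds, se_log, se_log_h0, N, alpha=0.05, sides="two"):
    """Approximate power from log(WMW_odds) and its two standard errors.

    sides is one of "upper", "lower", "two" (labels chosen for this sketch).
    """
    z = NormalDist()  # standard normal
    delta_star = log_odds / (math.sqrt(N) * se_log)
    shift = delta_star * math.sqrt(N)
    ratio = se_log_h0 / se_log

    if sides == "upper":
        return 1.0 - z.cdf(ratio * z.inv_cdf(1.0 - alpha) - shift)
    if sides == "lower":
        return z.cdf(ratio * z.inv_cdf(alpha) - shift)
    if sides == "two":
        # Critical value: ratio^2 * chi2_{1-alpha}(1), with chi2_{1-alpha}(1) = z_{1-alpha/2}^2
        root_crit = ratio * z.inv_cdf(1.0 - alpha / 2.0)
        root_nc = abs(shift)  # sqrt of noncentrality (delta*)^2 N
        # P((Z + root_nc)^2 >= crit) = P(Z >= root_crit - root_nc) + P(Z <= -root_crit - root_nc)
        return (1.0 - z.cdf(root_crit - root_nc)) + z.cdf(-root_crit - root_nc)
    raise ValueError("sides must be 'upper', 'lower', or 'two'")

# Hypothetical inputs: WMW odds of 2 with both standard errors equal to 0.2
power_two_sided = wmw_power(math.log(2.0), 0.2, 0.2, N=100, sides="two")
```

As a sanity check, when the odds equal 1 and the two standard errors agree, each case reduces to $\alpha$. The product $\delta^{\star} N^{1/2}$ simplifies to $\log(\mathrm{WMW}_{\mathrm{odds}}) / \mathrm{SE}(\log(\mathrm{WMW}_{\mathrm{odds}}))$; the sketch keeps $N$ as an argument only to mirror the formula.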

Last updated: December 09, 2022