The NPAR1WAY Procedure

Hodges-Lehmann Estimation of Location Shift

If you specify the HL option, PROC NPAR1WAY computes the Hodges-Lehmann estimate of location shift for two-sample data. This option also provides asymptotic confidence limits for the location shift (which are sometimes known as Moses confidence limits). You can specify the confidence level in the ALPHA= option in the PROC NPAR1WAY statement. By default, ALPHA=0.05, which produces 95% confidence limits. Additionally, you can request exact confidence limits for the location shift by specifying the HL option in the EXACT statement.

The Hodges-Lehmann estimator of location shift is associated with the Wilcoxon linear rank statistic. For more information, see Hollander and Wolfe (1999) and Hodges and Lehmann (1983).

PROC NPAR1WAY computes the Hodges-Lehmann estimate $\hat{\Delta}$ as the median of all paired differences between observations in the two samples (classes), which can be written as

$$\hat{\Delta} = \mathrm{median}\left( (Y_j - X_i) \ \text{where} \ j = 1, 2, \ldots, n_1;\ i = 1, 2, \ldots, n_2 \right)$$

The $Y_j$ are observations in class 1, the $X_i$ are observations in class 2, and $n_1$ and $n_2$ denote the number of observations in class 1 and class 2, respectively.

By default, PROC NPAR1WAY uses the larger of the two classes as the reference class X (class 2 in the Hodges-Lehmann difference). If both classes have the same number of observations, PROC NPAR1WAY uses the class that appears second in the "Wilcoxon Scores" table as the reference class. (By default, the table displays class levels in the order in which they appear in the input data set. If you specify the ORDER=FORMATTED option, the table displays class levels in order of their formatted value.)

You can specify the reference class by using the HL(REFCLASS=) option. REFCLASS=1 specifies the first class that is listed in the "Wilcoxon Scores" table, and REFCLASS=2 specifies the second class. REFCLASS='class-value' identifies the reference class by the formatted value of the CLASS variable.
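
The selection rules above can be summarized in a short Python sketch. This is only an illustration of the documented rules, not PROC NPAR1WAY code: the function name `choose_reference_class` and its arguments are hypothetical, and class labels are assumed to be strings that hold the formatted CLASS values in "Wilcoxon Scores" table order.

```python
def choose_reference_class(class_sizes, refclass=None):
    """Pick the reference class X from two (label, n) pairs.

    class_sizes lists the two classes in "Wilcoxon Scores" table order.
    refclass mimics HL(REFCLASS=): None, 1, 2, or a formatted class value.
    """
    (label1, n1), (label2, n2) = class_sizes
    if refclass is None:
        # Default: the larger class; equal sizes fall to the second class.
        return label1 if n1 > n2 else label2
    if isinstance(refclass, int):
        if refclass not in (1, 2):
            raise ValueError("refclass must be 1, 2, or a class value")
        return class_sizes[refclass - 1][0]
    if refclass in (label1, label2):
        return refclass
    raise ValueError(f"unknown class value: {refclass!r}")


print(choose_reference_class([("A", 5), ("B", 8)]))              # larger class
print(choose_reference_class([("A", 5), ("B", 5)]))              # tie: second class
print(choose_reference_class([("A", 5), ("B", 8)], refclass=1))  # first listed class
```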

Let $m$ denote the total number of differences ($n_1 \times n_2$), and let $U^{(k)}$ denote the $k$th value of $(Y_j - X_i)$ among the ordered differences. When $m$ is an odd number, the median difference is the value that has rank $(m + 1)/2$,

$$\hat{\Delta} = U^{(k)} \quad \text{where} \quad k = (m + 1)/2$$

When $m$ is an even number, the median difference is the average of the values that have ranks $(m/2)$ and $((m/2) + 1)$,

$$\hat{\Delta} = \left( U^{(k)} + U^{(k+1)} \right) / 2 \quad \text{where} \quad k = m/2$$
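
As a minimal illustration of the odd and even cases, the following Python sketch forms all pairwise differences, sorts them, and picks the median by rank. The function name `hodges_lehmann_shift` and the sample values are hypothetical; the second argument plays the role of the reference class X.

```python
def hodges_lehmann_shift(y, x):
    """Median of the ordered differences U(1) <= ... <= U(m), m = n1 * n2."""
    u = sorted(yj - xi for yj in y for xi in x)
    m = len(u)
    if m % 2 == 1:
        k = (m + 1) // 2          # rank of the middle difference (1-based)
        return u[k - 1]
    k = m // 2                    # average the differences at ranks k and k + 1
    return (u[k - 1] + u[k]) / 2


# Hypothetical data: class 1 (Y) and the reference class 2 (X).
print(hodges_lehmann_shift([7, 9, 12, 15], [4, 6, 8]))   # m = 12, even case
print(hodges_lehmann_shift([7, 9, 12], [4, 6, 8]))       # m = 9, odd case
```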

Following Hollander and Wolfe (1999), the asymptotic lower and upper confidence limits for the location shift are

$$\left( \Delta_L = U^{(C_\alpha)},\ \Delta_U = U^{(m + 1 - C_\alpha)} \right)$$

where $C_\alpha$ is the largest integer less than or equal to $C_\alpha^*$, which is computed as

$$C_\alpha^* = \mathrm{E}_0(S) - z_{\alpha/2} \sqrt{\mathrm{Var}_0(S)}$$

where $\mathrm{E}_0(S)$ and $\mathrm{Var}_0(S)$ are the expected value and variance, respectively, of the Wilcoxon statistic $S$ under the null hypothesis (as described in the section Simple Linear Rank Tests for Two-Sample Data), and $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the standard normal distribution. For Wilcoxon rank scores,

$$\mathrm{E}_0(S) = n_1 n_2 / 2$$

When there are no tied values, $\mathrm{Var}_0(S)$ for Wilcoxon scores equals

$$\mathrm{Var}_0(S) = n_1 n_2 (n_1 + n_2 + 1) / 12$$
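
The following Python sketch walks through these formulas for data with no tied values. It is an illustration under stated assumptions rather than the procedure's implementation: the function name `asymptotic_hl_limits` and the sample data are hypothetical, and the guard for $C_\alpha < 1$ is a choice made for this sketch, not documented behavior.

```python
from math import floor, sqrt
from statistics import NormalDist


def asymptotic_hl_limits(y, x, alpha=0.05):
    """Return (Delta_L, Delta_U) from the ordered pairwise differences."""
    u = sorted(yj - xi for yj in y for xi in x)
    n1, n2 = len(y), len(x)
    m = n1 * n2
    e0 = n1 * n2 / 2                          # E0(S) for Wilcoxon scores
    var0 = n1 * n2 * (n1 + n2 + 1) / 12       # Var0(S), no-ties formula
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    c_alpha = floor(e0 - z * sqrt(var0))      # largest integer <= C*_alpha
    if c_alpha < 1:
        # Sketch-only guard: no ordered difference has rank below 1.
        raise ValueError("samples too small for this confidence level")
    # Ranks are 1-based: U(C_alpha) and U(m + 1 - C_alpha).
    return u[c_alpha - 1], u[m - c_alpha]


y = [7, 9, 12, 15, 18]   # hypothetical class 1 observations
x = [4, 6, 8, 10]        # hypothetical reference class observations
print(asymptotic_hl_limits(y, x, alpha=0.10))
```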

PROC NPAR1WAY displays the midpoint of the confidence interval $(\Delta_L, \Delta_U)$, which can also be used as an estimate of location shift. For more information, see Lehmann (1963). Additionally, PROC NPAR1WAY provides an estimate of the asymptotic standard error of $\hat{\Delta}$ based on the length of the confidence interval, which is computed as

$$\mathrm{se}(\hat{\Delta}) = \frac{\Delta_U - \Delta_L}{2 z_{\alpha/2}}$$
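
A small Python sketch of the midpoint and standard error computations follows. The function name `interval_midpoint_and_se` and the input limits are hypothetical; the limits are assumed to have been obtained at the same confidence level $\alpha$ that is passed to the function.

```python
from statistics import NormalDist


def interval_midpoint_and_se(delta_l, delta_u, alpha=0.05):
    """Midpoint of (Delta_L, Delta_U) and the interval-based standard error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)     # z_{alpha/2}
    midpoint = (delta_l + delta_u) / 2
    std_err = (delta_u - delta_l) / (2 * z)     # interval length / (2 z)
    return midpoint, std_err


# Hypothetical 95% limits for the location shift.
midpoint, std_err = interval_midpoint_and_se(-3, 14)
print(midpoint, round(std_err, 4))
```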

Exact Confidence Limits

If you specify the HL option in the EXACT statement, PROC NPAR1WAY computes exact confidence limits for the location shift between the two samples. You can specify the level of the confidence limits in the ALPHA= option in the PROC NPAR1WAY statement. By default, ALPHA=0.05, which produces 95% confidence limits.

PROC NPAR1WAY computes exact confidence limits for the location shift as described in Randles and Wolfe (1979, p. 180). PROC NPAR1WAY first generates the exact conditional distribution of the Mann-Whitney U statistic, which is the number of pairwise differences $(Y_j - X_i)$ that are positive plus half the number of pairwise differences that are 0. The Mann-Whitney statistic is defined as

$$M = \sum_{j=1}^{n_1} \sum_{i=1}^{n_2} \phi(Y_j, X_i)$$

where

$$\phi(Y_j, X_i) = \begin{cases} 1 & \text{if } Y_j > X_i \\ 1/2 & \text{if } Y_j = X_i \\ 0 & \text{otherwise} \end{cases}$$
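
The definition of $M$ translates directly into a double loop. The Python sketch below is only an illustration; the function name `mann_whitney_statistic` and the sample values are hypothetical.

```python
def mann_whitney_statistic(y, x):
    """Count pairs with Y_j > X_i, adding 1/2 for each tied pair."""
    total = 0.0
    for yj in y:
        for xi in x:
            if yj > xi:
                total += 1.0
            elif yj == xi:
                total += 0.5
    return total


print(mann_whitney_statistic([7, 9, 12, 15], [4, 6, 8]))   # no tied pairs
print(mann_whitney_statistic([5, 8], [5, 6]))              # one tied pair
```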

From the exact conditional distribution of the Mann-Whitney statistic $M$, PROC NPAR1WAY chooses $C^*_{L,\alpha}$ as the smallest value such that $\mathrm{Prob}(M \ge C^*_{L,\alpha}) \le \alpha/2$. After $C^*_{L,\alpha}$ is rounded up to the nearest integer $C_{L,\alpha}$, the lower confidence limit is the difference $(Y_j - X_i)$ that has a rank of $(n_1 n_2 - C_{L,\alpha} + 1)$.

To find the upper confidence limit, PROC NPAR1WAY chooses $C^*_{U,\alpha}$ as the largest Mann-Whitney value such that $\mathrm{Prob}(M \le C^*_{U,\alpha}) \le \alpha/2$. After $C^*_{U,\alpha}$ is rounded down to the nearest integer $C_{U,\alpha}$, the upper confidence limit is the difference $(Y_j - X_i)$ that has a rank of $(n_1 n_2 - C_{U,\alpha})$.

Because this is a discrete problem, the confidence coefficient is not exactly $(1 - \alpha)$ but is at least $(1 - \alpha)$; thus, these confidence limits are conservative.
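
The steps for the exact limits can be sketched in Python for small samples. Several assumptions are made here and are not taken from the procedure itself: the exact conditional distribution is obtained by brute-force enumeration of every way to allocate the pooled observations to class 1, the search for $C^*_{L,\alpha}$ and $C^*_{U,\alpha}$ is restricted to attainable values of $M$, and an error is raised when no attainable value meets the tail condition. The function names and data are hypothetical.

```python
from itertools import combinations
from math import ceil, floor


def mann_whitney(y, x):
    """M: number of pairs with Y_j > X_i plus half the number of tied pairs."""
    return sum(1.0 if yj > xi else 0.5 if yj == xi else 0.0
               for yj in y for xi in x)


def exact_hl_limits(y, x, alpha=0.05):
    """Exact lower and upper limits for the shift (enumeration sketch)."""
    pooled = list(y) + list(x)
    n1, m = len(y), len(y) * len(x)
    indices = range(len(pooled))

    # Conditional distribution of M over all allocations to class 1.
    counts = {}
    for chosen in combinations(indices, n1):
        picked = set(chosen)
        ys = [pooled[i] for i in chosen]
        xs = [pooled[i] for i in indices if i not in picked]
        stat = mann_whitney(ys, xs)
        counts[stat] = counts.get(stat, 0) + 1
    total = sum(counts.values())

    def upper_tail(c):   # Prob(M >= c)
        return sum(n for v, n in counts.items() if v >= c) / total

    def lower_tail(c):   # Prob(M <= c)
        return sum(n for v, n in counts.items() if v <= c) / total

    lows = [c for c in counts if upper_tail(c) <= alpha / 2]
    highs = [c for c in counts if lower_tail(c) <= alpha / 2]
    if not lows or not highs:
        # Sketch-only guard: samples too small for the requested level.
        raise ValueError("no attainable limits at this confidence level")
    c_l = ceil(min(lows))     # smallest C*_L, rounded up
    c_u = floor(max(highs))   # largest C*_U, rounded down

    u = sorted(yj - xi for yj in y for xi in x)
    # 1-based ranks (m - c_l + 1) and (m - c_u) as 0-based indexes.
    return u[m - c_l], u[m - c_u - 1]


y = [7, 9, 12, 15, 18]   # hypothetical class 1 observations
x = [4, 6, 8, 10]        # hypothetical reference class observations
print(exact_hl_limits(y, x))
```

Because the enumeration visits every allocation of the pooled observations, this sketch is practical only for small $n_1$ and $n_2$.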

Last updated: December 09, 2022