The NPAR1WAY Procedure

Empirical Distribution Function Tests

If you specify the EDF option, PROC NPAR1WAY computes tests based on the empirical distribution function. These include the Kolmogorov-Smirnov and Cramér–von Mises tests, and also the Kuiper test for two-sample data. This section gives formulas for these test statistics. For further information about the formulas and the interpretation of EDF statistics, see Hollander and Wolfe (1999) and Gibbons and Chakraborti (2010). For information about the k-sample analogs of the Kolmogorov-Smirnov and Cramér–von Mises statistics, see Kiefer (1959).

The empirical distribution function (EDF) of a sample $\{x_j\}$, $j = 1, 2, \ldots, n$, is defined as

$$F(x) = \frac{1}{n} \left( \text{number of } x_j \le x \right) = \frac{1}{n} \sum_{j=1}^{n} I(x_j \le x)$$

where $I(\cdot)$ is an indicator function. PROC NPAR1WAY uses the subsample of values within the $i$th class level to generate an EDF for the class, $F_i(x)$. The EDF for the overall sample, pooled over classes, can also be expressed as

$$F(x) = \frac{1}{n} \sum_{i} n_i F_i(x)$$

where $n_i$ is the number of observations in the $i$th class level, and $n$ is the total number of observations.
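The two expressions for the pooled EDF can be checked numerically. The following is a minimal Python sketch, not PROC NPAR1WAY's implementation; the helper names `edf` and `pooled_from_classes` and the data values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def edf(sample):
    """Return a function F with F(x) = (number of sample values <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


# Hypothetical data with two class levels.
classes = {"a": [1.0, 2.0, 4.0], "b": [2.0, 3.0, 5.0, 6.0, 7.0]}
pooled = [x for values in classes.values() for x in values]
n = len(pooled)

F = edf(pooled)  # EDF of the overall sample
F_i = {level: edf(values) for level, values in classes.items()}  # class EDFs


def pooled_from_classes(x):
    """Pooled EDF written as (1/n) * sum_i n_i * F_i(x)."""
    return sum(len(classes[level]) * F_i[level](x) for level in classes) / n
```

Evaluating `F(x)` and `pooled_from_classes(x)` at any of the observed values should give the same result up to floating-point rounding.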

Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov statistic measures the maximum deviation of the EDF within the classes from the pooled EDF. PROC NPAR1WAY computes the Kolmogorov-Smirnov statistic as

$$\mathrm{KS} = \max_{j} \sqrt{\frac{1}{n} \sum_{i} n_i \left( F_i(x_j) - F(x_j) \right)^2} \quad \text{where } j = 1, 2, \ldots, n$$

The asymptotic Kolmogorov-Smirnov statistic is computed as

$$\mathrm{KS}_a = \mathrm{KS} \times \sqrt{n}$$

For each class level $i$ and overall, PROC NPAR1WAY displays the value of $F_i$ at the maximum deviation from $F$ and the value $\sqrt{n_i}\,(F_i - F)$ at the maximum deviation from $F$. PROC NPAR1WAY also gives the observation where the maximum deviation occurs.
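As an illustration of the KS and asymptotic KS formulas, here is a short Python sketch. It is not the procedure's implementation; the function name `ks_k_sample` and the dictionary-of-lists input layout are assumptions made for this example.

```python
from bisect import bisect_right
from math import sqrt


def edf(sample):
    """Return a function F with F(x) = (number of sample values <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_k_sample(classes):
    """Return (KS, KS_a) for a dict mapping class level -> list of values."""
    pooled = [x for values in classes.values() for x in values]
    n = len(pooled)
    F = edf(pooled)
    F_i = {level: edf(values) for level, values in classes.items()}
    # Maximum over observations x_j of the weighted root-mean-square
    # deviation of the class EDFs from the pooled EDF.
    ks = max(
        sqrt(
            sum(len(values) * (F_i[level](x) - F(x)) ** 2
                for level, values in classes.items()) / n
        )
        for x in pooled
    )
    return ks, ks * sqrt(n)


# Hypothetical data with two class levels.
ks, ks_a = ks_k_sample({"a": [1.0, 2.0, 4.0], "b": [2.0, 3.0, 5.0, 6.0, 7.0]})
```

The function accepts any number of class levels, because the formula sums over all classes $i$.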

If there are only two class levels, PROC NPAR1WAY computes the two-sample Kolmogorov-Smirnov test statistic D as

$$D = \max_{j} \left| F_1(x_j) - F_2(x_j) \right| \quad \text{where } j = 1, 2, \ldots, n$$

The p-value for this test is the probability that D is greater than the observed value d under the null hypothesis of no difference between class levels (samples). PROC NPAR1WAY computes the asymptotic p-value for D by using the approximation

$$\mathrm{Prob}(D > d) = 2 \sum_{i=1}^{\infty} (-1)^{(i-1)} e^{(-2 i^2 z^2)}$$

where

$$z = d \sqrt{n_1 n_2 / n}$$

For more information, see Hodges (1957).
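The two-sample statistic and its series approximation can be sketched in Python as follows. This is an illustration under stated assumptions, not PROC NPAR1WAY's implementation: the function name `ks_two_sample`, the truncation tolerance `tol`, the clamping of the result to $[0, 1]$, and the handling of $z = 0$ (where the series does not converge) are all choices made for this example.

```python
from bisect import bisect_right
from math import exp, sqrt


def edf(sample):
    """Return a function F with F(x) = (number of sample values <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_two_sample(sample1, sample2, tol=1e-12):
    """Return (d, z, p) for the two-sample Kolmogorov-Smirnov test."""
    n1, n2 = len(sample1), len(sample2)
    n = n1 + n2
    F1, F2 = edf(sample1), edf(sample2)
    d = max(abs(F1(x) - F2(x)) for x in list(sample1) + list(sample2))
    z = d * sqrt(n1 * n2 / n)
    if z == 0.0:
        # Assumption: identical EDFs give no evidence against the null.
        return d, z, 1.0
    # Sum the alternating series until the terms become negligible.
    total, i = 0.0, 1
    while True:
        term = exp(-2.0 * i * i * z * z)
        total += (-1) ** (i - 1) * term
        if term < tol:
            break
        i += 1
    p = min(max(2.0 * total, 0.0), 1.0)
    return d, z, p


# Hypothetical data.
d, z, p = ks_two_sample([1.0, 2.0, 4.0], [2.0, 3.0, 5.0, 6.0, 7.0])
```

Because the terms decay like $e^{-2 i^2 z^2}$, only a few terms are needed unless $z$ is very small.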

If you specify the D option, or if you request exact Kolmogorov-Smirnov p-values by specifying the KS option in the EXACT statement, PROC NPAR1WAY also computes the one-sided Kolmogorov-Smirnov statistics $D^+$ and $D^-$ for two-sample data as

$$D^{+} = \max_{j} \left( F_1(x_j) - F_2(x_j) \right) \quad \text{where } j = 1, 2, \ldots, n$$

$$D^{-} = \max_{j} \left( F_2(x_j) - F_1(x_j) \right) \quad \text{where } j = 1, 2, \ldots, n$$

The asymptotic probability that $D^+$ is greater than the observed value $d^+$, under the null hypothesis of no difference between the two class levels, is computed as

$$\mathrm{Prob}(D^{+} > d^{+}) = e^{-2 z^2} \quad \text{where } z = d^{+} \sqrt{n_1 n_2 / n}$$

Similarly, the asymptotic probability that $D^-$ is greater than the observed value $d^-$ is computed as

$$\mathrm{Prob}(D^{-} > d^{-}) = e^{-2 z^2} \quad \text{where } z = d^{-} \sqrt{n_1 n_2 / n}$$
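The one-sided statistics and their asymptotic probabilities can be sketched in Python as follows. The function name `ks_one_sided` and the data are hypothetical, and the code only transcribes the formulas above rather than reproducing the procedure's implementation.

```python
from bisect import bisect_right
from math import exp


def edf(sample):
    """Return a function F with F(x) = (number of sample values <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_one_sided(sample1, sample2):
    """Return (d_plus, p_plus, d_minus, p_minus) for two samples."""
    n1, n2 = len(sample1), len(sample2)
    scale = n1 * n2 / (n1 + n2)  # z**2 equals d**2 * n1 * n2 / n
    F1, F2 = edf(sample1), edf(sample2)
    pooled = list(sample1) + list(sample2)
    d_plus = max(F1(x) - F2(x) for x in pooled)
    d_minus = max(F2(x) - F1(x) for x in pooled)
    p_plus = exp(-2.0 * d_plus ** 2 * scale)
    p_minus = exp(-2.0 * d_minus ** 2 * scale)
    return d_plus, p_plus, d_minus, p_minus


# Hypothetical data.
d_plus, p_plus, d_minus, p_minus = ks_one_sided(
    [1.0, 2.0, 4.0], [2.0, 3.0, 5.0, 6.0, 7.0]
)
```

Both EDFs equal 1 at the largest pooled observation, so neither one-sided statistic can be negative, and the larger of the two equals the two-sided statistic $D$.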

To request exact p-values for the Kolmogorov-Smirnov statistics, you can specify the KS option in the EXACT statement. For more information, see the section Exact Tests.

Cramér–von Mises Test

The Cramér–von Mises statistic is defined as

$$\mathrm{CM} = \frac{1}{n^2} \sum_{i} \left( n_i \sum_{j=1}^{p} t_j \left( F_i(x_j) - F(x_j) \right)^2 \right)$$

where $t_j$ is the number of ties at the $j$th distinct value and $p$ is the number of distinct values. The asymptotic value is computed as

$$\mathrm{CM}_a = \mathrm{CM} \times n$$

PROC NPAR1WAY displays the contribution of each class level to the sum CM.
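The Cramér–von Mises formula can be transcribed into a short Python sketch. The function name `cramer_von_mises` is hypothetical, and the per-class values it returns are scaled by $1/n^2$ so that they add up to CM; that scaling is an assumption for illustration and might not match what the procedure displays.

```python
from bisect import bisect_right
from collections import Counter


def edf(sample):
    """Return a function F with F(x) = (number of sample values <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def cramer_von_mises(classes):
    """Return (CM, CM_a, per-class contributions) for class level -> values."""
    pooled = [x for values in classes.values() for x in values]
    n = len(pooled)
    F = edf(pooled)
    ties = Counter(pooled)  # t_j: multiplicity of each distinct value x_j
    contributions = {}
    for level, values in classes.items():
        F_i = edf(values)
        inner = sum(t * (F_i(x) - F(x)) ** 2 for x, t in ties.items())
        contributions[level] = len(values) * inner / n ** 2
    cm = sum(contributions.values())
    return cm, cm * n, contributions


# Hypothetical data; the value 2.0 is tied across the two classes.
cm, cm_a, parts = cramer_von_mises(
    {"a": [1.0, 2.0, 4.0], "b": [2.0, 3.0, 5.0, 6.0, 7.0]}
)
```

Iterating over distinct values with their tie counts is equivalent to iterating over all $n$ observations, since each tied value contributes the same squared deviation $t_j$ times.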

Kuiper Test

For data with two class levels, PROC NPAR1WAY computes the Kuiper statistic, its scaled value for the asymptotic distribution, and the asymptotic p-value. The Kuiper statistic is computed as

$$K = \max_{j} \left( F_1(x_j) - F_2(x_j) \right) - \min_{j} \left( F_1(x_j) - F_2(x_j) \right) \quad \text{where } j = 1, 2, \ldots, n$$

The asymptotic value is

$$K_a = K \sqrt{n_1 n_2 / n}$$

PROC NPAR1WAY displays the value of for each class level.

The p-value for the Kuiper test is the probability of observing a larger value of $K_a$ under the null hypothesis of no difference between the two classes. PROC NPAR1WAY computes this p-value according to Owen (1962, p. 441).
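The Kuiper statistic and its scaled value can be sketched in Python as follows. The function name `kuiper` and the data are hypothetical, and the sketch deliberately omits the p-value, because the Owen (1962) formula is not reproduced in this section.

```python
from bisect import bisect_right
from math import sqrt


def edf(sample):
    """Return a function F with F(x) = (number of sample values <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def kuiper(sample1, sample2):
    """Return (K, K_a) for two samples; the p-value is not computed here."""
    n1, n2 = len(sample1), len(sample2)
    n = n1 + n2
    F1, F2 = edf(sample1), edf(sample2)
    diffs = [F1(x) - F2(x) for x in list(sample1) + list(sample2)]
    k = max(diffs) - min(diffs)
    return k, k * sqrt(n1 * n2 / n)


# Hypothetical data in which the EDF difference changes sign.
k, k_a = kuiper([1.0, 2.0, 4.0, 8.0], [2.0, 3.0, 5.0, 6.0, 7.0])
```

Because $-\min_j (F_1(x_j) - F_2(x_j)) = \max_j (F_2(x_j) - F_1(x_j))$, the statistic $K$ equals the sum of the one-sided Kolmogorov-Smirnov statistics.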

Last updated: December 09, 2022