The BOXPLOT Procedure

Percentile Definitions

You can use the PCTLDEF= option to specify one of five definitions for computing quantile statistics (percentiles). Suppose that $n$ is the number of nonmissing values for a variable and that $x_1, x_2, \ldots, x_n$ represent the ordered values of the analysis variable. For the $t$th percentile, set $p = t/100$ and let

$$
\begin{aligned}
np &= j + g && \text{when PCTLDEF}=1, 2, 3, \text{ or } 5 \\
(n+1)p &= j + g && \text{when PCTLDEF}=4
\end{aligned}
$$

where $j$ is the integer part of the quantity and $g$ is its fractional part.
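A minimal Python sketch of this decomposition follows. The function name `split_rank` is hypothetical and not part of SAS. It uses the standard-library `fractions.Fraction` type so that an exact rank such as $np = 3$ yields $g = 0$ exactly rather than a tiny floating-point remainder, which matters for the definitions below that test whether $g = 0$.

```python
import math
from fractions import Fraction


def split_rank(n: int, t, pctldef: int) -> tuple:
    """Return (j, g) with np = j + g, or (n + 1)p = j + g when PCTLDEF=4.

    t may be an int, a Fraction, or a decimal string such as "12.5";
    all of these convert to an exact Fraction.
    """
    p = Fraction(t) / 100                  # p = t/100, kept exact
    rank = (n + 1) * p if pctldef == 4 else n * p
    j = math.floor(rank)                   # integer part
    g = rank - j                           # fractional part, 0 <= g < 1
    return j, g
```

For example, with $n = 10$ and $t = 25$, `split_rank(10, 25, 1)` gives $j = 2$ and $g = 1/2$, while `split_rank(9, 25, 4)` uses $(n+1)p = 10 \cdot 0.25$ and gives the same split.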

The $t$th percentile (call it $y$) can be defined as follows:

PCTLDEF=1

weighted average at $x_{np}$

$$y = (1 - g)\,x_j + g\,x_{j+1}$$

where $x_0$ is taken to be $x_1$.
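The following sketch applies PCTLDEF=1 to a list of numbers. The name `pctldef1` is hypothetical; sorting the input stands in for the ordered nonmissing values. Besides the documented rule that $x_0$ is taken to be $x_1$, the sketch also maps $x_{n+1}$ to $x_n$; that is an added assumption that only avoids an index error at $t = 100$, where $g = 0$ and the term is multiplied by zero anyway.

```python
import math
from fractions import Fraction


def pctldef1(values, t):
    """PCTLDEF=1: weighted average at x_{np} (sketch)."""
    x = sorted(values)                     # x[k - 1] holds x_k
    n = len(x)
    rank = n * Fraction(t) / 100           # np = j + g
    j = math.floor(rank)
    g = rank - j

    def at(k):
        # x_0 is taken to be x_1 (from the definition);
        # clamping x_{n+1} to x_n is an added assumption (weight g is 0 there).
        return x[min(max(k, 1), n) - 1]

    return float((1 - g) * at(j) + g * at(j + 1))
```

For the values 2, 4, 6, 8, 10 and $t = 25$, $np = 1.25$, so $j = 1$, $g = 0.25$, and the sketch returns $0.75 \cdot 2 + 0.25 \cdot 4 = 2.5$.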

PCTLDEF=2

observation numbered closest to $np$

$$y = x_i$$

where $i$ is the integer part of $np + 1/2$.
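A sketch of PCTLDEF=2 follows; `pctldef2` is a hypothetical name. The definition does not say what happens when $np < 1/2$ (so that $i = 0$); mapping that case to $x_1$ is an assumption made here. The index can never exceed $n$, because $np + 1/2 \le n + 1/2$.

```python
import math
from fractions import Fraction


def pctldef2(values, t):
    """PCTLDEF=2: the observation numbered closest to np (sketch)."""
    x = sorted(values)                     # x[k - 1] holds x_k
    n = len(x)
    np_ = n * Fraction(t) / 100
    i = math.floor(np_ + Fraction(1, 2))   # integer part of np + 1/2
    i = max(i, 1)                          # assumption: i = 0 (np < 1/2) maps to x_1
    return x[i - 1]
```

Because the integer part of $np + 1/2$ is taken, an exact half rounds up: with the values 2, 4, 6, 8, 10 and $t = 30$, $np = 1.5$ and the sketch returns $x_2 = 4$.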

PCTLDEF=3

empirical distribution function

$$
y = \begin{cases}
x_j & \text{if } g = 0 \\
x_{j+1} & \text{if } g > 0
\end{cases}
$$

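A sketch of PCTLDEF=3 follows; `pctldef3` is a hypothetical name. The exact `Fraction` arithmetic keeps the test $g = 0$ reliable. The index $k = 0$ arises only at $t = 0$, and mapping it to $x_1$ is an assumption.

```python
import math
from fractions import Fraction


def pctldef3(values, t):
    """PCTLDEF=3: empirical distribution function (sketch)."""
    x = sorted(values)                     # x[k - 1] holds x_k
    n = len(x)
    rank = n * Fraction(t) / 100           # np = j + g
    j = math.floor(rank)
    g = rank - j
    k = j if g == 0 else j + 1             # x_j when g = 0, otherwise x_{j+1}
    return x[max(k, 1) - 1]                # assumption: k = 0 (t = 0) maps to x_1
```

With the values 2, 4, 6, 8, 10, $t = 40$ gives $np = 2$ exactly, so the sketch returns $x_2 = 4$, while $t = 41$ gives $np = 2.05$ and jumps to $x_3 = 6$.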
PCTLDEF=4

weighted average aimed at $x_{p(n+1)}$

$$y = (1 - g)\,x_j + g\,x_{j+1}$$

where $x_{n+1}$ is taken to be $x_n$.
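A sketch of PCTLDEF=4 follows; `pctldef4` is a hypothetical name. It differs from PCTLDEF=1 only in using $(n+1)p$ as the rank. The documented rule maps $x_{n+1}$ to $x_n$; mapping $x_0$ to $x_1$ when $(n+1)p < 1$ is an added assumption, since the definition does not cover that case.

```python
import math
from fractions import Fraction


def pctldef4(values, t):
    """PCTLDEF=4: weighted average aimed at x_{p(n+1)} (sketch)."""
    x = sorted(values)                     # x[k - 1] holds x_k
    n = len(x)
    rank = (n + 1) * Fraction(t) / 100     # (n + 1)p = j + g
    j = math.floor(rank)
    g = rank - j

    def at(k):
        # x_{n+1} is taken to be x_n, per the definition;
        # mapping x_0 to x_1 (when (n + 1)p < 1) is an added assumption.
        return x[min(max(k, 1), n) - 1]

    return float((1 - g) * at(j) + g * at(j + 1))
```

With the values 2, 4, 6, 8, 10 and $t = 25$, $(n+1)p = 1.5$, so the sketch returns $0.5 \cdot 2 + 0.5 \cdot 4 = 3$, whereas PCTLDEF=1 gives 2.5 for the same data.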

PCTLDEF=5

empirical distribution function with averaging

$$
y = \begin{cases}
(x_j + x_{j+1})/2 & \text{if } g = 0 \\
x_{j+1} & \text{if } g > 0
\end{cases}
$$

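A sketch of PCTLDEF=5 follows; `pctldef5` is a hypothetical name. Indices fall outside $1, \ldots, n$ only at $t = 0$ or $t = 100$, and clamping them to the nearest observation is an assumption rather than part of the stated definition.

```python
import math
from fractions import Fraction


def pctldef5(values, t):
    """PCTLDEF=5: empirical distribution function with averaging (sketch)."""
    x = sorted(values)                     # x[k - 1] holds x_k
    n = len(x)
    rank = n * Fraction(t) / 100           # np = j + g
    j = math.floor(rank)
    g = rank - j

    def at(k):
        # assumption: indices outside 1..n (only when t = 0 or t = 100) are clamped
        return x[min(max(k, 1), n) - 1]

    if g == 0:
        return (at(j) + at(j + 1)) / 2     # average of x_j and x_{j+1}
    return at(j + 1)                       # x_{j+1} when g > 0
```

With the values 2, 4, 6, 8, 10, $t = 40$ lands exactly on $np = 2$, so the sketch averages $x_2 = 4$ and $x_3 = 6$ to give 5, whereas PCTLDEF=3 returns 4 at the same $t$.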
Last updated: December 09, 2022