The BOXPLOT Procedure

Output Data Sets

OUTBOX= Data Set

The OUTBOX= data set saves group summary statistics and outlier values. The following variables can be saved:

  • the group variable

  • the variable _VAR_, containing the analysis variable name

  • the variable _TYPE_, identifying features of box-and-whiskers plots

  • the variable _VALUE_, containing values of box-and-whiskers plot features

  • the variable _ID_, containing labels for outliers

  • the variable _HTML_, containing URLs associated with plot features

_ID_ is included in the OUTBOX= data set only if the keyword SCHEMATICID or SCHEMATICIDFAR is specified with the BOXSTYLE= option. _HTML_ is present only if one or more of the HTML=, OUTHIGHHTML=, and OUTLOWHTML= options are specified.

Each observation in an OUTBOX= data set records the value of a single feature of one group’s box-and-whiskers plot, such as its mean. The _TYPE_ variable identifies the feature whose value is recorded in _VALUE_. Table 8 lists valid _TYPE_ variable values.

Table 8: Valid _TYPE_ Values in an OUTBOX= Data Set

_TYPE_      Description
N           Group size
MIN         Minimum group value
Q1          Group first quartile
MEDIAN      Group median
MEAN        Group mean
Q3          Group third quartile
MAX         Group maximum value
STDDEV      Group standard deviation
LOW         Low outlier value
HIGH        High outlier value
LOWHISKR    Low whisker value, if different from MIN
HIWHISKR    High whisker value, if different from MAX
FARLOW      Low far outlier value
FARHIGH     High far outlier value


Additionally, the following variables, if specified, are included:

  • block variables

  • symbol variable

  • BY variables

  • ID variables
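
As a minimal sketch of how these variables might be inspected, the following statements create an OUTBOX= data set and then list only its outlier observations. The data set Steel and the variables Width and Lot are borrowed from the OUTHISTORY= example later in this section; the output data set name BoxSummary is hypothetical, and the input is assumed to be sorted by Lot.

```sas
/* Save box plot features, including outliers, for each Lot.        */
/* BoxSummary is a hypothetical name; Steel is assumed sorted by Lot. */
proc boxplot data=Steel;
   plot Width*Lot / boxstyle=schematic outbox=BoxSummary;
run;

/* Each observation holds one feature: _TYPE_ names it and _VALUE_ holds it. */
/* Keep only the observations that record outliers.                         */
proc print data=BoxSummary;
   where _TYPE_ in ('LOW', 'HIGH', 'FARLOW', 'FARHIGH');
   var Lot _VAR_ _TYPE_ _VALUE_;
run;
```

Because the data set has one observation per feature rather than one per group, a WHERE clause on _TYPE_ is usually the simplest way to pull out a single statistic.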

OUTHISTORY= Data Set

The OUTHISTORY= data set saves group summary statistics. The following variables are saved:

  • the group variable

  • group minimum variables named by analysis-variable suffixed with L

  • group first-quartile variables named by analysis-variable suffixed with 1

  • group mean variables named by analysis-variable suffixed with X

  • group median variables named by analysis-variable suffixed with M

  • group third-quartile variables named by analysis-variable suffixed with 3

  • group maximum variables named by analysis-variable suffixed with H

  • group standard deviation variables named by analysis-variable suffixed with S

  • group size variables named by analysis-variable suffixed with N

If an analysis variable name has the maximum length of 32 characters, PROC BOXPLOT forms summary statistic names from its first 16 characters, its last 15 characters, and the appropriate suffix.
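
The naming rule can be illustrated with a short DATA step. This is only a sketch of the rule as stated above, not the procedure's internal code; the 32-character name and the variables name and statname are made up for illustration.

```sas
/* Illustration only: build the median-variable name for a 32-character */
/* analysis variable name (hypothetical) by following the stated rule.  */
data _null_;
   length name $32 statname $32;
   name = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ123456';   /* exactly 32 characters */
   /* first 16 characters + last 15 characters + suffix M (median) */
   statname = cats(substr(name, 1, 16), substr(name, 18, 15), 'M');
   put statname=;
run;
```

Under this rule the 17th character of the original name (Q in this example) is dropped to make room for the suffix, so the resulting name still has 32 characters.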

Group summary variables are created for each analysis variable specified in the PLOT statement. For example, consider the following statements:

proc boxplot data=Steel;
   plot (Width Diameter)*Lot / outhistory=Summary;
run;

The data set Summary contains variables named Lot, WidthL, Width1, WidthM, WidthX, Width3, WidthH, WidthS, WidthN, DiameterL, Diameter1, DiameterM, DiameterX, Diameter3, DiameterH, DiameterS, and DiameterN.
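
Because each statistic is stored in its own variable, an OUTHISTORY= data set has one observation per group and is convenient for further computation. As a small sketch, assuming the data set Summary created above, the following DATA step derives the interquartile range for each analysis variable; the names SummaryIQR, WidthIQR, and DiameterIQR are hypothetical.

```sas
/* Derive the interquartile range per Lot from the saved quartiles. */
/* SummaryIQR, WidthIQR, and DiameterIQR are hypothetical names.    */
data SummaryIQR;
   set Summary;
   WidthIQR    = Width3    - Width1;      /* third quartile minus first quartile */
   DiameterIQR = Diameter3 - Diameter1;
run;
```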

Additionally, the following variables, if specified, are included:

  • BY variables

  • block variables

  • symbol variable

  • ID variables

Because an OUTHISTORY= data set does not contain outlier values, it cannot, in general, be used to save a schematic box plot. To save a schematic box plot summary, use an OUTBOX= data set instead.
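
The distinction matters when a saved summary is read back to redraw a plot. The following sketch assumes that a data set named BoxSummary (a hypothetical name) was previously saved with the OUTBOX= option and that Summary was saved with OUTHISTORY= as in the example above. It uses the BOX= and HISTORY= options in the PROC BOXPLOT statement to read the two kinds of summary data sets.

```sas
/* Redraw a schematic box plot from saved features, outliers included. */
/* BoxSummary is a hypothetical data set created earlier with OUTBOX=. */
proc boxplot box=BoxSummary;
   plot Width*Lot / boxstyle=schematic;
run;

/* Redraw from group summary statistics only. No outlier values are */
/* available here, so the default skeletal style is used.           */
proc boxplot history=Summary;
   plot (Width Diameter)*Lot;
run;
```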

Last updated: December 09, 2022