The KDE Procedure

ODS Graphics

Statistical procedures use ODS Graphics to create graphs as part of their output. ODS Graphics is described in detail in Chapter 24, "Statistical Graphics Using ODS."

Before you create graphs, ODS Graphics must be enabled (for example, by specifying the ODS GRAPHICS ON statement). For more information about enabling and disabling ODS Graphics, see the section "Enabling and Disabling ODS Graphics" in Chapter 24, "Statistical Graphics Using ODS."

The overall appearance of graphs is controlled by ODS styles. Styles and other aspects of using ODS Graphics are discussed in the section "A Primer on ODS Statistical Graphics" in Chapter 24, "Statistical Graphics Using ODS."

ODS Graph Names

PROC KDE assigns a name to each graph it creates using the Output Delivery System (ODS). You can use these names to reference the graphs when using ODS. The names are listed in Table 4.

Table 4: Graphs Produced by PROC KDE

| ODS Graph Name | Plot Description | Statement | PLOTS= Option |
|---|---|---|---|
| BivariateHistogram | Bivariate histogram of data | BIVAR | HISTOGRAM |
| ContourPlot | Contour plot of bivariate kernel density estimate | BIVAR | CONTOUR |
| ContourScatterPlot | Contour plot of bivariate kernel density estimate overlaid with scatter plot | BIVAR | CONTOURSCATTER |
| DensityPlot | Univariate kernel density estimate curve | UNIVAR | DENSITY |
| DensityOverlayPlot | Overlaid univariate kernel density estimate curves | UNIVAR | DENSITYOVERLAY |
| HistogramDensity | Univariate histogram overlaid with kernel density estimate curve | UNIVAR | HISTDENSITY |
| Histogram | Univariate histogram of data | UNIVAR | HISTOGRAM |
| HistogramSurface | Bivariate histogram overlaid with surface plot of bivariate kernel density estimate | BIVAR | HISTSURFACE |
| ScatterPlot | Scatter plot of data | BIVAR | SCATTER |
| SurfacePlot | Surface plot of bivariate kernel density estimate | BIVAR | SURFACE |


Bivariate Plots

You can specify the PLOTS= option in the BIVAR statement to request graphical displays of bivariate kernel density estimates.

By default, if ODS Graphics is enabled and you do not specify the PLOTS= option, then the BIVAR statement creates a contour plot. If you specify the PLOTS= option, only the requested plots are created.

Univariate Plots

You can specify the PLOTS= option in the UNIVAR statement to request graphical displays of univariate kernel density estimates.

By default, if ODS Graphics is enabled and you do not specify the PLOTS= option, then the UNIVAR statement creates a histogram overlaid with a kernel density estimate. If you specify the PLOTS= option, only the requested plots are created.

Binning of Bivariate Histogram

Let $(X_i, Y_i),\ i = 1, 2, \ldots, n$, be a sample of size $n$ drawn from a bivariate distribution. For the marginal distribution of $X_i,\ i = 1, 2, \ldots, n$, the number of bins ($\mathrm{Nbins}_X$) in the bivariate histogram is calculated according to the formula

$$\mathrm{Nbins}_X = \mathrm{ceil}\left(\mathrm{range}_X / \mathrm{width}_X\right)$$

where $\mathrm{ceil}(x)$ denotes the smallest integer greater than or equal to $x$,

$$\mathrm{range}_X = \max_{1 \le i \le n}(X_i) - \min_{1 \le i \le n}(X_i)$$

and the optimal bin width is obtained, following Scott (1992, p. 84), as

$$\mathrm{width}_X = 3.504\,\hat{\sigma}_X \left(1 - \hat{\rho}^2\right)^{3/8} n^{-1/4}$$

Here, $\hat{\sigma}_X$ and $\hat{\rho}$ are the sample standard deviation of $X$ and the sample correlation coefficient of $X$ and $Y$, respectively. When you specify a WEIGHT variable, PROC KDE uses weighted versions of $\hat{\sigma}_X$ and $\hat{\rho}$ in the preceding expressions.

Similar formulas are used to compute the number of bins for the marginal distribution of $Y_i,\ i = 1, 2, \ldots, n$. Further details can be found in Scott (1992).

Notice that if $|\hat{\rho}| > 0.99$, then $\mathrm{Nbins}_X$ is calculated as in the univariate case (see Terrell and Scott 1985). In this case $\mathrm{Nbins}_Y = \mathrm{Nbins}_X$.
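
The bin-count calculation in this section can be illustrated with a short Python sketch. This is not PROC KDE code and is not guaranteed to reproduce its output; it only evaluates the formulas given above. The function name bivariate_nbins is hypothetical. The sketch assumes that the spread estimate in the bin-width formula is the sample standard deviation (denominator n - 1), as in Scott's rule, and that no WEIGHT variable is used. Because the univariate rule is not stated in this section, the sketch does not implement the case in which the absolute sample correlation exceeds 0.99.

```python
import math
from statistics import mean, stdev


def bivariate_nbins(x, y):
    """Return (nbins_x, nbins_y) for paired samples x and y (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples with at least 2 points")

    # Sample correlation coefficient (rho-hat).
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    rho = sxy / math.sqrt(sxx * syy)

    if abs(rho) > 0.99:
        # The text defers to the univariate rule here; it is not reproduced in this sketch.
        raise NotImplementedError("|rho| > 0.99: univariate rule applies")

    # Factor shared by both margins: (1 - rho^2)^(3/8) * n^(-1/4).
    shared = (1.0 - rho ** 2) ** (3.0 / 8.0) * n ** (-1.0 / 4.0)

    def nbins(v):
        # Assumption: sigma-hat is the sample standard deviation of the margin.
        width = 3.504 * stdev(v) * shared
        return math.ceil((max(v) - min(v)) / width)

    return nbins(x), nbins(y)


# A 4-by-4 grid of points: n = 16 and zero sample correlation.
xs = [i for i in range(4) for _ in range(4)]
ys = [j for _ in range(4) for j in range(4)]
print(bivariate_nbins(xs, ys))  # (2, 2)
```

For the grid data, the shared factor is 0.5, each margin has a sample standard deviation of about 1.155, and the bin width is about 2.023, so a range of 3 yields two bins in each direction.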

Last updated: December 09, 2022