The KDE Procedure

BIVAR Statement

  • BIVAR v1 <(v-options)> v2 <(v-options)> …<vN <(v-options)>> </ options>;

  • BIVAR (v1 v2) <(v3 v4) … (vN–1 vN)> </ options>;

The BIVAR statement computes bivariate kernel density estimates for the specified variables. The v-options optionally specified in parentheses after a variable name apply only to that variable, and they override corresponding global options that are specified following a slash (/).

You must specify at least two variables, v1 and v2. If you specify more than two variables, PROC KDE computes a bivariate kernel density estimate for each distinct pair of variables in the list. For example, if you specify the following statement, then a bivariate kernel density estimate is computed for each of the variable pairs (x, y), (x, z), and (y, z):

bivar x y z;

Alternatively, you can specify an explicit list of variable pairs, with each pair enclosed in parentheses. This requests a bivariate kernel density estimate for each listed pair only. For example, if you specify the following statement, then bivariate kernel density estimates are computed for (x, y) and (y, z):

bivar (x y) (y z);
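The interaction between v-options and global options can be illustrated with a short sketch. The data set work.sim and its variables x, y, and z are hypothetical and are simulated here only so that the step can run on its own; the multiplier values are arbitrary.

```sas
/* Hypothetical simulated data; names and values are illustrative only */
data work.sim;
   call streaminit(12345);
   do i = 1 to 500;
      x = rand('normal');
      y = 0.5*x + rand('normal');
      z = rand('uniform');
      output;
   end;
   drop i;
run;

proc kde data=work.sim;
   /* BWM=2 is a v-option and applies to y only;          */
   /* x and z fall back to the global BWM=1.5 after the / */
   bivar x y (bwm=2) z / bwm=1.5;
run;
```

Because three variables are listed without parentheses around pairs, this step requests estimates for (x, y), (x, z), and (y, z), and the v-option on y should apply in each pair that includes y.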

Table 1 summarizes the options available in the BIVAR statement.

Table 1: BIVAR Statement Options

Option        Description
BIVSTATS      Produces, for each density estimate, a table that contains the covariance and correlation between the two variables
BW=           Specifies the bandwidth
BWM=          Specifies the bandwidth multiplier
CDF           Produces the distribution function
GRIDL=        Specifies the lower grid limit
GRIDU=        Specifies the upper grid limit
LEVELS        Produces a table of levels for contours of the bivariate density
NGRID=        Specifies the number of grid points associated with each variable
NOPRINT       Suppresses output tables
OUT=          Specifies the name of the output data set
PERCENTILES   Produces a table of percentiles
PLOTS=        Requests one or more plots
TRUNCATE      Restricts the lower and upper grid limits to the minimum and maximum observed values, respectively, for each variable
UNISTATS      Produces, for each density estimate, a table that contains standard univariate statistics and the bandwidths


You can specify the following options in the BIVAR statement. Some options can be used as v-options, as indicated in the description of the option.

BIVSTATS

produces, for each density estimate, a table that contains the covariance and correlation between the two variables.

BW=number

specifies the bandwidth to apply to each variable in each kernel density estimate. Larger bandwidths produce a smoother estimate, whereas smaller bandwidths produce a rougher estimate. To specify different bandwidths for different variables, specify BW=number as a v-option. By default, the bandwidth is set automatically by the simple normal reference method (see the section Bandwidth Selection).

BWM=number

specifies the bandwidth multiplier to apply to the corresponding bandwidth for each variable. Values of number greater than 1 increase the effective bandwidth and produce a smoother estimate. Values less than 1 decrease the effective bandwidth and produce a rougher estimate. To specify different bandwidth multipliers for different variables, specify BWM=number as a v-option. By default, BWM=1.
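As a rough sketch of how BW= and BWM= might be used, the following steps reuse the octane data set and the Rater and Customer variables from the PLOTS= example in this section; the data set is assumed to exist already, and the numeric values are placeholders rather than recommendations.

```sas
proc kde data=octane;
   /* Keep the default (normal reference) bandwidths, doubled for both variables */
   bivar Rater Customer / bwm=2;
run;

proc kde data=octane;
   /* Explicit bandwidth for Rater only, given as a v-option; */
   /* Customer keeps its automatically selected bandwidth     */
   bivar Rater (bw=0.5) Customer;
run;
```

Comparing the two resulting estimates against the default one is a simple way to see the smoother-versus-rougher trade-off described above.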

CDF

computes the distribution function in addition to the density function for each pair of variables. The distribution function is obtained by a seminumerical technique as described in the section Kernel Distribution Estimates.

GRIDL=number

specifies the lower grid limit to apply to each variable in each kernel density estimate. To specify different lower grid limits for different variables, specify GRIDL=number as a v-option. The default value for a particular variable is a function of both the kernel bandwidth and the minimum observed value for that variable.

GRIDU=number

specifies the upper grid limit to apply to each variable in each kernel density estimate. To specify different upper grid limits for different variables, specify GRIDU=number as a v-option. The default value for a particular variable is a function of both the kernel bandwidth and the maximum observed value for that variable.

LEVELS
LEVELS=(numlist)

computes a table of levels (called "Levels") for contours of the bivariate density. The number of contours is equal to the number of values in numlist. Each value in numlist specifies a percentage that is used to calculate the density volume enclosed by the contour. The density has a constant level along each contour, and the volume enclosed by each contour equals the total density volume minus the specified percentage of the total volume. For example, the contour for the value 10 encloses 90 percent of the total volume. In other words, the contours correspond to slices or levels of the density surface that are taken along the density axis. The "Levels" table also provides the minimum and maximum values for each contour along the directions of the two data variables. By default, LEVELS=(1, 5, 10, 50, 90, 95, 99, 100).
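A minimal sketch of a custom contour list follows. It assumes the octane data set from the PLOTS= example in this section is available; the three percentages are arbitrary choices.

```sas
proc kde data=octane;
   /* Request three contours instead of the default eight.          */
   /* Per the description above, the value 10 corresponds to the    */
   /* contour that encloses 90 percent of the total density volume. */
   bivar Rater Customer / levels=(10, 50, 90);
run;
```

The "Levels" table for this step should then contain one row per requested percentage.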

NGRID=number
NG=number

specifies the number of grid points to be associated with each variable in each kernel density estimate. To specify different numbers of grid points for different variables, specify NGRID=number as a v-option. By default, NGRID=60.
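The grid options GRIDL=, GRIDU=, and NGRID= can be combined as global options and v-options. The following sketch assumes the octane data set from the PLOTS= example in this section; the limits 80 and 100 are placeholders and are not taken from the data.

```sas
proc kde data=octane;
   /* Rater: fixed grid limits given as v-options (placeholder values). */
   /* Customer: default limits, which depend on the bandwidth and on    */
   /* the observed minimum and maximum.                                 */
   /* NGRID=100 is global, so both variables use 100 grid points.       */
   bivar Rater (gridl=80 gridu=100) Customer / ngrid=100;
run;
```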

NOPRINT

suppresses output tables. You can use this option when you want to produce only graphical output.

OUT=SAS-data-set

names the output data set in which to save kernel density estimates. This output data set contains the following variables:

  • var1, whose value is the name of the first variable in a bivariate kernel density estimate

  • var2, whose value is the name of the second variable in a bivariate kernel density estimate

  • value1, whose value corresponds to grid coordinates for the first variable

  • value2, whose value corresponds to grid coordinates for the second variable

  • density, whose values are equal to the kernel density estimate at the associated grid point

  • count, whose values represent the number of original observations contained in the bin that corresponds to a grid point

  • distribution, whose values are equal to the distribution estimate at the associated grid point (this variable is included only when the CDF global option is specified)
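One way to inspect the output data set is sketched below. The data set name work.kdegrid is hypothetical, and the octane data set is assumed to exist as in the PLOTS= example in this section.

```sas
proc kde data=octane;
   /* CDF adds the distribution variable to the OUT= data set; */
   /* NOPRINT suppresses the output tables                     */
   bivar Rater Customer / out=work.kdegrid cdf ngrid=30 noprint;
run;

/* Show the first few grid points and the variables listed above */
proc print data=work.kdegrid(obs=10);
   var var1 var2 value1 value2 density count distribution;
run;
```

If the CDF option is omitted, the distribution variable is not created, so it would also need to be removed from the VAR statement.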

PERCENTILES
PERCENTILES=numlist

produces a table of percentiles for each BIVAR variable. You can specify a list of percentiles to be computed in numlist. The default percentiles are 0.5, 1, 2.5, 5, 10, 25, 50, 75, 90, 95, 97.5, 99, and 99.5.

PLOTS=(plot-request<(options)> <…plot-request <(options)>>)

specifies which plots of the bivariate data and kernel density estimate to produce. When you specify only one plot-request, you can omit the parentheses around the plot-request.

ODS Graphics must be enabled before plots can be requested. For example:

ods graphics on;

proc kde data=octane;
   bivar Rater Customer / plots=all;
run;

ods graphics off;

For more information about enabling and disabling ODS Graphics, see the section Enabling and Disabling ODS Graphics in Chapter 24, Statistical Graphics Using ODS.

By default, if ODS Graphics is enabled and you do not specify the PLOTS= option, then the BIVAR statement creates a contour plot. If you specify the PLOTS= option, only the requested plots are created.

You can specify the following plot-requests:

ALL

produces all bivariate plots.

CONTOUR

produces a contour plot of the bivariate density estimate.

CONTOURSCATTER

produces a contour plot of the bivariate density estimate overlaid with a scatter plot of the data.

HISTOGRAM <(view-options)>

produces a bivariate histogram of the data. You can specify one or both of the following view-options within parentheses:

ROTATE=angle

rotates the histogram angle degrees, where –180 < angle < 180. By default, ROTATE=54.

TILT=angle

tilts the histogram angle degrees, where –180 < angle < 180. By default, TILT=20.

HISTSURFACE <(view-options)>

produces a bivariate histogram of the data overlaid with a surface plot of the bivariate kernel density estimate. You can specify one or both of the following view-options within parentheses:

ROTATE=angle

rotates the histogram and kernel density surface angle degrees, where –180 < angle < 180. By default, ROTATE=54.

TILT=angle

tilts the histogram and kernel density surface angle degrees, where –180 < angle < 180. By default, TILT=20.

NONE

suppresses all plots, including the contour plot that is produced by default when ODS Graphics is enabled and the PLOTS= option is not specified.

SCATTER

produces a scatter plot of the data.

SURFACE <(view-options)>

produces a surface plot of the bivariate kernel density estimate. You can specify one or both of the following view-options within parentheses:

ROTATE=angle

rotates the kernel density surface angle degrees, where –180 < angle < 180. By default, ROTATE=54.

TILT=angle

tilts the kernel density surface angle degrees, where –180 < angle < 180. By default, TILT=20.
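To show how several plot-requests and view-options fit together, the following sketch requests two plots and changes the viewing angles of the surface. It reuses the octane data set from the earlier example in this section; the angle values are arbitrary.

```sas
ods graphics on;

proc kde data=octane;
   /* Two plot-requests, so the outer parentheses are required. */
   /* ROTATE= and TILT= apply only to the SURFACE request.      */
   bivar Rater Customer / plots=(contourscatter surface(rotate=30 tilt=40));
run;

ods graphics off;
```

Because PLOTS= is specified, only these two plots are created and the default contour plot is not produced separately.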

TRUNCATE

sets the lower grid limit for each variable to the minimum observed value for that variable, and sets the upper grid limit for each variable to the maximum observed value for that variable.

Note: The GRIDL= and GRIDU= options take precedence over the TRUNCATE option. If either or both are specified, the corresponding lower or upper grid limit is set to the specified value rather than to the observed minimum or maximum.
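The precedence rule in the note can be sketched as follows, again assuming the octane data set from the PLOTS= example; the value 80 is a placeholder.

```sas
proc kde data=octane;
   /* TRUNCATE restricts every grid limit to the observed range,   */
   /* except the lower limit for Rater, where GRIDL=80 (a          */
   /* placeholder value) takes precedence according to the note    */
   bivar Rater (gridl=80) Customer / truncate;
run;
```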

UNISTATS

produces, for each density estimate, a table that contains standard univariate statistics for each variable in the pair and the bandwidths that are used to compute the kernel density estimate. The statistics reported in the table are the mean, variance, standard deviation, range, and interquartile range.
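The table-producing options can be requested together. A minimal sketch, assuming the octane data set from the PLOTS= example in this section:

```sas
proc kde data=octane;
   /* Request the covariance and correlation table, the contour    */
   /* levels table, the percentiles table, and the univariate      */
   /* statistics table, each with its default settings             */
   bivar Rater Customer / bivstats levels percentiles unistats;
run;
```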

Last updated: December 09, 2022