The KDE Procedure

Binning

Binning, or assigning data to discrete categories, is an effective and fast method for large data sets (Fan and Marron 1994). When the sample size n is large, direct evaluation of the kernel estimate at any point would involve n kernel evaluations, as shown in the preceding formulas. To evaluate the estimate at each point of a grid of size g would thus require kernel evaluations. When you use g = 401 in the univariate case or in the bivariate case and , the amount of computation can be prohibitively large. With binning, however, the computational order is reduced to g, resulting in a much quicker algorithm that is nearly as accurate as direct evaluation.

To bin a set of weighted univariate data to a grid , simply assign each sample , together with its weight , to the nearest grid point (also called the bin center). When binning is completed, each grid point has an associated number , which is the sum total of all the weights that correspond to sample points that have been assigned to . These s are known as the bin counts.

This procedure replaces the data , with the smaller set , and the estimation is carried out with these new data. This is so-called simple binning, versus the finer linear binning described in Wand (1994). PROC KDE uses simple binning for the sake of faster and easier implementation. Also, it is assumed that the bin centers are equally spaced and in increasing order. In addition, assume for notational convenience that and, therefore, .