The distribution function can be obtained by integrating the kernel density estimate (Azzalini 1981). PROC KDE provides both an analytical and a seminumerical integration approach. Each involves the closed-form solution to the integral of the binned density estimator from the previous section.
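For the univariate case, the distribution function is
\[
\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{g} c_i \, \Phi\!\left( \frac{x - x_i}{h} \right)
\]
where \(n\) is the sample size, \(x_i\) are the grid points with bin counts \(c_i\), \(h\) is the univariate kernel bandwidth, and
\[
\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \, e^{-t^{2}/2} \, dt
\]
is the standard normal distribution function.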
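The analytical evaluation for the univariate case can be sketched in a few lines of Python. This is only an illustration and not PROC KDE's implementation: it assumes a Gaussian kernel, and the names `binned_kde_cdf`, `grid`, and `counts` are hypothetical.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binned_kde_cdf(x, grid, counts, h):
    """Integral from -infinity to x of a Gaussian KDE built on binned data.

    grid   -- grid points (hypothetical name, assumed for this sketch)
    counts -- bin counts at those grid points (hypothetical name)
    h      -- kernel bandwidth
    """
    n = sum(counts)
    weighted = sum(c * std_normal_cdf((x - g) / h) for g, c in zip(grid, counts))
    return weighted / n

# Usage: three grid points with made-up bin counts.
grid = [-1.0, 0.0, 1.0]
counts = [2, 5, 3]
print(binned_kde_cdf(0.25, grid, counts, h=0.5))
```

Because each kernel integrates in closed form, no quadrature is needed here; the cost is one normal distribution function evaluation per grid point.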
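For the bivariate case, the distribution function is
\[
\hat{F}(x, y) = \frac{1}{n} \sum_{i=1}^{g_X} \sum_{j=1}^{g_Y} c_{i,j} \, \Phi\!\left( \frac{x - x_i}{h_X} \right) \Phi\!\left( \frac{y - y_j}{h_Y} \right)
\]
where \(n\) is the sample size, \(c_{i,j}\) is the bin count at grid point \((x_i, y_j)\), \(\Phi\) is the standard normal distribution function, and \(h_X\) and \(h_Y\) are the bivariate kernel bandwidths for variables x and y.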
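A corresponding sketch for the bivariate case follows. It assumes a product Gaussian kernel with separate bandwidths for each variable; the function and argument names are hypothetical and chosen only for this illustration.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binned_kde_cdf_2d(x, y, grid_x, grid_y, counts, hx, hy):
    """Integral over (-inf, x] x (-inf, y] of a product-Gaussian KDE on binned data.

    counts[i][j] is the bin count at (grid_x[i], grid_y[j]); all names are
    assumptions made for this sketch.
    """
    n = sum(sum(row) for row in counts)
    # The kernel factorizes, so the two one-dimensional integrals are
    # computed once per grid coordinate and then combined.
    px = [std_normal_cdf((x - gx) / hx) for gx in grid_x]
    py = [std_normal_cdf((y - gy) / hy) for gy in grid_y]
    total = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            total += c * px[i] * py[j]
    return total / n

# Usage: a 3 x 2 grid with made-up bin counts.
grid_x = [-1.0, 0.0, 1.0]
grid_y = [0.0, 2.0]
counts = [[1, 2], [3, 4], [5, 6]]
print(binned_kde_cdf_2d(0.5, 1.0, grid_x, grid_y, counts, hx=0.5, hy=0.8))
```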
The analytical integration approach is simply the direct evaluation of the appropriate distribution function equation. The seminumerical integration approach is a mixture of direct evaluation of the distribution function equation and numerical integration via the extended trapezoidal rule (Press et al. 1992). This mixture depends on whether the upper integration limits fall inside or outside the binning grid. In general, there are three cases:
Integration limits precede leading grid edges.
Integration limits fall within grid.
Integration limits follow trailing grid edges.
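When the integration limits lead or trail the grid edges, the seminumerical approach is identical to the analytical approach. Otherwise, the seminumerical approach splits the overall integral into an integral from minus infinity to the leading grid edge and an integral from the leading grid edge to the upper integration limit. For the univariate case, this split becomes
\[
\hat{F}(x) = \int_{-\infty}^{x_L} \hat{f}(t) \, dt + \int_{x_L}^{x} \hat{f}(t) \, dt
\]
where \(\hat{f}\) is the binned density estimator and \(x_L\) is the leading grid edge. The first term is simply the analytical distribution function up to the leading grid edge. The second term is evaluated numerically:
\[
\int_{x_L}^{x} \hat{f}(t) \, dt \approx \sum_{k=1}^{m} \frac{t_k - t_{k-1}}{2} \left[ \hat{f}(t_{k-1}) + \hat{f}(t_k) \right]
\]
where \(x_L = t_0 < t_1 < \cdots < t_m = x\) are the nodes of the extended trapezoidal rule.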
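The univariate split can be illustrated with the sketch below. Several details are assumptions rather than documented behavior: a Gaussian kernel, a grid sorted in ascending order, trapezoid nodes placed at the grid points below the limit plus the limit itself, and hypothetical function names throughout.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def density(x, grid, counts, h):
    """Binned Gaussian kernel density estimate at x."""
    n = sum(counts)
    return sum(c * phi((x - g) / h) for g, c in zip(grid, counts)) / (n * h)

def cdf_analytic(x, grid, counts, h):
    """Closed-form integral of the binned estimate from -infinity to x."""
    n = sum(counts)
    return sum(c * Phi((x - g) / h) for g, c in zip(grid, counts)) / n

def cdf_seminumeric(x, grid, counts, h):
    """Analytical part up to the leading grid edge, trapezoidal rule beyond it."""
    lead, trail = grid[0], grid[-1]
    if x <= lead or x >= trail:
        # Limit leads or trails the grid: identical to the analytical approach.
        return cdf_analytic(x, grid, counts, h)
    # Assumed node placement: grid points below x, then x itself.
    nodes = [g for g in grid if g < x] + [x]
    total = cdf_analytic(lead, grid, counts, h)
    for a, b in zip(nodes, nodes[1:]):
        total += 0.5 * (b - a) * (density(a, grid, counts, h) + density(b, grid, counts, h))
    return total

# Usage: compare the two approaches on a small made-up grid.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
counts = [1, 4, 6, 4, 1]
print(cdf_analytic(0.5, grid, counts, 0.8), cdf_seminumeric(0.5, grid, counts, 0.8))
```

The two results differ only by the trapezoidal error on the interval between the leading grid edge and the limit, which shrinks as the grid spacing decreases.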
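The bivariate case is similar to the univariate case, although there are multiple analytical terms to evaluate because the double integral splits into four pieces:
\[
\hat{F}(x, y) = \int_{-\infty}^{x_L} \! \int_{-\infty}^{y_L} \hat{f}(s, t) \, dt \, ds
+ \int_{-\infty}^{x_L} \! \int_{y_L}^{y} \hat{f}(s, t) \, dt \, ds
+ \int_{x_L}^{x} \! \int_{-\infty}^{y_L} \hat{f}(s, t) \, dt \, ds
+ \int_{x_L}^{x} \! \int_{y_L}^{y} \hat{f}(s, t) \, dt \, ds
\]
where \(\hat{f}\) is the binned bivariate density estimator and \(x_L\) and \(y_L\) are the leading grid edges. The first three terms are evaluated analytically. As with the univariate case, the fourth term is evaluated numerically:
\[
\int_{x_L}^{x} \! \int_{y_L}^{y} \hat{f}(s, t) \, dt \, ds \approx
\sum_{k=1}^{m_X} \sum_{l=1}^{m_Y} \frac{(s_k - s_{k-1})(t_l - t_{l-1})}{4}
\left[ \hat{f}(s_{k-1}, t_{l-1}) + \hat{f}(s_{k-1}, t_l) + \hat{f}(s_k, t_{l-1}) + \hat{f}(s_k, t_l) \right]
\]
where \(x_L = s_0 < \cdots < s_{m_X} = x\) and \(y_L = t_0 < \cdots < t_{m_Y} = y\) are the nodes of the extended trapezoidal rule in each direction.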
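The four-term bivariate split can be sketched in the same way. This is a rough illustration under stated assumptions: a product Gaussian kernel, ascending grids, trapezoid nodes at the grid points below each limit plus the limit itself, and a fallback to the analytical form whenever either limit lies outside the grid. None of these choices or names is taken from PROC KDE itself.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def _weighted_sum(wx, wy, counts):
    """Sum of counts[i][j] * wx[i] * wy[j] over the grid."""
    return sum(c * wx[i] * wy[j] for i, row in enumerate(counts) for j, c in enumerate(row))

def density_2d(x, y, gx, gy, counts, hx, hy):
    """Binned product-Gaussian kernel density estimate at (x, y)."""
    n = sum(map(sum, counts))
    kx = [phi((x - a) / hx) for a in gx]
    ky = [phi((y - b) / hy) for b in gy]
    return _weighted_sum(kx, ky, counts) / (n * hx * hy)

def cdf_2d(x, y, gx, gy, counts, hx, hy):
    """Closed-form integral of the binned estimate over (-inf, x] x (-inf, y]."""
    n = sum(map(sum, counts))
    px = [Phi((x - a) / hx) for a in gx]
    py = [Phi((y - b) / hy) for b in gy]
    return _weighted_sum(px, py, counts) / n

def cdf_2d_seminumeric(x, y, gx, gy, counts, hx, hy):
    """Three analytical terms plus a trapezoidal estimate of the fourth."""
    xl, yl = gx[0], gy[0]
    if not (xl < x < gx[-1] and yl < y < gy[-1]):
        # Assumed fallback: a limit outside the grid uses the analytical form.
        return cdf_2d(x, y, gx, gy, counts, hx, hy)
    F = lambda a, b: cdf_2d(a, b, gx, gy, counts, hx, hy)
    f = lambda a, b: density_2d(a, b, gx, gy, counts, hx, hy)
    corner = F(xl, yl)
    analytic = corner + (F(xl, y) - corner) + (F(x, yl) - corner)
    xs = [a for a in gx if a < x] + [x]
    ys = [b for b in gy if b < y] + [y]
    numeric = 0.0
    for a0, a1 in zip(xs, xs[1:]):
        for b0, b1 in zip(ys, ys[1:]):
            corners = f(a0, b0) + f(a0, b1) + f(a1, b0) + f(a1, b1)
            numeric += 0.25 * (a1 - a0) * (b1 - b0) * corners
    return analytic + numeric

# Usage: compare the two approaches on a small made-up grid.
gx = [-1.0, 0.0, 1.0]
gy = [-1.0, 0.0, 1.0]
counts = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
print(cdf_2d(0.3, 0.6, gx, gy, counts, 0.7, 0.7),
      cdf_2d_seminumeric(0.3, 0.6, gx, gy, counts, 0.7, 0.7))
```

The second and third terms are written as differences of the closed-form distribution function, which is why only the fourth term needs quadrature.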