The KDE Procedure

Kernel Distribution Estimates

The distribution function can be obtained by integrating the kernel density estimate (Azzalini 1981). PROC KDE provides both an analytical and a seminumerical integration approach, each of which involves the closed-form solution to the integral of the binned density estimator from the previous section.

For the univariate case, the distribution function is

\[
\begin{aligned}
\hat{F}(x) &= \int_{-\infty}^{x} \hat{f}(u)\,du \\
           &= \frac{1}{N} \sum_{i=1}^{g} c_i\,\Phi_h(x - x_i)
\end{aligned}
\]

where $h$ is the univariate kernel bandwidth and

\[
\Phi_h(x) = \frac{1}{\sqrt{2\pi}\,h} \int_{-\infty}^{x} \exp\left(-\frac{u^2}{2h^2}\right) du
\]
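The univariate formula can be sketched in a few lines of Python. This is an illustrative sketch, not PROC KDE's implementation: the function names `phi_h` and `kde_cdf_1d` are hypothetical, the grid and bin counts are made up, and N is assumed to equal the sum of the bin counts. The integrated Gaussian kernel is written with the error function, which is equivalent to the integral above.

```python
import math

def phi_h(x, h):
    """Integrated Gaussian kernel: normal CDF with mean 0 and std. dev. h."""
    return 0.5 * (1.0 + math.erf(x / (h * math.sqrt(2.0))))

def kde_cdf_1d(x, grid, counts, h):
    """Analytical F-hat(x) = (1/N) * sum_i c_i * Phi_h(x - x_i).

    grid   -- bin centers x_i (hypothetical example input)
    counts -- bin counts c_i; N is assumed to be sum(counts)
    h      -- kernel bandwidth
    """
    n = sum(counts)
    return sum(c * phi_h(x - xi, h) for xi, c in zip(grid, counts)) / n

# Usage with made-up binned data
grid = [0.0, 1.0, 2.0, 3.0]
counts = [1, 3, 4, 2]
print(kde_cdf_1d(1.5, grid, counts, h=0.5))
```

Because each term is a normal distribution function, the estimate rises monotonically from 0 to 1 as x moves across the data.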

For the bivariate case, the distribution function is

\[
\begin{aligned}
\hat{F}(x, y) &= \int_{-\infty}^{x} \int_{-\infty}^{y} \hat{f}(u, v)\,du\,dv \\
              &= \frac{1}{N} \sum_{j=1}^{g_Y} \Phi_{h_Y}(y - y_j) \sum_{i=1}^{g_X} c_{i,j}\,\Phi_{h_X}(x - x_i)
\end{aligned}
\]

where $h_X$ and $h_Y$ are the bivariate kernel bandwidths for variables $x$ and $y$.
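The bivariate double sum can be evaluated the same way. The sketch below is illustrative only: `kde_cdf_2d` is a hypothetical name, `counts[i][j]` is assumed to hold the bin count at grid point (x_i, y_j), and N is assumed to equal the total of all bin counts.

```python
import math

def phi_h(x, h):
    """Integrated Gaussian kernel: normal CDF with mean 0 and std. dev. h."""
    return 0.5 * (1.0 + math.erf(x / (h * math.sqrt(2.0))))

def kde_cdf_2d(x, y, x_grid, y_grid, counts, h_x, h_y):
    """Analytical bivariate F-hat(x, y) from binned counts.

    counts[i][j] is assumed to be the bin count at (x_grid[i], y_grid[j]).
    """
    n = sum(sum(row) for row in counts)
    total = 0.0
    for j, yj in enumerate(y_grid):
        # Inner sum over the x direction for this y-bin
        inner = sum(counts[i][j] * phi_h(x - xi, h_x)
                    for i, xi in enumerate(x_grid))
        total += phi_h(y - yj, h_y) * inner
    return total / n

# Usage with made-up binned data on a 2 x 3 grid
x_grid = [-1.0, 1.0]
y_grid = [-2.0, 0.0, 2.0]
counts = [[1, 1, 1], [1, 1, 1]]
print(kde_cdf_2d(0.0, 0.0, x_grid, y_grid, counts, h_x=0.5, h_y=0.8))
```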

The analytical integration approach is simply the direct evaluation of the appropriate distribution function equation. The seminumerical integration approach is a mixture of direct evaluation of the distribution function equation and numerical integration via the extended trapezoidal rule (Press et al. 1992). This mixture depends on whether the upper integration limits fall inside or outside the binning grid. In general, there are three cases:

  • Integration limits precede leading grid edges.

  • Integration limits fall within grid.

  • Integration limits follow trailing grid edges.

When the integration limits precede the leading grid edges or follow the trailing grid edges, the seminumerical approach is identical to the analytical approach. Otherwise, the seminumerical approach splits the overall integral into integrals from minus infinity to the leading grid edges and an integral from the leading grid edges to the upper integration limits. For the univariate case, this split becomes

\[
\begin{aligned}
\hat{F}(x) &= \int_{-\infty}^{x_1} \hat{f}(u)\,du + \int_{x_1}^{x} \hat{f}(u)\,du \\
           &= \hat{F}(x_1) + \int_{x_1}^{x} \hat{f}(u)\,du
\end{aligned}
\]

The term $\hat{F}(x_1)$ is simply the analytical distribution function up to the leading grid edge. The second term is evaluated numerically:

  • If $x$ coincides with a grid element $x_k$, the overall integral is

    \[
    \hat{F}(x) \approx \hat{F}(x_1) + \tilde{F}(x_k)
    \]

    where

    \[
    \tilde{F}(x_k) = \left[ \sum_{m=1}^{k} \hat{f}(x_m) - \frac{1}{2}\left( \hat{f}(x_1) + \hat{f}(x_k) \right) \right] \delta
    \]

  • If $x$ does not coincide with a grid element, then the numerical integral is approximated by linear interpolation that uses the nearest grid elements $p$ and $p+1$:

    \[
    \tilde{F}(x) = \tilde{F}(x_p) + \frac{\tilde{F}(x_{p+1}) - \tilde{F}(x_p)}{x_{p+1} - x_p}\,(x - x_p).
    \]
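The univariate seminumerical procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not PROC KDE's code: all function names are hypothetical, the grid is assumed to be uniformly spaced with spacing δ, the binned density is assumed to be a count-weighted sum of Gaussian kernels (consistent with the integrated form given earlier), and a limit exactly on a grid edge is handled by the in-grid branch.

```python
import math

def gauss_pdf(x, h):
    return math.exp(-x * x / (2.0 * h * h)) / (math.sqrt(2.0 * math.pi) * h)

def gauss_cdf(x, h):
    return 0.5 * (1.0 + math.erf(x / (h * math.sqrt(2.0))))

def density(x, grid, counts, h):
    """Assumed binned density estimate f-hat(x)."""
    return sum(c * gauss_pdf(x - xi, h) for xi, c in zip(grid, counts)) / sum(counts)

def analytic_cdf(x, grid, counts, h):
    """Analytical F-hat(x)."""
    return sum(c * gauss_cdf(x - xi, h) for xi, c in zip(grid, counts)) / sum(counts)

def seminumerical_cdf(x, grid, counts, h):
    # Limits before the leading edge or after the trailing edge:
    # identical to the analytical approach.
    if x < grid[0] or x > grid[-1]:
        return analytic_cdf(x, grid, counts, h)

    delta = grid[1] - grid[0]  # assumed uniform grid spacing
    f = [density(g, grid, counts, h) for g in grid]

    # f_tilde[k]: extended trapezoidal rule from grid[0] to grid[k]
    f_tilde = [0.0]
    for k in range(1, len(grid)):
        f_tilde.append(f_tilde[-1] + 0.5 * (f[k - 1] + f[k]) * delta)

    # Bracketing grid elements p and p + 1, then linear interpolation.
    # When x sits exactly on a grid element the weight is 0 (or 1).
    p = min(int((x - grid[0]) / delta), len(grid) - 2)
    weight = (x - grid[p]) / (grid[p + 1] - grid[p])
    numeric = f_tilde[p] + (f_tilde[p + 1] - f_tilde[p]) * weight

    # Analytical part up to the leading grid edge plus the numerical part
    return analytic_cdf(grid[0], grid, counts, h) + numeric

# Usage with a made-up uniform grid and flat bin counts
grid = [i * 0.1 for i in range(-30, 31)]
counts = [1] * len(grid)
print(seminumerical_cdf(0.537, grid, counts, h=0.5))
print(analytic_cdf(0.537, grid, counts, h=0.5))
```

On a reasonably fine grid the two printed values should agree closely; the gap reflects the trapezoidal and interpolation error.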

The bivariate case is similar to the univariate case, although there are multiple analytical terms to evaluate due to the nature of the integral:

\[
\hat{F}(x, y) = \hat{F}(x, y_1) + \hat{F}(x_1, y) - \hat{F}(x_1, y_1) + \int_{x_1}^{x} \int_{y_1}^{y} \hat{f}(u, v)\,du\,dv
\]
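One way to see where the three analytical terms come from is to split the integration region into three disjoint pieces: the strip below $y_1$, the strip to the left of $x_1$ and above $y_1$, and the rectangle inside the grid. The following is a standard inclusion-exclusion argument, written in the same limit convention as the equation above, with $x_1$ and $y_1$ denoting the leading grid edges:

\[
\begin{aligned}
\int_{-\infty}^{x} \int_{-\infty}^{y} \hat{f}\,du\,dv
  &= \int_{-\infty}^{x} \int_{-\infty}^{y_1} \hat{f}\,du\,dv
   + \int_{-\infty}^{x_1} \int_{y_1}^{y} \hat{f}\,du\,dv
   + \int_{x_1}^{x} \int_{y_1}^{y} \hat{f}\,du\,dv \\
  &= \hat{F}(x, y_1)
   + \left[ \hat{F}(x_1, y) - \hat{F}(x_1, y_1) \right]
   + \int_{x_1}^{x} \int_{y_1}^{y} \hat{f}\,du\,dv
\end{aligned}
\]

The subtraction removes the corner region that $\hat{F}(x, y_1)$ and $\hat{F}(x_1, y)$ would otherwise both count.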

As with the univariate case, the fourth term is evaluated numerically:

  • If $(x, y)$ coincides with a grid element, then the overall integral is

    \[
    \hat{F}(x_k, y_\ell) \approx \hat{F}(x, y_1) + \hat{F}(x_1, y) - \hat{F}(x_1, y_1) + \tilde{F}(x_k, y_\ell)
    \]

    where $\tilde{F}(x_k, y_\ell)$ is recursively computed via

    \[
    \begin{aligned}
    \tilde{F}(x_k) &= \left[ \sum_{n=1}^{\ell} \hat{f}(x_k, y_n) - \frac{1}{2}\left( \hat{f}(x_k, y_1) + \hat{f}(x_k, y_\ell) \right) \right] \delta_Y \\
    \tilde{F}(x_k, y_\ell) &\approx \left[ \sum_{m=1}^{k} \tilde{F}(x_m) - \frac{1}{2}\left( \tilde{F}(x_1) + \tilde{F}(x_k) \right) \right] \delta_X
    \end{aligned}
    \]

  • If $(x, y)$ does not coincide with a grid element, then the numerical integral is approximated by bilinear interpolation that uses the nearest grid elements $p$, $p+1$, $q$, and $q+1$:

    \[
    \begin{aligned}
    \tilde{F}(y_q) &= \tilde{F}(x_p, y_q) + \frac{\tilde{F}(x_{p+1}, y_q) - \tilde{F}(x_p, y_q)}{x_{p+1} - x_p}\,(x - x_p) \\
    \tilde{F}(y_{q+1}) &= \tilde{F}(x_p, y_{q+1}) + \frac{\tilde{F}(x_{p+1}, y_{q+1}) - \tilde{F}(x_p, y_{q+1})}{x_{p+1} - x_p}\,(x - x_p) \\
    \tilde{F}(x, y) &= \tilde{F}(y_q) + \frac{\tilde{F}(y_{q+1}) - \tilde{F}(y_q)}{y_{q+1} - y_q}\,(y - y_q)
    \end{aligned}
    \]
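The bivariate procedure can be sketched along the same lines. This is an illustrative sketch only: every function name is hypothetical, both grids are assumed to be uniformly spaced, the binned density is assumed to be a count-weighted sum of product Gaussian kernels, and any point with a coordinate outside the grid simply falls back to the analytical formula (the text above does not spell out mixed cases).

```python
import math

def pdf(x, h):
    return math.exp(-x * x / (2.0 * h * h)) / (math.sqrt(2.0 * math.pi) * h)

def cdf(x, h):
    return 0.5 * (1.0 + math.erf(x / (h * math.sqrt(2.0))))

def _weighted(kx, ky, x, y, xg, yg, c, hx, hy):
    """(1/N) * sum_ij c[i][j] * kx(x - x_i) * ky(y - y_j)."""
    n = sum(map(sum, c))
    return sum(c[i][j] * kx(x - xi, hx) * ky(y - yj, hy)
               for i, xi in enumerate(xg) for j, yj in enumerate(yg)) / n

def density2d(x, y, xg, yg, c, hx, hy):
    return _weighted(pdf, pdf, x, y, xg, yg, c, hx, hy)

def analytic2d(x, y, xg, yg, c, hx, hy):
    return _weighted(cdf, cdf, x, y, xg, yg, c, hx, hy)

def trapezoid_table(xg, yg, c, hx, hy):
    """table[k][l]: trapezoidal integral of the density over
    [xg[0], xg[k]] x [yg[0], yg[l]], built in two passes."""
    dx, dy = xg[1] - xg[0], yg[1] - yg[0]
    f = [[density2d(x, y, xg, yg, c, hx, hy) for y in yg] for x in xg]
    # Pass 1: cumulative trapezoid over y for each fixed x_k
    inner = []
    for row in f:
        acc = [0.0]
        for l in range(1, len(yg)):
            acc.append(acc[-1] + 0.5 * (row[l - 1] + row[l]) * dy)
        inner.append(acc)
    # Pass 2: cumulative trapezoid over x of the pass-1 results
    table = [[0.0] * len(yg)]
    for k in range(1, len(xg)):
        table.append([table[k - 1][l] + 0.5 * (inner[k - 1][l] + inner[k][l]) * dx
                      for l in range(len(yg))])
    return table

def seminumerical2d(x, y, xg, yg, c, hx, hy):
    args = (xg, yg, c, hx, hy)
    if not (xg[0] <= x <= xg[-1] and yg[0] <= y <= yg[-1]):
        return analytic2d(x, y, *args)  # assumed fallback outside the grid
    t = trapezoid_table(*args)
    p = min(int((x - xg[0]) / (xg[1] - xg[0])), len(xg) - 2)
    q = min(int((y - yg[0]) / (yg[1] - yg[0])), len(yg) - 2)
    u = (x - xg[p]) / (xg[p + 1] - xg[p])
    # Interpolate along x on rows q and q + 1, then along y
    f_q = t[p][q] + (t[p + 1][q] - t[p][q]) * u
    f_q1 = t[p][q + 1] + (t[p + 1][q + 1] - t[p][q + 1]) * u
    numeric = f_q + (f_q1 - f_q) * (y - yg[q]) / (yg[q + 1] - yg[q])
    x1, y1 = xg[0], yg[0]
    return (analytic2d(x, y1, *args) + analytic2d(x1, y, *args)
            - analytic2d(x1, y1, *args) + numeric)

# Usage with a made-up 21 x 21 grid and flat bin counts
xg = [-2.0 + 0.2 * i for i in range(21)]
yg = [-2.0 + 0.2 * i for i in range(21)]
c = [[1] * 21 for _ in range(21)]
print(seminumerical2d(0.33, -0.47, xg, yg, c, 0.5, 0.5))
print(analytic2d(0.33, -0.47, xg, yg, c, 0.5, 0.5))
```

The sketch rebuilds the trapezoid table on every call for clarity; a practical version would compute it once and reuse it across evaluation points.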
Last updated: December 09, 2022