The KDE Procedure

SCORE Statement

  • SCORE DATA=SAS-data-set OUT=SAS-data-set </ option>;

The SCORE statement produces kernel density estimates at arbitrary locations for an associated UNIVAR or BIVAR statement.

You must specify the following arguments:

DATA=SAS-data-set

specifies the input SAS data set.

For a univariate density, the input data set contains the variable v1, where v1 is a variable that appears in a UNIVAR statement. The values of v1 indicate arbitrary points for which a density estimate is requested.

For a bivariate density, the input data set contains variables v1 and v2, where v1 and v2 are variables that appear in a BIVAR statement. Pairs (v1, v2) indicate arbitrary points for which a density estimate is requested.
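As a minimal sketch, scoring data sets of this form can be built with a DATA step. The data set names ScoreUni and ScoreBi, the variable names x and y (standing in for v1 and v2), and the ranges of values are assumptions chosen for illustration only. The scoring points do not have to lie on a regular grid.

```sas
/* Hypothetical univariate scoring data set: a single variable, x */
data ScoreUni;
   do x = -2 to 2 by 0.5;
      output;
   end;
run;

/* Hypothetical bivariate scoring data set: x is created first, then y, */
/* so the variables are stored in the order x, y                        */
data ScoreBi;
   do x = -2 to 2 by 1;
      do y = -2 to 2 by 1;
         output;
      end;
   end;
run;
```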

When you specify a BY statement, the DATA= data set must not contain any of the BY variables. The entire data set is scored for each BY group.
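The following sketch illustrates scoring with BY-group processing. It assumes that MyData contains a variable x and a hypothetical BY variable named group, and that a hypothetical data set ScorePoints contains only the variable x. None of these names come from the procedure itself.

```sas
/* BY-group processing expects the input data to be sorted by the BY variable */
proc sort data=MyData out=MySorted;
   by group;
run;

proc kde data=MySorted;
   by group;
   univar x;
   /* ScorePoints holds x only (no BY variable); */
   /* all of its rows are scored for each group  */
   score data=ScorePoints out=ScoredByGroup;
run;
```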

OUT=SAS-data-set

names the output SAS data set to be produced by the SCORE statement.

For a univariate density, the output data set contains the following variables:

  • var, whose value is the name of the variable in the DATA= data set

  • value, whose values are taken from the DATA= data set

  • density, whose values are equal to the kernel density estimate

  • distribution, whose values are equal to the distribution estimate (this variable is included only when the CDF global option is specified in the corresponding UNIVAR statement)

For a bivariate density, the output data set has the following variables:

  • variable1, whose value is the name of the first variable in the DATA= data set

  • value1, whose values are taken from the DATA= data set

  • variable2, whose value is the name of the second variable in the DATA= data set

  • value2, whose values are taken from the DATA= data set

  • density, whose values are equal to the kernel density estimate

  • distribution, whose values are equal to the distribution estimate (this variable is included only when the CDF global option is specified in the corresponding BIVAR statement)

You can specify the following option in the SCORE statement after a slash (/):

METHOD=INTERP | EXACT

requests a particular scoring method. You can specify the following values:

EXACT

evaluates the appropriate binned density equation (see the section Binning) and the analytical distribution equation (see the section Kernel Distribution Estimates).

INTERP

linearly interpolates the density estimate and computes the distribution estimate seminumerically. The seminumerical distribution technique is described in the section Kernel Distribution Estimates. Note: Scoring a point that lies outside the grid returns a density estimate of 0.

By default, METHOD=INTERP.
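As an illustration, the following sketch scores the same points with both methods so that the results can be compared. The data set names MyData, ScorePoints, ScoredInterp, and ScoredExact are assumptions; ScorePoints is taken to contain only the variable x.

```sas
proc kde data=MyData;
   univar x;
   /* No METHOD= option: the default, METHOD=INTERP, is used */
   score data=ScorePoints out=ScoredInterp;
   /* Request exact evaluation instead of interpolation */
   score data=ScorePoints out=ScoredExact / method=exact;
run;
```

Both SCORE statements apply to the same UNIVAR statement, because each DATA= data set contains the variable x.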

You can include multiple SCORE statements. Each SCORE statement applies to the first UNIVAR or BIVAR statement that specifies the same variables as the DATA= data set contains. In the bivariate case, the order of the variables matters: a SCORE statement whose DATA= data set consists of the variables x and y (in that order) matches only a BIVAR statement of one of the following forms:

  • BIVAR (x y)…</ options>;

  • BIVAR x y …</ options>;
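If a scoring data set stores its variables in the other order, one possible way to make it match is to reorder the variables in a DATA step. In this sketch, ScoreYX is a hypothetical data set whose variables are stored as y, x, and ScoreXY is a hypothetical name for the reordered copy.

```sas
/* Naming x and y in a RETAIN statement before the SET statement   */
/* places them first in the new data set, in the order x, y        */
data ScoreXY;
   retain x y;
   set ScoreYX;
run;
```

A SCORE statement with DATA=ScoreXY can then be matched to a BIVAR statement that specifies x and y in that order.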

Example

Suppose the data set MyData contains the variables x and y, and the data sets MyScoreInX, MyScoreInY, and MyScoreInXY contain the variables x, y, and (x, y), respectively. The following statements request the individual (univariate) and joint (bivariate) kernel density estimates, request the distribution estimate for the joint case only, and selectively score them:

proc kde data=MyData;
   univar x y;
   bivar x y / CDF;
   score data=MyScoreInX out=MarginalX;
   score data=MyScoreInY out=MarginalY;
   score data=MyScoreInXY out=JointXY;
run;

The first SCORE statement is associated with the UNIVAR statement, because MyScoreInX contains x, and produces the MarginalX output data set. This data set contains the variables var, value, and density, where var has the value x and density is the kernel density estimate at value.

The second SCORE statement is also associated with the UNIVAR statement, because MyScoreInY contains y, and produces the MarginalY output data set. This data set contains the variables var, value, and density, where var has the value y and density is the kernel density estimate at value. Neither marginal data set contains a distribution variable, because the UNIVAR statement does not specify the CDF option.

The third SCORE statement is associated with the BIVAR statement and produces the JointXY output data set. This data set contains the variables variable1, value1, variable2, value2, density, and distribution, where variable1 and variable2 have the values x and y, and density and distribution are the kernel density and distribution estimates, respectively, at (value1, value2).
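To confirm which variables each output data set actually contains, the results can be inspected after the procedure runs. The following is a minimal sketch that assumes the PROC KDE step above has completed and created MarginalX and JointXY; the choice of five observations is arbitrary.

```sas
/* List the variables of two of the output data sets in creation order */
proc contents data=MarginalX varnum;
run;

proc contents data=JointXY varnum;
run;

/* Print the first five scored points of the joint estimate */
proc print data=JointXY(obs=5);
run;
```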

Last updated: December 09, 2022