The QUANTREG Procedure

CONDDIST Statement

<label:> CONDDIST options;

The CONDDIST statement estimates and displays the conditional and marginal probability functions for the response random variable. These functions include the cumulative distribution functions (CDFs) and the probability density functions (PDFs). For more information about the probability functions, see the section Estimating Probability Functions by Using the CONDDIST Statement.

You can specify multiple CONDDIST statements. You can use the optional label, which must be a valid SAS name, to identify the output group for its corresponding CONDDIST statement.
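
As a minimal sketch of how labels separate output groups, the following step specifies two labeled CONDDIST statements. The data set name Train, the variables y, x1, and x2, and the labels AvgFit and ObsFit are hypothetical. The sketch also assumes that the MODEL statement requests a quantile process fit through the QUANTLEV=FQPR(N=) option that is mentioned later in this section, with an illustrative grid size of 100.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc quantreg data=Train;
   model y = x1 x2 / quantlev=fqpr(n=100);
   /* Each label identifies the output group of its own statement */
   AvgFit: conddist showavg;
   ObsFit: conddist hideraw obs=1;
run;
```

The first statement adds the distribution at the average covariates, and the second analyzes the first training observation while hiding the observed marginal distribution.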

Table 5 summarizes the options available in the CONDDIST statement.

Table 5: CONDDIST Statement Options

Option	Description
HIDERAW	Hides the observed marginal distribution of the training response for the analysis
MCDF	Performs the marginal distributions analysis
MWU	Performs the Mann-Whitney U test against the observed marginal CDF sample of the training response variable
OBS=	Specifies a subset of the training observations for the analysis by using the indices of these observations in the DATA= data set of the PROC QUANTREG statement
PDF=KDE	Specifies the kernel density estimator for estimating the probability density functions
PLOTS=	Specifies options for graphically displaying the probability functions
SHOWAVG	Requests the distribution of the training response that is conditional on the average explanatory covariates for the analysis
TESTDATA=	Specifies an input SAS data set that contains test observations for estimating the probability functions

You can specify the following options:

HIDERAW HR

hides the observed marginal CDF sample of the training response for the analysis. This CDF sample is estimated from the sample of all the response values in the training data without using the quantile regression model. The CONDDIST statement assigns the type “Observed” and the label “TrainObs” to this distribution.

MCDF(mcdf-option)

performs the marginal distributions analysis by using the weighted bootstrap resampling method. For more information, see the section Marginal Distribution Analysis Using the Bootstrap Resampling Method. You can specify the following mcdf-options:

QUANTILERANGE=BOTH | TRAIN | NONE QRNG=BOTH | TRAIN | NONE

specifies the type of limits for the quantile predictions. You can specify the following limits:

BOTH: specifies the range of the observed response values for combined training and testing data sets.
TRAIN: specifies the range of the observed response values for only the training data set.
NONE: specifies no forced range.

If a quantile prediction is smaller than the lower limit of the specified response range, the quantile prediction is reset to the lower-limit value for postprocessing. If a quantile prediction is larger than the upper limit of the specified response range, the quantile prediction is reset to the upper-limit value for postprocessing. By default, QRNG=BOTH. For more information, see Step 5b in the section Marginal Distribution Analysis Using the Bootstrap Resampling Method.
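
The resetting rule amounts to clamping each quantile prediction to the selected response range. In the sketch below, the symbols are assumptions introduced for illustration: $\hat{q}$ denotes a quantile prediction, and $y_{\min}$ and $y_{\max}$ denote the lower and upper limits of the range that the QRNG= option selects.

```latex
% Assumed notation: \hat{q} is the raw prediction, \tilde{q} the value
% that is used for postprocessing.
\tilde{q} = \min\bigl(\max(\hat{q},\, y_{\min}),\, y_{\max}\bigr)
```

With QRNG=NONE no limits are imposed, so $\tilde{q} = \hat{q}$.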

NREP=n

specifies the number of repetitions for the bootstrap resampling process.

The MCDF option repeatedly fits a quantile process regression model for each reweighted bootstrap sample of the training observations in the DATA= data set in the PROC QUANTREG statement. If you also specify the TESTDATA= option in the CONDDIST statement, then in each repetition the MCDF option additionally generates a reweighted bootstrap sample of the testing observations and estimates the counterfactual distribution for the testing sample by using the corresponding training-sample quantile process model. It then outputs the marginal distribution comparisons table, which compares the training, testing, and counterfactual testing marginal distributions, each averaged across all samples. You can output all the fitted training-sample quantile process regression models by using the following ODS OUTPUT statement:

ods output BootProcessEst=BPE;
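
The following sketch shows where that ODS OUTPUT statement might sit in a complete step. The data set names Train and Test, the variable names, the grid size, and the repetition count of 200 are hypothetical. The sketch assumes that Test contains the response and all the explanatory variables.

```sas
/* Hypothetical data sets Train and Test; names are illustrative only */
ods output BootProcessEst=BPE;   /* save the fitted bootstrap models in BPE */
proc quantreg data=Train;
   model y = x1 x2 / quantlev=fqpr(n=100);
   /* 200 bootstrap repetitions; TESTDATA= adds the counterfactual analysis */
   conddist mcdf(nrep=200) testdata=Test;
run;
```

Because each repetition refits the quantile process model, the run time can be expected to grow with the NREP= value.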

MWU(mwu-option) WILCOXON(mwu-option) RANKSUM(mwu-option)

performs the Mann-Whitney U test (also called the Wilcoxon rank-sum test and denoted as the MWU test) for each CDF sample of all specified observations against the observed marginal CDF sample of the training response variable for the DATA= data set in the PROC QUANTREG statement.

You can specify the following mwu-option:

SAMPLESIZE=NOBS | NQ SAMPSIZE=NOBS | NQ

specifies the sizes for the relevant CDF samples.

You can specify one of the following sample-size values:

NOBS: requests that the size of the CDF samples, except the observed marginal CDF sample for the TESTDATA= data set in the CONDDIST statement (labeled as “TestObs”), be equal to the number of training observations in the DATA= data set in the PROC QUANTREG statement. The SAMPSIZE=NOBS option requests that the size of the TestObs CDF sample be equal to the number of testing observations in the TESTDATA= data set in the CONDDIST statement. The SAMPSIZE=NOBS option is appropriate for the MWU tests when the size of the quantile-level grid is larger than the number of training observations in the DATA= data set in the PROC QUANTREG statement.
NQ: requests that the size of the CDF samples be equal to the size of the quantile-level grid, so that you can indirectly specify the size of the CDF samples by using the QUANTLEV=FQPR(N=) option in the MODEL statement. The SAMPSIZE=NQ option is appropriate for the MWU tests when the size of the quantile-level grid is smaller than both the number of training observations in the DATA= data set in the PROC QUANTREG statement and the number of testing observations in the TESTDATA= data set in the CONDDIST statement.

By default, SAMPSIZE=NQ. The TESTDATA(MWU(SAMPSIZE=)) option overrides this MWU(SAMPSIZE=) option. For more information about the MWU tests and the CDF samples, see the section Mann-Whitney U Test.
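
A minimal sketch of requesting the MWU test with a nondefault sample size follows. The data set and variable names are hypothetical, and the grid size of 100 is only an example.

```sas
/* Hypothetical names; compares the fit at the average covariates with */
/* the observed marginal CDF sample of the training response           */
proc quantreg data=Train;
   model y = x1 x2 / quantlev=fqpr(n=100);
   conddist showavg mwu(sampsize=nobs);
run;
```

Omitting the SAMPSIZE= suboption leaves the default, SAMPSIZE=NQ, so the CDF sample size follows the quantile-level grid.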

OBS=number-list

specifies the observation indices to use for the conditional distribution analysis. Each observation index identifies a training observation in the DATA= data set that you specify in the PROC QUANTREG statement. The CONDDIST statement types the distributions of these observations as “Fit for Obs” and labels the distributions by using the observation ID values (if available) or the observation indices.

PDF=KDE(kde-options)

specifies the kernel density estimator for estimating the probability density functions. For more information about the kernel density estimates, see the section Probability Density Functions.

You can specify the following kde-options:

C=value

specifies the standardized bandwidth parameter c for the kernel density estimator.

You can specify one of the following values:

number: specifies a positive number.
MISE: minimizes the approximate mean integrated square error (MISE).
SJPI: computes the bandwidth parameter by using the Sheather-Jones plug-in method.

By default, C=MISE.

K=NORMAL | TRIANGULAR | QUADRATIC

specifies the type of kernel function to use for the kernel density estimator. NORMAL specifies the normal kernel. TRIANGULAR specifies the triangular kernel. QUADRATIC specifies the quadratic kernel. By default, K=NORMAL.

LOWER=value L=value

specifies the lower bound value for the kernel density curves that suppresses the lower tails of the PDF estimates. If value is smaller than the minimum quantile-grid value of all the PDF estimates, the CONDDIST statement ignores this suboption and outputs the minimum quantile-grid value as the lower bound value in the density estimation table.

LUADJUST=TRIM | SCALE | REFLECT LUA=TRIM | SCALE | REFLECT

specifies the adjustment type for kernel density estimation when you specify the LOWER=value or UPPER=value option (or both).

Let denote the lower tail of the PDF estimates, denote the remaining PDF estimates, and denote the upper tail of the PDF estimates. For simplicity of notation, assume that for all and . You can specify one of the following types:

TRIM

trims the tails of the PDF estimates without adjusting the remaining PDF estimates, so that for all and .

SCALE

suppresses the tails of the PDF estimates and scales the remaining PDF estimates by , where is the sum of the PDF estimates and is the sum of the remaining PDF estimates.

REFLECT

suppresses the tails of the PDF estimates and adjusts the remaining PDF estimates by using the reflections of the suppressed tails. The adjusted equals

\[ \hat{f}_{l+j} + \sum_{k=1}^{K} \left( \hat{f}_{l+(1-k)m+1-j} + \hat{f}_{l+(1+k)m+1-j} \right) \quad \text{for } j = 1, \ldots, m \]

where K is the smallest integer that satisfies and .

By default, LUA=TRIM.
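
One way to read the three adjustment types is to fix an explicit indexing. The notation here is an assumption chosen to match the subscripts $l+j$ and $m$ in the REFLECT formula. The symbols $n$, $\tilde{f}$, $S$, and $S_m$ are introduced only for illustration and are not taken from this documentation. Suppose that the lower tail is $\hat{f}_1, \ldots, \hat{f}_l$, the remaining estimates are $\hat{f}_{l+1}, \ldots, \hat{f}_{l+m}$, the upper tail is $\hat{f}_{l+m+1}, \ldots, \hat{f}_n$, and $\hat{f}_i = 0$ outside $1, \ldots, n$. Under that reading, TRIM and SCALE would amount to the following:

```latex
% Assumed notation: \tilde{f} is the adjusted estimate, S is the sum of all
% PDF estimates, and S_m is the sum of the remaining (unsuppressed) estimates.
\begin{align*}
\text{TRIM:}  \quad & \tilde{f}_{l+j} = \hat{f}_{l+j}, & j = 1, \ldots, m \\
\text{SCALE:} \quad & \tilde{f}_{l+j} = \frac{S}{S_m}\,\hat{f}_{l+j}, & j = 1, \ldots, m \\
& S = \sum_{i=1}^{n} \hat{f}_i, \qquad S_m = \sum_{j=1}^{m} \hat{f}_{l+j}
\end{align*}
% REFLECT with K = 1, obtained by substituting k = 1 in the formula above:
\[
\tilde{f}_{l+j} = \hat{f}_{l+j} + \hat{f}_{l+1-j} + \hat{f}_{l+2m+1-j},
\qquad j = 1, \ldots, m
\]
```

The direction of the SCALE ratio, $S/S_m$, is an inference from the description: it would restore the total that the suppressed tails carried. In the $K = 1$ case, the first added term mirrors the lower tail across the lower bound, and the second mirrors the upper tail across the upper bound.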

UPPER=value U=value

specifies the upper bound value for the kernel density curves that suppresses the upper tails of the PDF estimates. If value is larger than the maximum quantile-grid value of all the PDF estimates, the CONDDIST statement ignores this suboption and outputs the maximum quantile-grid value as the upper bound value in the density estimation table.

When you specify the PLOT= option to create the CDF plot and the PDF plot, the LOWER= and UPPER= suboptions of the PDF=KDE option also set limits for the horizontal range of the plots.
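
The kde-options can be combined in a single PDF=KDE specification. The sketch below uses hypothetical data set and variable names and assumes a nonnegative response, so that a lower bound of 0 is meaningful. The particular kernel and bandwidth choices are examples, not recommendations.

```sas
/* Hypothetical names; assumes the response y cannot be negative */
ods graphics on;
proc quantreg data=Train;
   model y = x1 x2 / quantlev=fqpr(n=100);
   /* Quadratic kernel, Sheather-Jones bandwidth, lower tail below 0 */
   /* suppressed and reflected back into the remaining estimates     */
   conddist showavg
            pdf=kde(k=quadratic c=sjpi lower=0 luadjust=reflect)
            plots=(pdfplot);
run;
```

Because LOWER=0 is specified, the horizontal range of the PDF plot would also start at 0, unless 0 is smaller than the minimum quantile-grid value, in which case the suboption is ignored.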

PLOT | PLOTS<global-plot-options><=plot-request> PLOT | PLOTS<global-plot-options><=(plot-request < …plot-request > )>

specifies graphical options for displaying the probability functions.

You can specify the following global-plot-options, which apply to all plots that the CONDDIST statement generates:

HIDEDROPLINES HDL: suppresses the drop lines for the responses.
HIDEDROPNUMBERS HDN: suppresses the drop numbers for the responses.
HIDEOBSDOTS HOD: suppresses the response dots.
HIDEOBSLABELS HOL: suppresses the response labels.
SHOWGRIDS SG: displays the grid lines.

You can specify the following plot-requests:

ALL

creates all appropriate plots.

CDFPLOT<(plot-options)>

plots the cumulative distribution functions (CDFs).

You can use any of the global-plot-options for the PLOTS option as the plot-options for the CDFPLOT option.

MCDFPLOT<(plot-options)>

plots the bootstrap-averaged marginal CDF (MCDF) samples and their confidence bands. You must also specify the MCDF option in the CONDDIST statement to request the MCDF plot.

You can specify the following plot-options:

HIDEFIT HF: suppresses the counterfactual marginal CDF sample of the fitted test response and its confidence limits.
HIDERAWTEST HRTST: suppresses the observed marginal CDF sample of the test response and its confidence limits.
HIDERAWTRAIN HRTRN: suppresses the observed marginal CDF sample of the training response and its confidence limits.
NOLIMITS HCL: suppresses all the confidence limits.

You can use the SHOWGRIDS global-plot-option as the plot-option to display the grid lines.

PDFPLOT<(plot-options)>

plots the probability density functions (PDFs).

You can use any of the global-plot-options for the PLOTS option as the plot-options for the PDFPLOT option.

PPPLOT<(plot-options)>

creates the scatter plot of the regression quantile levels versus the sample quantile levels for the relevant response values. This plot is referred to as the probability-probability plot, or PP plot for short.

You can use any of the global-plot-options of the PLOTS option, except the HOD suboption, as the plot-options for the PPPLOT option.

When you specify the PLOTS option, the following options in the CONDDIST statement control the visualization of their relevant probability functions:

the HIDERAW and SHOWAVG options
the HIDEFIT, HIDERAW, SHOWAVG, and SHOWOBS suboptions of the TESTDATA option
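
As a sketch of how plot-requests and their plot-options fit together, the following step requests three plots. The data set and variable names are hypothetical. The plot-options are passed at the plot-request level, which the CDFPLOT and PDFPLOT descriptions permit for any global-plot-option.

```sas
/* Hypothetical names; ODS Graphics must be enabled to produce the plots */
ods graphics on;
proc quantreg data=Train;
   model y = x1 x2 / quantlev=fqpr(n=100);
   conddist showavg
            plots=(cdfplot(showgrids)        /* CDF plot with grid lines    */
                   pdfplot(hidedroplines)    /* PDF plot without drop lines */
                   ppplot);                  /* probability-probability plot */
run;
```

Here SHOWAVG adds the TrainAvg distribution to the plots, and omitting HIDERAW keeps the TrainObs distribution visible.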

SHOWAVG SA

requests the conditional distribution of the training response at the average of the explanatory covariates for the analysis, where the average is taken over all the training observations. The CONDDIST statement assigns the type “Fit at Average” and the label “TrainAvg” to this distribution.

TESTDATA(options)=SAS-data-set

specifies the test data set for the conditional distribution analysis. The TESTDATA= data set must contain all the explanatory variables that you specify in the MODEL statement.

You can specify the following options:

HIDEFIT HF

hides the counterfactual marginal CDF sample of the fitted test response for the analysis. This marginal CDF sample integrates out the quantile regression model by pooling together all the quantile process predictions of all the test observations in the TESTDATA= data set. The CONDDIST statement assigns the type “Fit and Pooled” and the label “TestFit” to this distribution.

HIDERAW HR

hides the observed marginal CDF sample of the test response for the analysis. This CDF sample is estimated from the sample of all the response values in the TESTDATA= data set without using the quantile regression model. The CONDDIST statement assigns the type “Observed” and the label “TestObs” to this distribution.

MWU(mwu-option) WILCOXON(mwu-option) RANKSUM(mwu-option)

performs the Mann-Whitney U test for each CDF sample of all specified observations against the observed marginal CDF sample of the testing response variable for the TESTDATA= data set in the CONDDIST statement.

You can specify the following mwu-option:

SAMPLESIZE=NOBS | NQ SAMPSIZE=NOBS | NQ

specifies the sizes of the relevant CDF samples.

You can specify one of the following sample-size values:

NOBS: requests that the size of the CDF samples, except the observed marginal CDF sample for the TESTDATA= data set in the CONDDIST statement (labeled as “TestObs”), be equal to the number of training observations in the DATA= data set in the PROC QUANTREG statement. The SAMPSIZE=NOBS option requests that the size of the TestObs CDF sample be equal to the number of testing observations in the TESTDATA= data set in the CONDDIST statement. The SAMPSIZE=NOBS option is appropriate for the MWU tests when the size of the quantile-level grid is larger than the number of training observations in the DATA= data set in the PROC QUANTREG statement.
NQ: requests that the size of the CDF samples be equal to the size of the quantile-level grid, so that you can indirectly specify the size of the CDF samples by using the QUANTLEV=FQPR(N=) option in the MODEL statement. The SAMPSIZE=NQ option is appropriate for the MWU tests when the size of the quantile-level grid is smaller than both the number of training observations in the DATA= data set in the PROC QUANTREG statement and the number of testing observations in the TESTDATA= data set in the CONDDIST statement.

By default, SAMPSIZE=NQ. This TESTDATA(MWU(SAMPSIZE=)) option overrides the MWU(SAMPSIZE=) option in the CONDDIST statement. For more information about the MWU tests and the CDF samples, see the section Mann-Whitney U Test.

SHOWAVG SA

requests the conditional CDF sample of the test response at the average of the explanatory covariates for the analysis, where the average is taken over the observations in the TESTDATA= data set. The CONDDIST statement assigns the type “Fit at Average” and the label “TestAvg” to this CDF sample.

SHOWOBS SO

requests the conditional CDF samples of all the test observations in the analysis. The CONDDIST statement assigns the type “Fit for Obs” to the CDF samples of these observations and labels these CDF samples by using the observation ID values (if available) or the observation indices. By default, the TESTDATA= option ignores these CDF samples.

By default, if the TESTDATA= data set contains fewer than 16 observations, the CONDDIST statement ignores the observed marginal CDF sample of the test response. Otherwise, the CONDDIST statement estimates the observed marginal CDF sample of the test response.
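
The TESTDATA= suboptions go inside the parentheses that precede the equal sign. The following sketch uses hypothetical data set and variable names and assumes that Test contains the response and all the explanatory variables in the MODEL statement.

```sas
/* Hypothetical data sets Train and Test; names are illustrative only */
proc quantreg data=Train;
   model y = x1 x2 / quantlev=fqpr(n=100);
   /* Add the TestAvg distribution and the per-observation CDF samples, */
   /* and run the MWU tests against the TestObs CDF sample              */
   conddist testdata(showavg showobs mwu(sampsize=nobs))=Test;
run;
```

Because SHOWOBS adds one CDF sample for each test observation, it is likely to be most readable for small test data sets.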

Last updated: December 09, 2022