The QUANTREG Procedure

Confidence Interval

The QUANTREG procedure provides three methods to compute confidence intervals for the regression quantile parameter $\boldsymbol{\beta}(\tau)$: sparsity, rank, and resampling. The sparsity method is the most direct and the fastest, but it involves estimation of the sparsity function, which is not robust for data that are not independently and identically distributed. To deal with this problem, the QUANTREG procedure uses a local estimate of the sparsity function to compute a Huber sandwich estimate. The rank method, which computes confidence intervals by inverting the rank score test, does not suffer from this problem. However, the rank method uses the simplex algorithm and is computationally expensive with large data sets. The resampling method, which uses the bootstrap approach, addresses these problems, but at a computational cost.

Based on these properties, the QUANTREG procedure uses a combination of the resampling and rank methods as the default. For data sets that have more than 5,000 observations or more than 20 variables, the QUANTREG procedure uses the MCMB resampling method; otherwise, it uses the rank method. You can request a particular method by using the CI= option in the PROC QUANTREG statement.
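The default rule can be restated as a small decision function. This Python sketch is illustrative only: `default_ci_method` is a hypothetical name (not a SAS option), "variables" is assumed to mean the number of columns of the design matrix, and both thresholds are read as strict inequalities, as in the text.

```python
def default_ci_method(n_obs: int, n_vars: int) -> str:
    """Return the default CI method described for PROC QUANTREG.

    Hypothetical helper: "resampling" stands for the MCMB method and
    "rank" for inversion of the rank score test.
    """
    if n_obs > 5000 or n_vars > 20:
        return "resampling"  # MCMB for large or high-dimensional problems
    return "rank"  # rank-test inversion otherwise


print(default_ci_method(1000, 5))   # rank
print(default_ci_method(10000, 5))  # resampling
```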

Sparsity

Consider the linear model

$$ y_i = \mathbf{x}_i'\boldsymbol{\beta} + \epsilon_i $$

Assume that $\{\epsilon_i\}$, $i = 1, \ldots, n$, are iid with a distribution $F$ and a density $f = F'$, where $f(F^{-1}(\tau)) > 0$ in a neighborhood of $\tau$. Under some mild conditions,

$$ \sqrt{n}\left(\hat{\boldsymbol{\beta}}(\tau) - \boldsymbol{\beta}(\tau)\right) \rightarrow N\left(0, \omega^2(\tau, F)\,\boldsymbol{\Omega}^{-1}\right) $$

where $\omega^2(\tau, F) = \tau(1-\tau)/f^2(F^{-1}(\tau))$ and $\boldsymbol{\Omega} = \lim_{n \rightarrow \infty} n^{-1}\sum_{i=1}^{n} \mathbf{x}_i\mathbf{x}_i'$ (Koenker and Bassett 1982b).

This asymptotic distribution for the regression quantile $\hat{\boldsymbol{\beta}}(\tau)$ can be used to construct confidence intervals. However, the reciprocal of the density function,

$$ s(\tau) = \left[f\left(F^{-1}(\tau)\right)\right]^{-1} $$

which is called the sparsity function, must first be estimated.
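Once an estimate of $s(\tau)$ is available, the asymptotic result turns into Wald-type intervals. Because $\boldsymbol{\Omega}$ is estimated by $n^{-1}\mathbf{X}'\mathbf{X}$, the iid covariance of $\hat{\boldsymbol{\beta}}(\tau)$ is approximately $n^{-1}\omega^2\boldsymbol{\Omega}^{-1} = \tau(1-\tau)s^2(\tau)(\mathbf{X}'\mathbf{X})^{-1}$. This is a minimal sketch under that assumption; the function name `iid_wald_intervals` is hypothetical, and the sparsity value is supplied by the caller.

```python
import numpy as np
from statistics import NormalDist


def iid_wald_intervals(beta_hat, X, tau, sparsity, alpha=0.05):
    """Wald intervals from the iid asymptotic normal approximation.

    Var(beta_hat) ~ tau (1 - tau) s^2 (X'X)^{-1}, where s is an estimate
    of the sparsity s(tau) supplied by the caller.
    """
    X = np.asarray(X, dtype=float)
    beta_hat = np.asarray(beta_hat, dtype=float)
    omega2 = tau * (1.0 - tau) * sparsity**2  # omega^2(tau, F)
    cov = omega2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(cov))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided normal quantile
    return beta_hat - z * se, beta_hat + z * se
```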

Because

$$ s(t) = \frac{d}{dt}F^{-1}(t) $$

$s(t)$ can be estimated by the difference quotient of the empirical quantile function; that is,

$$ \hat{s}_n(t) = \frac{\hat{F}_n^{-1}(t + h_n) - \hat{F}_n^{-1}(t - h_n)}{2h_n} $$

where $\hat{F}_n^{-1}$ is an estimate of $F^{-1}$ and $h_n$ is a bandwidth that tends to 0 as $n \rightarrow \infty$.
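The difference quotient is easy to check against a distribution whose sparsity is known in closed form. In this sketch the quantile function is passed in as a callable (an assumption for illustration; in practice it would be the estimate $\hat{F}_n^{-1}$), and the standard exponential distribution, with $F^{-1}(t) = -\log(1-t)$ and $s(t) = 1/(1-t)$, serves as the test case. The function name is hypothetical.

```python
import math
from typing import Callable


def sparsity_difference_quotient(qf: Callable[[float], float], t: float, h: float) -> float:
    """Symmetric difference quotient [Q(t + h) - Q(t - h)] / (2h) of a quantile function."""
    return (qf(t + h) - qf(t - h)) / (2.0 * h)


# Standard exponential: F^{-1}(t) = -log(1 - t), so s(0.5) = 1 / (1 - 0.5) = 2.
exp_qf = lambda t: -math.log(1.0 - t)
print(sparsity_difference_quotient(exp_qf, 0.5, 0.01))  # close to 2.0
```

Shrinking `h` moves the quotient toward the true value, which is why $h_n \rightarrow 0$ is required; with an empirical quantile function, however, too small an $h$ makes the estimate noisy, which motivates the bandwidth rules.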

The QUANTREG procedure provides two bandwidth methods. The Bofinger bandwidth

$$ h_n = n^{-1/5}\left(\frac{4.5\, s^2(t)}{\left(s^{(2)}(t)\right)^2}\right)^{1/5} $$

minimizes the mean squared error in standard density estimation. The Hall-Sheather bandwidth

$$ h_n = n^{-1/3} z_\alpha^{2/3}\left(\frac{1.5\, s(t)}{s^{(2)}(t)}\right)^{1/3} $$

is based on Edgeworth expansions for studentized quantiles, where $s^{(2)}(t)$ is the second derivative of $s(t)$ and $z_\alpha$ satisfies $\Phi(z_\alpha) = 1 - \alpha/2$ for the construction of $1 - \alpha$ confidence intervals. The following quantity is not sensitive to $f$ and can be estimated by assuming $f$ is Gaussian:

$$ \frac{s(t)}{s^{(2)}(t)} = \frac{f^2}{2\left(f^{(1)}/f\right)^2 + \left[\left(f^{(1)}/f\right)^2 - f^{(2)}/f\right]} $$
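Under the Gaussian assumption the ratio has a closed form: with $x = \Phi^{-1}(t)$, $f^{(1)}/f = -x$ and $f^{(2)}/f = x^2 - 1$, so the denominator reduces to $2x^2 + 1$ and $s(t)/s^{(2)}(t) = \phi(x)^2/(2x^2 + 1)$. The sketch below plugs this into both bandwidth formulas using only the Python standard library. The function names are hypothetical, and the code is not claimed to reproduce PROC QUANTREG's internal computation exactly.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal


def gaussian_s_over_s2(t: float) -> float:
    """s(t)/s''(t) for a standard normal f: phi(x)^2 / (2x^2 + 1), x = Phi^{-1}(t)."""
    x = _N.inv_cdf(t)
    return _N.pdf(x) ** 2 / (2.0 * x * x + 1.0)


def bofinger_bandwidth(t: float, n: int) -> float:
    """h_n = n^{-1/5} (4.5 s^2 / (s'')^2)^{1/5} with the Gaussian ratio."""
    r = gaussian_s_over_s2(t)
    return n ** (-1.0 / 5.0) * (4.5 * r * r) ** (1.0 / 5.0)


def hall_sheather_bandwidth(t: float, n: int, alpha: float = 0.05) -> float:
    """h_n = n^{-1/3} z_alpha^{2/3} (1.5 s / s'')^{1/3} with the Gaussian ratio."""
    z = _N.inv_cdf(1.0 - alpha / 2.0)  # Phi(z_alpha) = 1 - alpha/2
    r = gaussian_s_over_s2(t)
    return n ** (-1.0 / 3.0) * z ** (2.0 / 3.0) * (1.5 * r) ** (1.0 / 3.0)
```

The two rules shrink at different rates ($n^{-1/5}$ versus $n^{-1/3}$), so for large samples the Hall-Sheather bandwidth is the smaller of the two.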

$F^{-1}$ can be estimated in either of the following ways:

  • by the empirical quantile function of the residuals from the quantile regression fit,

    $\hat{F}^{-1}(t) = r_{(i)}$ for $t \in [(i-1)/n, i/n)$, where $r_{(1)} \le \cdots \le r_{(n)}$ are the ordered residuals,
  • by the empirical quantile function of regression proposed by Bassett and Koenker (1982),

    $\hat{F}^{-1}(t) = \bar{\mathbf{x}}'\hat{\boldsymbol{\beta}}(t)$, where $\bar{\mathbf{x}}$ is the mean of the covariate vectors $\mathbf{x}_i$

The QUANTREG procedure interpolates the first empirical quantile function and produces the piecewise linear version:

$$
\hat{F}^{-1}(t) = \begin{cases}
r_{(1)} & \text{if } t \in [0, 0.5/n) \\
\lambda r_{(i+1)} + (1 - \lambda) r_{(i)} & \text{if } t \in [(i - 0.5)/n, (i + 0.5)/n) \\
r_{(n)} & \text{if } t \in [(n - 0.5)/n, 1]
\end{cases}
$$

where $i = 1, \ldots, n-1$ and $\lambda = tn - i + 0.5$.

$\hat{F}^{-1}$ is set to a constant if $t \pm h_n$ falls outside $[0, 1]$.
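The interpolated quantile function and the resulting sparsity estimate can be sketched as follows. Two details are assumptions rather than documented behavior: arguments outside $[0, 1]$ are clamped, so the constant used beyond the ends is $r_{(1)}$ or $r_{(n)}$, and the bandwidth `h` is supplied by the caller (for example, from the Bofinger or Hall-Sheather rule). The function names are hypothetical.

```python
import numpy as np


def interpolated_residual_quantile(residuals, t):
    """Piecewise linear empirical quantile function of the residuals.

    Arguments outside [0, 1] are clamped (assumed behavior), which makes the
    function constant beyond both ends.
    """
    r = np.sort(np.asarray(residuals, dtype=float))  # r[k] is r_(k+1)
    n = r.size
    t = min(max(t, 0.0), 1.0)
    if t < 0.5 / n:
        return r[0]  # r_(1)
    if t >= (n - 0.5) / n:
        return r[-1]  # r_(n)
    # 1-based i with (i - 0.5)/n <= t < (i + 0.5)/n.
    i = int(np.floor(t * n + 0.5))
    lam = t * n - i + 0.5
    return lam * r[i] + (1.0 - lam) * r[i - 1]  # lambda r_(i+1) + (1 - lambda) r_(i)


def residual_sparsity(residuals, tau, h):
    """Difference-quotient sparsity estimate at tau with bandwidth h."""
    q = lambda t: interpolated_residual_quantile(residuals, t)
    return (q(tau + h) - q(tau - h)) / (2.0 * h)
```

For the residuals 1, 2, 3, 4 the interpolated function is $4t + 0.5$ on its middle segment, so `residual_sparsity([4, 1, 3, 2], 0.5, 0.1)` evaluates to 4 up to rounding.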

This estimator of the sparsity function is sensitive to the iid assumption. Alternatively, Koenker and Machado (1999) consider the non-iid case. By assuming local linearity of the conditional quantile function $Q_{Y|\mathbf{x}}(\tau)$ in $\mathbf{x}$, they propose a local estimator of the density function that uses the difference quotient. A Huber sandwich estimate of the covariance and standard error is computed and used to construct the confidence intervals. One difficulty with this method is the selection of the bandwidth for the difference quotient. With a small sample size, both the Bofinger and the Hall-Sheather bandwidths tend to be too large to ensure local linearity of the conditional quantile function. The QUANTREG procedure uses a heuristic bandwidth selection in these cases.
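One common form of such a local estimator, assumed here for illustration, estimates each observation's density as $\hat{f}_i = 2h_n / \mathbf{x}_i'(\hat{\boldsymbol{\beta}}(\tau + h_n) - \hat{\boldsymbol{\beta}}(\tau - h_n))$ and plugs it into the sandwich $\tau(1-\tau)\mathbf{H}_n^{-1}\mathbf{J}_n\mathbf{H}_n^{-1}/n$, with $\mathbf{J}_n = n^{-1}\sum_i \mathbf{x}_i\mathbf{x}_i'$ and $\mathbf{H}_n = n^{-1}\sum_i \hat{f}_i\mathbf{x}_i\mathbf{x}_i'$. In this sketch the two neighboring fits are supplied by the caller, the function name is hypothetical, and setting $\hat{f}_i = 0$ where the fitted quantiles cross (or nearly cross) is an assumed safeguard, not documented PROC QUANTREG behavior.

```python
import numpy as np


def sandwich_covariance(X, beta_lo, beta_hi, tau, h, eps=1e-8):
    """Huber sandwich covariance of beta_hat(tau) with local density estimates.

    beta_lo and beta_hi are quantile regression fits at tau - h and tau + h.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Difference of fitted conditional quantiles for each observation.
    dyhat = X @ (np.asarray(beta_hi, dtype=float) - np.asarray(beta_lo, dtype=float))
    ok = dyhat > eps
    safe = np.where(ok, dyhat, 1.0)  # avoid dividing by tiny or negative values
    f = np.where(ok, 2.0 * h / safe, 0.0)  # local density estimates f_i
    J = X.T @ X / n  # n^{-1} sum x_i x_i'
    H = (X * f[:, None]).T @ X / n  # n^{-1} sum f_i x_i x_i'
    H_inv = np.linalg.inv(H)
    return tau * (1.0 - tau) * H_inv @ J @ H_inv / n
```

When every $\hat{f}_i$ equals the same value $1/s$, the sandwich collapses to the iid covariance $\tau(1-\tau)s^2(\mathbf{X}'\mathbf{X})^{-1}$, which is a convenient sanity check.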

By default, the QUANTREG procedure computes non-iid confidence intervals. You can request iid confidence intervals by specifying the IID option in the PROC QUANTREG statement.

Inversion of Rank Tests

The classical theory of rank tests can be extended to test the hypothesis $H_0\colon \beta_2 = \eta$ in the linear regression model $\mathbf{y} = \mathbf{X}_1\beta_1 + \mathbf{X}_2\beta_2 + \boldsymbol{\epsilon}$. Here, $(\mathbf{X}_1, \mathbf{X}_2) = \mathbf{A}'$ is the design matrix, and $\boldsymbol{\epsilon} = (\epsilon_1, \ldots, \epsilon_n)'$ is the unknown error vector.

See Gutenbrunner and Jureckova (1992) for more details. By inverting this test, confidence intervals can be computed for the regression quantiles that correspond to $\beta_2$.

The rank score function $\hat{\mathbf{a}}_n(t) = (\hat{a}_{n1}(t), \ldots, \hat{a}_{nn}(t))$ can be obtained by solving the dual problem:

$$ \max_{\mathbf{a}}\left\{(\mathbf{y} - \mathbf{X}_2\boldsymbol{\eta})'\mathbf{a} \mid \mathbf{X}_1'\mathbf{a} = (1 - t)\mathbf{X}_1'\mathbf{e},\ \mathbf{a} \in [0, 1]^n\right\} $$

where $\mathbf{e}$ is the $n$-vector of ones.

For a fixed quantile $\tau$, integrating $\hat{a}_{ni}(t)$ with respect to the $\tau$-quantile score function

$$ \varphi_\tau(t) = \tau - I(t < \tau) $$

yields the $\tau$-quantile scores

$$ \hat{b}_{ni} = -\int_0^1 \varphi_\tau(t)\, d\hat{a}_{ni}(t) = \hat{a}_{ni}(\tau) - (1 - \tau) $$
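For small problems the dual can be solved with a general-purpose LP solver, which is enough to see how the rank scores arise. This sketch uses SciPy's `scipy.optimize.linprog` with the HiGHS backend (it needs a SciPy version that provides `method="highs"`). It solves the dual directly at $t = \tau$ instead of tracing the whole process in $t$, so it illustrates the definitions rather than the simplex-based computation used by the rank method. The function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_rank_scores(y, X1, X2, eta, tau):
    """tau-quantile rank scores b_i = a_i(tau) - (1 - tau).

    Solves max_a (y - X2 eta)'a  s.t.  X1'a = (1 - tau) X1'e,  0 <= a <= 1.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    X1 = np.asarray(X1, dtype=float).reshape(n, -1)
    X2 = np.asarray(X2, dtype=float).reshape(n, -1)
    resid = y - X2 @ np.atleast_1d(np.asarray(eta, dtype=float))
    e = np.ones(n)
    res = linprog(
        c=-resid,  # linprog minimizes, so negate to maximize
        A_eq=X1.T,
        b_eq=(1.0 - tau) * (X1.T @ e),
        bounds=[(0.0, 1.0)] * n,
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.x - (1.0 - tau)
```

With an intercept-only $\mathbf{X}_1$, the constraint reduces to $\sum_i a_i = (1 - \tau)n$, so the solution sets $a_i = 1$ for the $(1 - \tau)n$ largest residuals; the scores are then $\tau$ for those observations and $-(1 - \tau)$ for the rest.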

Under the null hypothesis $H_0\colon \beta_2 = \eta$,

$$ S_n(\eta) = n^{-1/2}\mathbf{X}_2'\hat{\mathbf{b}}_n(\eta) \rightarrow N\left(0, \tau(1 - \tau)\boldsymbol{\Omega}_n\right) $$

for large $n$, where $\boldsymbol{\Omega}_n = n^{-1}\mathbf{X}_2'\left(\mathbf{I} - \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\right)\mathbf{X}_2$.

Let

$$ T_n(\eta) = \frac{1}{\sqrt{\tau(1 - \tau)}}\, S_n(\eta)\, \boldsymbol{\Omega}_n^{-1/2} $$
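For a single tested coefficient ($\mathbf{X}_2$ with one column), $\boldsymbol{\Omega}_n$ is a scalar and $T_n(\eta)$ can be computed directly from the rank scores. This sketch assumes the scores $\hat{\mathbf{b}}_n(\eta)$ are already available (for example, from the dual problem stated earlier) and obtains $(\mathbf{I} - \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1')\mathbf{X}_2$ as a least-squares residual instead of forming the projection matrix. The function name is hypothetical.

```python
import numpy as np


def rank_test_statistic(b, X1, X2, tau):
    """T_n = S_n Omega_n^{-1/2} / sqrt(tau (1 - tau)) for a scalar beta_2.

    b holds the tau-quantile rank scores b_ni(eta) for the hypothesized eta.
    """
    b = np.asarray(b, dtype=float)
    n = b.size
    X1 = np.asarray(X1, dtype=float).reshape(n, -1)
    x2 = np.asarray(X2, dtype=float).reshape(n)
    S = x2 @ b / np.sqrt(n)  # S_n(eta) = n^{-1/2} X2' b
    # Residual of x2 after projecting onto the column space of X1.
    coef, *_ = np.linalg.lstsq(X1, x2, rcond=None)
    x2_perp = x2 - X1 @ coef
    omega = x2 @ x2_perp / n  # n^{-1} x2'(I - P1) x2
    return S / (np.sqrt(tau * (1.0 - tau)) * np.sqrt(omega))
```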

At the full-model estimate, $T_n(\hat{\beta}_2(\tau)) = 0$, which follows from the constraint $\mathbf{A}\hat{\mathbf{a}} = (1 - \tau)\mathbf{A}\mathbf{e}$ in the full model. To obtain confidence intervals for $\beta_2$, a critical value is specified for $T_n$. The dual vector $\hat{\mathbf{a}}_n(\eta)$ is piecewise constant in $\eta$: $\eta$ can be altered without compromising the optimality of $\hat{\mathbf{a}}_n(\eta)$ as long as the signs of the residuals in the primal quantile regression problem do not change. When $\eta$ reaches such a boundary, the solution changes, but optimality can be restored by taking one simplex pivot. The process continues in this way until $T_n(\eta)$ exceeds the specified critical value. Because $T_n(\eta)$ is piecewise constant, interpolation can be used to obtain a confidence interval at the desired level (Koenker and d’Orey 1994).
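The pivoting scheme is specific to the simplex implementation. A much cruder substitute, shown here only to illustrate test inversion, evaluates $T_n$ on a grid of $\eta$ values, keeps those with $|T_n(\eta)| \le c$ for a critical value $c$ (for example, a standard normal quantile), and interpolates linearly at the two ends. This grid search is not the Koenker and d’Orey algorithm; it assumes the accepted set is a single interval, and the statistic is passed in as a callable so that the sketch does not depend on any particular solver. The function name is hypothetical.

```python
from typing import Callable, Sequence, Tuple


def invert_rank_test(T: Callable[[float], float], grid: Sequence[float], crit: float) -> Tuple[float, float]:
    """Approximate {eta : |T(eta)| <= crit} on an increasing grid of eta values."""
    vals = [T(g) for g in grid]
    inside = [i for i, v in enumerate(vals) if abs(v) <= crit]
    if not inside:
        raise ValueError("no grid point is accepted")
    lo_i, hi_i = inside[0], inside[-1]

    def cross(i: int, j: int) -> float:
        # Linear interpolation for the eta at which |T| reaches crit
        # between an accepted point i and a rejected neighbor j.
        vi, vj = abs(vals[i]), abs(vals[j])
        if vj == vi:
            return grid[i]
        w = (crit - vi) / (vj - vi)
        return grid[i] + w * (grid[j] - grid[i])

    lo = cross(lo_i, lo_i - 1) if lo_i > 0 else grid[0]
    hi = cross(hi_i, hi_i + 1) if hi_i < len(grid) - 1 else grid[-1]
    return lo, hi
```

For example, with the linear statistic `lambda eta: eta - 2.0`, a grid over $[0, 5]$, and `crit=1.0`, the function returns an interval of approximately $(1, 3)$.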

Resampling

The bootstrap can be implemented to compute confidence intervals for regression quantile estimates. As in other regression applications, both the residual bootstrap and the xy-pair bootstrap can be used. The former assumes iid random errors and resamples from the residuals, whereas the latter resamples xy pairs and accommodates some forms of heteroscedasticity. Koenker (1994) considered a more interesting resampling mechanism, resampling directly from the full regression quantile process, which he called the Heqf bootstrap.
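Both the residual and the xy-pair schemes can be sketched around a quantile regression fit written as the standard primal linear program, solved here with SciPy's `linprog`. The helpers `rq_fit` and `bootstrap_ci` are hypothetical names, percentile intervals are only one of several ways to turn bootstrap draws into intervals, and the residual scheme simply adds resampled residuals to the fitted values, which is appropriate only under the iid error assumption mentioned in the paragraph above.

```python
import numpy as np
from scipy.optimize import linprog


def rq_fit(X, y, tau):
    """Quantile regression as a linear program.

    Minimizes tau * sum(u) + (1 - tau) * sum(v) subject to X b + u - v = y,
    u >= 0, v >= 0, b unrestricted; u and v are the residuals' positive
    and negative parts.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0.0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


def bootstrap_ci(X, y, tau, scheme="pairs", n_boot=200, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap intervals for rq_fit coefficients."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    beta_hat = rq_fit(X, y, tau)
    resid = y - X @ beta_hat
    draws = np.empty((n_boot, X.shape[1]))
    for k in range(n_boot):
        if scheme == "pairs":
            # xy-pair bootstrap: resample rows (x_i, y_i) jointly.
            idx = rng.integers(0, n, size=n)
            Xb, yb = X[idx], y[idx]
        elif scheme == "residual":
            # Residual bootstrap: keep X fixed and resample residuals.
            Xb, yb = X, X @ beta_hat + rng.choice(resid, size=n, replace=True)
        else:
            raise ValueError("scheme must be 'pairs' or 'residual'")
        draws[k] = rq_fit(Xb, yb, tau)
    lo, hi = np.quantile(draws, [alpha / 2.0, 1.0 - alpha / 2.0], axis=0)
    return beta_hat, lo, hi
```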

In contrast with these bootstrap methods, Parzen, Wei, and Ying (1994) observed that the following estimating equation for the $\tau$ regression quantile is a pivotal quantity for the $\tau$ quantile regression parameter $\boldsymbol{\beta}_\tau$:

$$ S(\boldsymbol{\beta}) = n^{-1/2}\sum_{i=1}^{n} \mathbf{x}_i\left(\tau - I(y_i \le \mathbf{x}_i'\boldsymbol{\beta})\right) $$

In other words, the distribution of $S(\boldsymbol{\beta}_\tau)$ can be generated exactly by a random vector $\mathbf{U}$, which is a weighted sum of independent, re-centered Bernoulli variables. They further showed that for large $n$, the distribution of $\hat{\boldsymbol{\beta}}(\tau) - \boldsymbol{\beta}_\tau$ can be approximated by the conditional distribution of $\hat{\boldsymbol{\beta}}_U - \hat{\boldsymbol{\beta}}_n(\tau)$, where $\hat{\boldsymbol{\beta}}_U$ solves an augmented quantile regression problem with $n + 1$ observations, in which $\mathbf{x}_{n+1} = -n^{1/2}\mathbf{u}/\tau$ and $y_{n+1}$ is sufficiently large, for a given realization $\mathbf{u}$ of $\mathbf{U}$. By exploiting the asymptotically pivotal role of the quantile regression "gradient condition," this approach also achieves some robustness to certain forms of heteroscedasticity.
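A sketch of the Parzen, Wei, and Ying scheme under stated assumptions. Each draw generates $\mathbf{u} = n^{-1/2}\sum_i \mathbf{x}_i(\tau - B_i)$ with independent $B_i \sim \text{Bernoulli}(\tau)$, appends one artificial observation, and refits. Because $y_{n+1}$ is large, the extra residual is positive and contributes $\tau\mathbf{x}_{n+1}$ to the gradient condition $\sum_i \mathbf{x}_i(\tau - I(y_i \le \mathbf{x}_i'\boldsymbol{\beta})) \approx \mathbf{0}$; with $\mathbf{x}_{n+1} = -n^{1/2}\mathbf{u}/\tau$, the original $n$ observations then satisfy $S(\boldsymbol{\beta}) \approx \mathbf{u}$ under the $n^{-1/2}$ scaling of $S$. The starting value for $y_{n+1}$ (increased until the extra residual is positive) is a heuristic, and `rq_fit`, `pwy_solve`, and `pwy_draws` are hypothetical helper names.

```python
import numpy as np
from scipy.optimize import linprog


def rq_fit(X, y, tau):
    """Quantile regression via the primal LP: X b + u - v = y, u, v >= 0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0.0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


def pwy_solve(X, y, tau, u, y_big=None, max_tries=6):
    """Solve the augmented problem with x_{n+1} = -sqrt(n) u / tau."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    x_extra = -np.sqrt(n) * np.asarray(u, dtype=float) / tau
    if y_big is None:
        y_big = 1e3 * (1.0 + np.max(np.abs(y)))  # heuristic starting value
    for _ in range(max_tries):
        beta_u = rq_fit(np.vstack([X, x_extra]), np.append(y, y_big), tau)
        # A positive residual for the extra row means y_{n+1} was large enough.
        if y_big - x_extra @ beta_u > 0:
            return beta_u
        y_big *= 10.0
    raise RuntimeError("y_{n+1} could not be made large enough")


def pwy_draws(X, y, tau, n_draws=200, seed=0):
    """Draws of beta_U - beta_hat(tau), approximating beta_hat(tau) - beta_tau."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    beta_hat = rq_fit(X, y, tau)
    draws = np.empty((n_draws, X.shape[1]))
    for k in range(n_draws):
        bern = (rng.random(n) < tau).astype(float)  # B_i ~ Bernoulli(tau)
        u = X.T @ (tau - bern) / np.sqrt(n)  # one realization of U
        draws[k] = pwy_solve(X, y, tau, u) - beta_hat
    return draws
```

A useful check: with $\mathbf{u} = \mathbf{0}$ the artificial row is zero, so `pwy_solve` reproduces the ordinary fit `rq_fit(X, y, tau)`.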

Although the Parzen, Wei, and Ying (1994) method is simple, it is too time-consuming for relatively large data sets, especially high-dimensional ones. The QUANTREG procedure implements a general resampling method developed by He and Hu (2002), called the Markov chain marginal bootstrap (MCMB). For quantile regression, the MCMB method has the advantage that it solves $p$ one-dimensional equations instead of a $p$-dimensional system of equations, as the previous bootstrap methods do. This greatly improves the feasibility of the resampling method in computing confidence intervals for regression quantiles.

Last updated: December 09, 2022