The QUANTREG Procedure

Quantile Regression

Quantile regression generalizes the concept of a univariate quantile to a conditional quantile given one or more covariates. Recall that a student’s score on a test is at the $\tau$ quantile if his or her score is better than that of $100\tau\%$ of the students who took the test. The score is also said to be at the $100\tau$th percentile.

For a random variable Y with probability distribution function

$$F(y) = \mathrm{Prob}(Y \le y)$$

the $\tau$ quantile of $Y$ is defined as the inverse function

$$Q(\tau) = \inf \{ y : F(y) \ge \tau \}$$

where the quantile level $\tau$ ranges between 0 and 1. In particular, the median is $Q(1/2)$.
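As a rough illustration of this definition, the following Python sketch evaluates the infimum with the empirical distribution function of a small sample standing in for F. The function names and the test scores are made up for the example; nothing here reflects how the QUANTREG procedure computes quantiles.

```python
def empirical_cdf(sample, y):
    """F(y): fraction of observations less than or equal to y."""
    return sum(1 for v in sample if v <= y) / len(sample)


def quantile(sample, tau):
    """Q(tau) = inf{y : F(y) >= tau}, using the empirical CDF as F.

    The empirical CDF is a step function that jumps only at observed
    values, so the infimum is the smallest observation whose CDF value
    reaches tau.
    """
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    for y in sorted(sample):
        if empirical_cdf(sample, y) >= tau:
            return y


# Hypothetical test scores, used only for illustration.
scores = [55, 62, 70, 71, 78, 84, 90, 93]
median_score = quantile(scores, 0.5)
upper_quartile = quantile(scores, 0.75)
print(median_score, upper_quartile)
```

With an even number of observations, this definition returns the lower of the two middle values as the median, because it takes the smallest y at which the distribution function reaches 1/2.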

For a random sample $\{y_1, \ldots, y_n\}$ of $Y$, it is well known that the sample median minimizes the sum of absolute deviations:

$$\mathrm{median} = \arg\min_{\xi \in \mathbf{R}} \sum_{i=1}^{n} |y_i - \xi|$$
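This property can be checked numerically. The sketch below is only an illustration under simple assumptions: it searches a coarse grid of candidate values (the grid and the sample are arbitrary choices) and compares the grid minimizer with the median reported by Python's standard `statistics` module.

```python
import statistics


def sum_abs_dev(sample, xi):
    """Sum of absolute deviations of the sample from the candidate xi."""
    return sum(abs(y - xi) for y in sample)


# Illustrative sample with an odd number of points, so the median is unique.
sample = [2.0, 3.0, 5.0, 8.0, 13.0]

# Candidate values 0.0, 0.1, ..., 15.0; the grid is an arbitrary choice.
candidates = [i / 10 for i in range(151)]
grid_minimizer = min(candidates, key=lambda xi: sum_abs_dev(sample, xi))

print(grid_minimizer, statistics.median(sample))
```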

Likewise, the general $\tau$ sample quantile $\xi(\tau)$, which is the analog of $Q(\tau)$, is formulated as the minimizer

$$\xi(\tau) = \arg\min_{\xi \in \mathbf{R}} \sum_{i=1}^{n} \rho_\tau(y_i - \xi)$$

where $\rho_\tau(z) = z(\tau - I(z < 0))$, $0 < \tau < 1$, and where $I(\cdot)$ denotes the indicator function. The loss function $\rho_\tau$ assigns a weight of $\tau$ to positive residuals $y_i - \xi$ and a weight of $1 - \tau$ to negative residuals.
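The loss function and the resulting sample quantile can be sketched in a few lines of Python. This is a minimal illustration, not the procedure's implementation: the names `check_loss` and `sample_quantile` are invented here, and the search is restricted to the observed values, which suffices because the objective is piecewise linear and convex with kinks only at the observations.

```python
def check_loss(z, tau):
    """rho_tau(z) = z * (tau - I(z < 0))."""
    return z * (tau - (1.0 if z < 0 else 0.0))


def sample_quantile(sample, tau):
    """Return an observation that minimizes sum_i rho_tau(y_i - xi).

    Only observed values are tried as candidates for xi. If several
    observations tie, the smallest one is returned.
    """
    def objective(xi):
        return sum(check_loss(y - xi, tau) for y in sample)

    return min(sorted(sample), key=objective)


# Illustrative sample, chosen arbitrarily.
sample = [2.0, 3.0, 5.0, 8.0, 13.0]
print(check_loss(2.0, 0.25), check_loss(-2.0, 0.25))
print(sample_quantile(sample, 0.25),
      sample_quantile(sample, 0.5),
      sample_quantile(sample, 0.9))
```

For a quantile level of 0.25, a positive residual of 2 costs 0.5 while a negative residual of the same size costs 1.5, which is what pulls the minimizer toward the lower part of the sample.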

Using this loss function, the linear conditional quantile function extends the $\tau$ sample quantile $\xi(\tau)$ to the regression setting in the same way that the linear conditional mean function extends the sample mean. Recall that OLS regression estimates the linear conditional mean function $E(Y \mid X = \mathbf{x}) = \mathbf{x}'\boldsymbol{\beta}$ by solving for

$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta} \in \mathbf{R}^p} \sum_{i=1}^{n} (y_i - \mathbf{x}_i'\boldsymbol{\beta})^2$$

The estimated parameter $\hat{\boldsymbol{\beta}}$ minimizes the sum of squared residuals in the same way that the sample mean $\hat{\mu}$ minimizes the sum of squares:

$$\hat{\mu} = \arg\min_{\mu \in \mathbf{R}} \sum_{i=1}^{n} (y_i - \mu)^2$$
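The parallel between the sample mean and the OLS fit can be seen in a small Python sketch. It assumes the simplest regression case, an intercept plus one covariate, for which the least squares solution has a familiar closed form; the data and function names are illustrative only.

```python
def sum_sq(ys, mu):
    """Sum of squared deviations of ys from the candidate mu."""
    return sum((y - mu) ** 2 for y in ys)


def ols_line(xs, ys):
    """Closed-form least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Illustrative data, chosen arbitrarily.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

mu_hat = sum(ys) / len(ys)           # sample mean
intercept, slope = ols_line(xs, ys)  # OLS estimates
print(mu_hat, intercept, slope)
```

Moving the candidate away from `mu_hat` in either direction increases `sum_sq`, just as moving away from the fitted line increases the sum of squared residuals.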

Likewise, quantile regression estimates the linear conditional quantile function, $Q_\tau(Y \mid X = \mathbf{x}) = Q_{Y \mid \mathbf{x}}(\tau) = \mathbf{x}'\boldsymbol{\beta}(\tau)$, by solving the following equation for $\tau \in (0, 1)$:

$$\hat{\boldsymbol{\beta}}(\tau) = \arg\min_{\boldsymbol{\beta} \in \mathbf{R}^p} \sum_{i=1}^{n} \rho_\tau(y_i - \mathbf{x}_i'\boldsymbol{\beta})$$

The quantity $\hat{\boldsymbol{\beta}}(\tau)$ is called the $\tau$ regression quantile. The case $\tau = 0.5$ (which minimizes the sum of absolute residuals) corresponds to median regression (which is also known as $L_1$ regression).
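To make the minimization concrete, here is a brute-force Python sketch for the special case of an intercept and a single covariate. It relies on the known property that, for this linear programming problem, some minimizer passes through at least p data points (p = 2 here), so it tries every line through a pair of observations. This is an illustrative assumption-laden sketch for tiny data sets, not the algorithm used by the QUANTREG procedure, and all names and data are hypothetical.

```python
from itertools import combinations


def check_loss(z, tau):
    """rho_tau(z) = z * (tau - I(z < 0))."""
    return z * (tau - (1.0 if z < 0 else 0.0))


def objective(intercept, slope, xs, ys, tau):
    """Sum of check losses of the residuals y_i - (intercept + slope * x_i)."""
    return sum(check_loss(y - (intercept + slope * x), tau)
               for x, y in zip(xs, ys))


def regression_quantile(xs, ys, tau):
    """Brute-force tau regression quantile for y = intercept + slope * x.

    Tries each line through two observations with distinct x values and
    keeps the first one that attains the smallest objective.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line is not a valid candidate
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        value = objective(intercept, slope, xs, ys, tau)
        if best is None or value < best[0]:
            best = (value, intercept, slope)
    return best[1], best[2]


# Illustrative data: five points on y = 2x and one point far above that line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 30.0]

median_fit = regression_quantile(xs, ys, 0.5)
upper_fit = regression_quantile(xs, ys, 0.9)
print(median_fit, upper_fit)
```

In this toy data set the median regression line follows the five collinear points, while the fit for a quantile level of 0.9 tilts upward toward the high observation, since positive residuals are penalized more heavily at that level.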

The following set of regression quantiles is referred to as the quantile process:

$$\{ \boldsymbol{\beta}(\tau) : \tau \in (0, 1) \}$$

The QUANTREG procedure computes the quantile function $Q_{Y \mid \mathbf{x}}(\tau)$ and conducts statistical inference on the estimated parameters $\hat{\boldsymbol{\beta}}(\tau)$.

Last updated: December 09, 2022