The QUANTREG Procedure

Optimization Algorithms

The optimization problem for median regression has been formulated and solved as a linear programming (LP) problem since the 1950s. Variations of the simplex algorithm, especially the method of Barrodale and Roberts (1973), have been widely used to solve this problem. The simplex algorithm is computationally demanding in large statistical applications, and in theory the number of iterations can increase exponentially with the sample size. This algorithm is often useful with data that contain no more than tens of thousands of observations.

Several alternatives have been developed to handle $L_1$ regression for larger data sets. The interior point approach of Karmarkar (1984) solves a sequence of quadratic problems in which the relevant interior of the constraint set is approximated by an ellipsoid. The worst-case performance of the interior point algorithm has been proved to be better than the worst-case performance of the simplex algorithm. More important, experience has shown that the interior point algorithm is advantageous for larger problems.

Like $L_1$ regression, general quantile regression fits nicely into the standard primal-dual formulations of linear programming.

In addition to the interior point method, various heuristic approaches are available for computing $L_1$-type solutions. Among these, the finite smoothing algorithm of Madsen and Nielsen (1993) is the most useful. It approximates the $L_1$-type objective function with a smoothing function, so that the Newton-Raphson algorithm can be used iteratively to obtain a solution after a finite number of iterations. The smoothing algorithm extends naturally to general quantile regression.

The QUANTREG procedure implements the simplex, interior point, and smoothing algorithms. The remainder of this section describes these algorithms in more detail.

Simplex Algorithm

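Let $\boldsymbol{\mu} = [\mathbf{y} - \mathbf{A}'\boldsymbol{\beta}]_+$, $\boldsymbol{\nu} = [\mathbf{A}'\boldsymbol{\beta} - \mathbf{y}]_+$, $\boldsymbol{\phi} = [\boldsymbol{\beta}]_+$, and $\boldsymbol{\varphi} = [-\boldsymbol{\beta}]_+$, where $\mathbf{y}$ is the response vector, $\mathbf{A}'$ is the regressor matrix, and $[z]_+$ is the nonnegative part of $z$.

Let $D_{\mathrm{LAR}}(\boldsymbol{\beta}) = \sum_{i=1}^{n} |y_i - \mathbf{x}_i'\boldsymbol{\beta}|$. For the $L_1$ problem, the simplex approach solves $\min_{\boldsymbol{\beta}} D_{\mathrm{LAR}}(\boldsymbol{\beta})$ by reformulating it as the constrained minimization problem

$$\min_{\boldsymbol{\beta}} \left\{ \mathbf{e}'\boldsymbol{\mu} + \mathbf{e}'\boldsymbol{\nu} \mid \mathbf{y} = \mathbf{A}'\boldsymbol{\beta} + \boldsymbol{\mu} - \boldsymbol{\nu},\ \{\boldsymbol{\mu}, \boldsymbol{\nu}\} \in \mathbf{R}_+^n \right\}$$

where $\mathbf{e}$ denotes an $(n \times 1)$ vector of ones.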

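Let $\mathbf{B} = [\mathbf{A}' \;\; -\mathbf{A}' \;\; \mathbf{I} \;\; -\mathbf{I}]$, $\boldsymbol{\theta} = (\boldsymbol{\phi}' \;\; \boldsymbol{\varphi}' \;\; \boldsymbol{\mu}' \;\; \boldsymbol{\nu}')'$, and $\mathbf{d} = (\mathbf{0}' \;\; \mathbf{0}' \;\; \mathbf{e}' \;\; \mathbf{e}')'$, where $\mathbf{0}' = (0 \;\; 0 \;\; \ldots \;\; 0)_p$. The reformulation presents a standard LP problem:

$$(P) \quad \min_{\boldsymbol{\theta}} \mathbf{d}'\boldsymbol{\theta} \quad \text{subject to } \mathbf{B}\boldsymbol{\theta} = \mathbf{y},\ \boldsymbol{\theta} \ge \mathbf{0}$$

This problem has the following dual formulation:

$$(D) \quad \max_{\mathbf{z}} \mathbf{y}'\mathbf{z} \quad \text{subject to } \mathbf{B}'\mathbf{z} \le \mathbf{d}$$

This formulation can be simplified as

$$\max_{\mathbf{z}} \mathbf{y}'\mathbf{z} \quad \text{subject to } \mathbf{A}\mathbf{z} = \mathbf{0},\ \mathbf{z} \in [-1, 1]^n$$

By setting $\boldsymbol{\eta} = \frac{1}{2}\mathbf{z} + \frac{1}{2}\mathbf{e}$ and $\mathbf{b} = \frac{1}{2}\mathbf{A}\mathbf{e}$, the problem becomes

$$\max_{\boldsymbol{\eta}} \mathbf{y}'\boldsymbol{\eta} \quad \text{subject to } \mathbf{A}\boldsymbol{\eta} = \mathbf{b},\ \boldsymbol{\eta} \in [0, 1]^n$$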

For quantile regression, the minimization problem is $\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \rho_\tau(y_i - \mathbf{x}_i'\boldsymbol{\beta})$, and a similar set of steps leads to the dual formulation

$$\max_{\mathbf{z}} \mathbf{y}'\mathbf{z} \quad \text{subject to } \mathbf{A}\mathbf{z} = (1 - \tau)\mathbf{A}\mathbf{e},\ \mathbf{z} \in [0, 1]^n$$

The QUANTREG procedure solves this LP problem by using the simplex algorithm of Barrodale and Roberts (1973). This algorithm exploits the special structure of the coefficient matrix $\mathbf{B}$ by solving the primal LP problem (P) in two stages: The first stage chooses the columns in $\mathbf{A}'$ or $-\mathbf{A}'$ as pivotal columns. The second stage interchanges the columns in $\mathbf{I}$ or $-\mathbf{I}$ as basis or nonbasis columns, respectively. The algorithm obtains an optimal solution by executing these two stages interactively. Moreover, because of the special structure of $\mathbf{B}$, only the main data matrix $\mathbf{A}$ is stored in the current memory.

Although this special version of the simplex algorithm was introduced for median regression, it extends naturally to quantile regression for any given quantile and even to the entire quantile process (Koenker and d’Orey 1994). It greatly reduces the computing time that is required by the general simplex algorithm, and it is suitable for data sets with fewer than 5,000 observations and 50 variables.
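The LP reformulation of the $L_1$ problem can be checked numerically on a toy data set. The following Python sketch is illustrative only and is not part of PROC QUANTREG: the helper names (`lp_point`, `lp_constraint_lhs`, `lp_objective`) and the data are hypothetical. It builds the nonnegative parts $\boldsymbol{\phi}$, $\boldsymbol{\varphi}$, $\boldsymbol{\mu}$, $\boldsymbol{\nu}$ from a trial $\boldsymbol{\beta}$ and confirms that the equality constraint $\mathbf{y} = \mathbf{A}'\boldsymbol{\beta} + \boldsymbol{\mu} - \boldsymbol{\nu}$ holds and that the LP objective $\mathbf{e}'\boldsymbol{\mu} + \mathbf{e}'\boldsymbol{\nu}$ equals the sum of absolute residuals.

```python
def nonneg(x):
    """Return the nonnegative part [x]_+ of a scalar."""
    return x if x > 0 else 0.0


def lp_point(X, y, beta):
    """Build (phi, varphi, mu, nu) for a trial beta.

    X is the n-by-p regressor matrix (A' in the text), stored as a list of rows.
    """
    fitted = [sum(xij * bj for xij, bj in zip(row, beta)) for row in X]
    mu = [nonneg(yi - fi) for yi, fi in zip(y, fitted)]
    nu = [nonneg(fi - yi) for yi, fi in zip(y, fitted)]
    phi = [nonneg(bj) for bj in beta]
    varphi = [nonneg(-bj) for bj in beta]
    return phi, varphi, mu, nu


def lp_constraint_lhs(X, theta):
    """Compute A'phi - A'varphi + mu - nu without forming the wide matrix."""
    phi, varphi, mu, nu = theta
    out = []
    for i, row in enumerate(X):
        xb = sum(xij * (pj - qj) for xij, pj, qj in zip(row, phi, varphi))
        out.append(xb + mu[i] - nu[i])
    return out


def lp_objective(theta):
    """Compute e'mu + e'nu, the LP objective."""
    _, _, mu, nu = theta
    return sum(mu) + sum(nu)


# Hypothetical toy data: intercept plus one regressor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 2.0, 6.0]
beta = [0.5, 1.5]

theta = lp_point(X, y, beta)
lhs = lp_constraint_lhs(X, theta)
sum_abs_resid = sum(abs(yi - (beta[0] + beta[1] * row[1])) for yi, row in zip(y, X))
```

Any $\boldsymbol{\beta}$ yields a feasible point of the LP in this way, so minimizing the LP objective over feasible points is the same as minimizing the sum of absolute residuals.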

Interior Point Algorithm

The ALGORITHM=INTERIOR option implements an interior point algorithm. This algorithm uses the primal-dual predictor-corrector method that is proposed by Lustig, Marsten, and Shanno (1992). Roos, Terlaky, and Vial (1997) provide more information about this particular algorithm. The following brief introduction of this algorithm uses the notation in the first reference.

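To be consistent with the conventional linear programming setting, let $\mathbf{c} = -\mathbf{y}$, let $\mathbf{b} = (1 - \tau)\mathbf{A}\mathbf{e}$, and let $\mathbf{u}$ be the general upper bound. The dual form of quantile regression solves the following linear programming primal problem:

$$\min_{\mathbf{z}} \{\mathbf{c}'\mathbf{z}\} \quad \text{subject to } \mathbf{A}\mathbf{z} = \mathbf{b},\ \mathbf{0} \le \mathbf{z} \le \mathbf{u}$$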

This primal problem has n variables. The index i denotes a variable number, and k denotes an iteration number. If k is used as a subscript or superscript, it denotes "of iteration k."

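Let $\mathbf{v}$ be the primal slack so that $\mathbf{z} + \mathbf{v} = \mathbf{u}$. Associate dual variables $\mathbf{w}$ with these constraints. The interior point algorithm solves the system of equations to satisfy the Karush-Kuhn-Tucker (KKT) conditions for optimality:

$$\begin{aligned}
\mathbf{A}\mathbf{z} &= \mathbf{b} \\
\mathbf{z} + \mathbf{v} &= \mathbf{u} \\
\mathbf{A}'\mathbf{t} + \mathbf{s} - \mathbf{w} &= \mathbf{c} \\
\mathbf{Z}\mathbf{S}\mathbf{e} &= \mathbf{0} \\
\mathbf{V}\mathbf{W}\mathbf{e} &= \mathbf{0} \\
\mathbf{z}, \mathbf{s}, \mathbf{v}, \mathbf{w} &\ge \mathbf{0}
\end{aligned}$$

where $\mathbf{W} = \mathrm{diag}(\mathbf{w})$, $\mathbf{V} = \mathrm{diag}(\mathbf{v})$, $\mathbf{Z} = \mathrm{diag}(\mathbf{z})$, $\mathbf{S} = \mathrm{diag}(\mathbf{s})$.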

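These are the conditions for feasibility, with the addition of the complementarity conditions $\mathbf{Z}\mathbf{S}\mathbf{e} = \mathbf{0}$ and $\mathbf{V}\mathbf{W}\mathbf{e} = \mathbf{0}$. The equality $\mathbf{c}'\mathbf{z} = \mathbf{b}'\mathbf{t} - \mathbf{u}'\mathbf{w}$ must occur at the optimum. Complementarity forces the optimal objectives of the primal and dual to be equal, $\mathbf{c}'\mathbf{z}_{opt} = \mathbf{b}'\mathbf{t}_{opt} - \mathbf{u}'\mathbf{w}_{opt}$, because

$$\begin{aligned}
0 &= \mathbf{v}_{opt}'\mathbf{w}_{opt} = (\mathbf{u} - \mathbf{z}_{opt})'\mathbf{w}_{opt} = \mathbf{u}'\mathbf{w}_{opt} - \mathbf{z}_{opt}'\mathbf{w}_{opt} \\
0 &= \mathbf{z}_{opt}'\mathbf{s}_{opt} = (\mathbf{c} - \mathbf{A}'\mathbf{t}_{opt} + \mathbf{w}_{opt})'\mathbf{z}_{opt} = \mathbf{c}'\mathbf{z}_{opt} - \mathbf{t}_{opt}'(\mathbf{A}\mathbf{z}_{opt}) + \mathbf{w}_{opt}'\mathbf{z}_{opt} = \mathbf{c}'\mathbf{z}_{opt} - \mathbf{b}'\mathbf{t}_{opt} + \mathbf{w}_{opt}'\mathbf{z}_{opt}
\end{aligned}$$

Therefore

$$0 = \mathbf{c}'\mathbf{z}_{opt} - \mathbf{b}'\mathbf{t}_{opt} + \mathbf{u}'\mathbf{w}_{opt}$$

The duality gap, $\mathbf{c}'\mathbf{z} - \mathbf{b}'\mathbf{t} + \mathbf{u}'\mathbf{w}$, measures the convergence of the algorithm. You can specify a tolerance for this convergence criterion in the TOLERANCE= option in the PROC QUANTREG statement.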

Before the optimum is reached, it is possible for a solution to violate the KKT conditions in one of several ways:

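Primal bound constraints can be broken: $\boldsymbol{\delta}_b = \mathbf{u} - \mathbf{z} - \mathbf{v} \ne \mathbf{0}$.
Primal constraints can be broken: $\boldsymbol{\delta}_c = \mathbf{b} - \mathbf{A}\mathbf{z} \ne \mathbf{0}$.
Dual constraints can be broken: $\boldsymbol{\delta}_d = \mathbf{c} - \mathbf{A}'\mathbf{t} - \mathbf{s} + \mathbf{w} \ne \mathbf{0}$.
Complementarity conditions are unsatisfied: $\mathbf{z}'\mathbf{s} \ne 0$ and $\mathbf{v}'\mathbf{w} \ne 0$.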
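To make these quantities concrete, the following Python sketch evaluates the three feasibility residuals, the total complementarity $\mathbf{z}'\mathbf{s} + \mathbf{v}'\mathbf{w}$, and the duality gap $\mathbf{c}'\mathbf{z} - \mathbf{b}'\mathbf{t} + \mathbf{u}'\mathbf{w}$ for an intercept-only median regression. It is an illustration under stated assumptions, not the procedure's implementation: the function name `kkt_residuals`, the toy data, and the upper bound $\mathbf{u} = \mathbf{e}$ are choices made for this example, and the residual definitions are the ones implied by the feasibility conditions $\mathbf{z} + \mathbf{v} = \mathbf{u}$, $\mathbf{A}\mathbf{z} = \mathbf{b}$, and $\mathbf{A}'\mathbf{t} + \mathbf{s} - \mathbf{w} = \mathbf{c}$.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def kkt_residuals(A, b, c, u, z, t, s, v, w):
    """Return (delta_b, delta_c, delta_d, complementarity, duality_gap).

    A is stored p-by-n as a list of rows, matching the text.
    """
    p, n = len(A), len(z)
    delta_b = [u[i] - z[i] - v[i] for i in range(n)]
    delta_c = [b[j] - dot(A[j], z) for j in range(p)]
    At = [sum(A[j][i] * t[j] for j in range(p)) for i in range(n)]
    delta_d = [c[i] - At[i] - s[i] + w[i] for i in range(n)]
    complementarity = dot(z, s) + dot(v, w)
    gap = dot(c, z) - dot(b, t) + dot(u, w)
    return delta_b, delta_c, delta_d, complementarity, gap


# Hypothetical toy problem: median (tau = 0.5) of three observations.
tau = 0.5
y = [1.0, 2.0, 4.0]
A = [[1.0, 1.0, 1.0]]            # intercept only, so p = 1
n = len(y)
c = [-yi for yi in y]
b = [(1.0 - tau) * sum(row) for row in A]
u = [1.0] * n

# Dual variables chosen so that A't + s - w = c; here -t is the sample median.
t = [-2.0]
s = [1.0, 0.0, 0.0]
w = [0.0, 0.0, 2.0]

# A feasible but non-optimal primal point.
z_mid = [0.5, 0.5, 0.5]
v_mid = [ui - zi for ui, zi in zip(u, z_mid)]
res_mid = kkt_residuals(A, b, c, u, z_mid, t, s, v_mid, w)

# A primal point that also satisfies complementarity.
z_opt = [0.0, 0.5, 1.0]
v_opt = [ui - zi for ui, zi in zip(u, z_opt)]
res_opt = kkt_residuals(A, b, c, u, z_opt, t, s, v_opt, w)
```

At a point where all three feasibility residuals vanish, the duality gap coincides with the total complementarity, which is why driving the complementarity to zero also closes the gap.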

The interior point algorithm works by using Newton’s method to find a direction to move from the current solution toward a better solution:

$$(\mathbf{z}^{k+1}, \mathbf{t}^{k+1}, \mathbf{s}^{k+1}, \mathbf{v}^{k+1}, \mathbf{w}^{k+1}) = (\mathbf{z}^{k}, \mathbf{t}^{k}, \mathbf{s}^{k}, \mathbf{v}^{k}, \mathbf{w}^{k}) + \kappa(\Delta\mathbf{z}^{k}, \Delta\mathbf{t}^{k}, \Delta\mathbf{s}^{k}, \Delta\mathbf{v}^{k}, \Delta\mathbf{w}^{k})$$

Here $\kappa$ is the step length and is assigned a value as large as possible, but not so large that a $z_i^{k+1}$ or $s_i^{k+1}$ is "too close" to 0. You can control the step length in the KAPPA= option in the PROC QUANTREG statement.

The QUANTREG procedure implements a predictor-corrector variant of the primal-dual interior point algorithm. First, Newton’s method is used to find a direction in which to move. This is known as the affine step.

In iteration k, the affine step system that must be solved is

$$\begin{aligned}
\Delta\mathbf{z}_{aff} + \Delta\mathbf{v}_{aff} &= \boldsymbol{\delta}_b \\
\mathbf{A}\Delta\mathbf{z}_{aff} &= \boldsymbol{\delta}_c \\
\mathbf{A}'\Delta\mathbf{t}_{aff} + \Delta\mathbf{s}_{aff} - \Delta\mathbf{w}_{aff} &= \boldsymbol{\delta}_d \\
\mathbf{S}\Delta\mathbf{z}_{aff} + \mathbf{Z}\Delta\mathbf{s}_{aff} &= -\mathbf{Z}\mathbf{S}\mathbf{e} \\
\mathbf{V}\Delta\mathbf{w}_{aff} + \mathbf{W}\Delta\mathbf{v}_{aff} &= -\mathbf{V}\mathbf{W}\mathbf{e}
\end{aligned}$$

Therefore, the following computations are involved in solving the affine step, where $\kappa$ is the step length as before:

$$\begin{aligned}
\boldsymbol{\Theta} &= \mathbf{S}\mathbf{Z}^{-1} + \mathbf{W}\mathbf{V}^{-1} \\
\boldsymbol{\rho} &= \boldsymbol{\Theta}^{-1}\left(\boldsymbol{\delta}_d + (\mathbf{S} - \mathbf{W})\mathbf{e} - \mathbf{V}^{-1}\mathbf{W}\boldsymbol{\delta}_b\right) \\
\Delta\mathbf{t}_{aff} &= (\mathbf{A}\boldsymbol{\Theta}^{-1}\mathbf{A}')^{-1}(\boldsymbol{\delta}_c + \mathbf{A}\boldsymbol{\rho}) \\
\Delta\mathbf{z}_{aff} &= \boldsymbol{\Theta}^{-1}\mathbf{A}'\Delta\mathbf{t}_{aff} - \boldsymbol{\rho} \\
\Delta\mathbf{v}_{aff} &= \boldsymbol{\delta}_b - \Delta\mathbf{z}_{aff} \\
\Delta\mathbf{w}_{aff} &= -\mathbf{W}\mathbf{e} - \mathbf{V}^{-1}\mathbf{W}\Delta\mathbf{v}_{aff} \\
\Delta\mathbf{s}_{aff} &= -\mathbf{S}\mathbf{e} - \mathbf{Z}^{-1}\mathbf{S}\Delta\mathbf{z}_{aff} \\
(\mathbf{z}_{aff}, \mathbf{t}_{aff}, \mathbf{s}_{aff}, \mathbf{v}_{aff}, \mathbf{w}_{aff}) &= (\mathbf{z}, \mathbf{t}, \mathbf{s}, \mathbf{v}, \mathbf{w}) + \kappa(\Delta\mathbf{z}_{aff}, \Delta\mathbf{t}_{aff}, \Delta\mathbf{s}_{aff}, \Delta\mathbf{v}_{aff}, \Delta\mathbf{w}_{aff})
\end{aligned}$$
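The elimination order above can be traced in a few lines of code. The following pure-Python sketch computes one affine direction for a small problem; it is a teaching aid under stated assumptions rather than the procedure's implementation. The names `solve` and `affine_direction`, the toy data, and the starting point are hypothetical, and the sketch takes $\Delta\mathbf{w}_{aff}$ from the linearized complementarity condition $\mathbf{V}\Delta\mathbf{w} + \mathbf{W}\Delta\mathbf{v} = -\mathbf{V}\mathbf{W}\mathbf{e}$. Because $\boldsymbol{\Theta}$, $\mathbf{Z}$, $\mathbf{S}$, $\mathbf{V}$, and $\mathbf{W}$ are diagonal, they are stored as plain lists.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    k = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, k):
            f = aug[r][col] / aug[col][col]
            for cc in range(col, k + 1):
                aug[r][cc] -= f * aug[col][cc]
    x = [0.0] * k
    for i in reversed(range(k)):
        acc = aug[i][k] - sum(aug[i][j] * x[j] for j in range(i + 1, k))
        x[i] = acc / aug[i][i]
    return x


def affine_direction(A, b, c, u, z, t, s, v, w):
    """Return (dz, dt, ds, dv, dw) for the affine (pure Newton) step.

    A is p-by-n (list of rows); z, s, v, w must be strictly positive.
    """
    p, n = len(A), len(z)
    At = [sum(A[j][i] * t[j] for j in range(p)) for i in range(n)]
    delta_b = [u[i] - z[i] - v[i] for i in range(n)]
    delta_c = [b[j] - sum(A[j][i] * z[i] for i in range(n)) for j in range(p)]
    delta_d = [c[i] - At[i] - s[i] + w[i] for i in range(n)]
    theta = [s[i] / z[i] + w[i] / v[i] for i in range(n)]
    rho = [(delta_d[i] + s[i] - w[i] - w[i] / v[i] * delta_b[i]) / theta[i]
           for i in range(n)]
    # Normal-equations matrix A Theta^{-1} A' (p-by-p).
    M = [[sum(A[j][i] * A[k][i] / theta[i] for i in range(n)) for k in range(p)]
         for j in range(p)]
    rhs = [delta_c[j] + sum(A[j][i] * rho[i] for i in range(n)) for j in range(p)]
    dt = solve(M, rhs)
    Adt = [sum(A[j][i] * dt[j] for j in range(p)) for i in range(n)]
    dz = [Adt[i] / theta[i] - rho[i] for i in range(n)]
    dv = [delta_b[i] - dz[i] for i in range(n)]
    dw = [-w[i] - w[i] / v[i] * dv[i] for i in range(n)]
    ds = [-s[i] - s[i] / z[i] * dz[i] for i in range(n)]
    return dz, dt, ds, dv, dw


# Hypothetical toy problem: median regression with intercept and one regressor.
tau = 0.5
y = [1.0, 3.0, 2.0, 6.0]
A = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 2.0, 3.0]]
n = len(y)
c = [-yi for yi in y]
b = [(1.0 - tau) * sum(row) for row in A]
u = [1.0] * n
z, v, s, w, t = [0.5] * n, [0.5] * n, [1.0] * n, [1.0] * n, [0.0, 0.0]

dz, dt, ds, dv, dw = affine_direction(A, b, c, u, z, t, s, v, w)
```

Only the $p \times p$ matrix $\mathbf{A}\boldsymbol{\Theta}^{-1}\mathbf{A}'$ requires a genuine linear solve; every other quantity follows from elementwise operations.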

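The success of the affine step is gauged by calculating the complementarity of $\mathbf{z}'\mathbf{s}$ and $\mathbf{v}'\mathbf{w}$ at $(\mathbf{z}_{aff}, \mathbf{t}_{aff}, \mathbf{s}_{aff}, \mathbf{v}_{aff}, \mathbf{w}_{aff})$ and comparing it with the complementarity at the starting point $(\mathbf{z}^{k}, \mathbf{t}^{k}, \mathbf{s}^{k}, \mathbf{v}^{k}, \mathbf{w}^{k})$. If the affine step was successful in reducing the complementarity by a substantial amount, the need for centering is not great. Therefore, a value close to 0 is assigned to $\sigma$ in the following second linear system, which is used to determine a centering vector.

The following linear system is solved to determine a centering vector $(\Delta\mathbf{z}_{c}, \Delta\mathbf{t}_{c}, \Delta\mathbf{s}_{c}, \Delta\mathbf{v}_{c}, \Delta\mathbf{w}_{c})$ from $(\mathbf{z}_{aff}, \mathbf{t}_{aff}, \mathbf{s}_{aff}, \mathbf{v}_{aff}, \mathbf{w}_{aff})$:

However, if the affine step was unsuccessful, then centering is deemed beneficial, and a value close to 1.0 is assigned to $\sigma$. In other words, the value of $\sigma$ is adaptively altered depending on the progress made toward the optimum.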

Therefore, the following computations are involved in solving the centering step:

$$\begin{aligned}
\boldsymbol{\rho} &= \boldsymbol{\Theta}^{-1}\left(\sigma\mu(\mathbf{Z}^{-1} - \mathbf{V}^{-1})\mathbf{e} - \mathbf{Z}^{-1}\mathbf{Z}_{aff}\mathbf{S}_{aff}\mathbf{e} + \mathbf{V}^{-1}\mathbf{V}_{aff}\mathbf{W}_{aff}\mathbf{e}\right) \\
\Delta\mathbf{t}_{c} &= (\mathbf{A}\boldsymbol{\Theta}^{-1}\mathbf{A}')^{-1}\mathbf{A}\boldsymbol{\rho} \\
\Delta\mathbf{z}_{c} &= \boldsymbol{\Theta}^{-1}\mathbf{A}'\Delta\mathbf{t}_{c} - \boldsymbol{\rho} \\
\Delta\mathbf{v}_{c} &= -\Delta\mathbf{z}_{c} \\
\Delta\mathbf{w}_{c} &= \sigma\mu\mathbf{V}^{-1}\mathbf{e} - \mathbf{V}^{-1}\mathbf{V}_{aff}\mathbf{W}_{aff}\mathbf{e} - \mathbf{V}^{-1}\mathbf{W}_{aff}\Delta\mathbf{v}_{c} \\
\Delta\mathbf{s}_{c} &= \sigma\mu\mathbf{Z}^{-1}\mathbf{e} - \mathbf{Z}^{-1}\mathbf{Z}_{aff}\mathbf{S}_{aff}\mathbf{e} - \mathbf{Z}^{-1}\mathbf{S}_{aff}\Delta\mathbf{z}_{c}
\end{aligned}$$

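Then

$$\begin{aligned}
(\Delta\mathbf{z}, \Delta\mathbf{t}, \Delta\mathbf{s}, \Delta\mathbf{v}, \Delta\mathbf{w}) &= (\Delta\mathbf{z}_{aff}, \Delta\mathbf{t}_{aff}, \Delta\mathbf{s}_{aff}, \Delta\mathbf{v}_{aff}, \Delta\mathbf{w}_{aff}) + (\Delta\mathbf{z}_{c}, \Delta\mathbf{t}_{c}, \Delta\mathbf{s}_{c}, \Delta\mathbf{v}_{c}, \Delta\mathbf{w}_{c}) \\
(\mathbf{z}^{k+1}, \mathbf{t}^{k+1}, \mathbf{s}^{k+1}, \mathbf{v}^{k+1}, \mathbf{w}^{k+1}) &= (\mathbf{z}^{k}, \mathbf{t}^{k}, \mathbf{s}^{k}, \mathbf{v}^{k}, \mathbf{w}^{k}) + \kappa(\Delta\mathbf{z}, \Delta\mathbf{t}, \Delta\mathbf{s}, \Delta\mathbf{v}, \Delta\mathbf{w})
\end{aligned}$$

where, as before, $\kappa$ is the step length, which is assigned a value as large as possible but not so large that a $z_i^{k+1}$, $s_i^{k+1}$, $v_i^{k+1}$, or $w_i^{k+1}$ is "too close" to 0.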

Although the predictor-corrector variant entails solving two linear systems instead of one, fewer iterations are usually required to reach the optimum. The additional overhead of the second linear system is small because the matrix $\mathbf{A}\boldsymbol{\Theta}^{-1}\mathbf{A}'$ has already been factorized in order to solve the first linear system.

You can specify the starting point in the INEST= option in the PROC QUANTREG statement. By default, the starting point is set to be the least squares estimate.

Efficient Interior Point Algorithm

The ALGORITHM=IPM option implements a more efficient interior point algorithm than the one that is used when ALGORITHM=INTERIOR. The computing strategy of the ALGORITHM=IPM option is the same as the strategy for the ALGORITHM=INTERIOR option, but the ALGORITHM=IPM option implements the algorithm by using more efficient matrix functions. The ALGORITHM=IPM option uses the complementarity value to measure the convergence of the algorithm, which is different from the dual gap value that is used when ALGORITHM=INTERIOR. The complementarity value is defined as . You can specify a tolerance for this complementarity convergence criterion by using the TOLERANCE= option in the PROC QUANTREG statement. Unlike the ALGORITHM=INTERIOR option, the ALGORITHM=IPM option does not support the KAPPA= option.

Smoothing Algorithm

To minimize the sum of the absolute residuals $D_{\mathrm{LAR}}(\boldsymbol{\beta}) = \sum_{i=1}^{n} |r_i(\boldsymbol{\beta})|$, the smoothing algorithm approximates the nondifferentiable function $D_{\mathrm{LAR}}$ by the following smooth function (which is referred to as the Huber function),

$$D_{\gamma}(\boldsymbol{\beta}) = \sum_{i=1}^{n} H_{\gamma}(r_i(\boldsymbol{\beta}))$$

where

$$H_{\gamma}(t) = \begin{cases} t^2/(2\gamma) & \text{if } |t| \le \gamma \\ |t| - \gamma/2 & \text{if } |t| > \gamma \end{cases}$$

Here $r_i(\boldsymbol{\beta}) = y_i - \mathbf{x}_i'\boldsymbol{\beta}$, and the threshold $\gamma$ is a positive real number. The function $D_{\gamma}$ is continuously differentiable, and a minimizer $\boldsymbol{\beta}_{\gamma}$ of $D_{\gamma}$ is close to a minimizer $\hat{\boldsymbol{\beta}}_{\mathrm{LAR}}$ of $D_{\mathrm{LAR}}(\boldsymbol{\beta})$ when $\gamma$ is close to 0.
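A short numerical check illustrates how the Huber function tracks the absolute value as the threshold shrinks. The following Python sketch is illustrative only; the function names `huber` and `d_gamma` and the residual values are hypothetical. It evaluates the smoothed objective for a fixed set of residuals at two thresholds and compares it with the sum of absolute residuals.

```python
def huber(t, gamma):
    """Huber function: quadratic on [-gamma, gamma], linear outside."""
    if abs(t) <= gamma:
        return t * t / (2.0 * gamma)
    return abs(t) - gamma / 2.0


def d_gamma(residuals, gamma):
    """Smoothed objective: sum of Huber values over the residuals."""
    return sum(huber(r, gamma) for r in residuals)


# Hypothetical residuals r_i(beta) for some fixed beta.
residuals = [-2.0, -0.25, 0.0, 0.5, 3.0]

d_lar = sum(abs(r) for r in residuals)          # sum of absolute residuals
d_smooth_wide = d_gamma(residuals, 1.0)         # coarse approximation
d_smooth_tight = d_gamma(residuals, 1e-6)       # nearly the L1 objective
```

For every $t$ the difference $|t| - H_{\gamma}(t)$ lies between 0 and $\gamma/2$, so the smoothed objective underestimates the sum of absolute residuals by at most $n\gamma/2$.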

The advantage of the smoothing algorithm as described in Madsen and Nielsen (1993) is that the $L_1$ solution $\hat{\boldsymbol{\beta}}_{\mathrm{LAR}}$ can be detected when $\gamma > 0$ is small. In other words, it is not necessary to let $\gamma$ converge to 0 in order to find a minimizer of $D_{\mathrm{LAR}}(\boldsymbol{\beta})$. The algorithm terminates before going through the entire sequence of values of $\gamma$ that are generated by the algorithm. Convergence is indicated by no change of the status of residuals $r_i(\boldsymbol{\beta})$ as $\gamma$ goes through this sequence.

The smoothing algorithm extends naturally from $L_1$ regression to general quantile regression (Chen 2007). The function

$$D_{\rho_\tau}(\boldsymbol{\beta}) = \sum_{i=1}^{n} \rho_\tau(y_i - \mathbf{x}_i'\boldsymbol{\beta})$$

can be approximated by the smooth function

$$D_{\gamma,\tau}(\boldsymbol{\beta}) = \sum_{i=1}^{n} H_{\gamma,\tau}(r_i(\boldsymbol{\beta}))$$

where

$$H_{\gamma,\tau}(t) = \begin{cases} t(\tau - 1) - \frac{1}{2}(\tau - 1)^2\gamma & \text{if } t \le (\tau - 1)\gamma \\ \dfrac{t^2}{2\gamma} & \text{if } (\tau - 1)\gamma \le t \le \tau\gamma \\ t\tau - \frac{1}{2}\tau^2\gamma & \text{if } t \ge \tau\gamma \end{cases}$$

The function $H_{\gamma,\tau}$ is determined by whether $r_i(\boldsymbol{\beta}) \le (\tau - 1)\gamma$, $r_i(\boldsymbol{\beta}) \ge \tau\gamma$, or $(\tau - 1)\gamma \le r_i(\boldsymbol{\beta}) \le \tau\gamma$. These inequalities divide $\mathbf{R}^p$ into subregions that are separated by the parallel hyperplanes $r_i(\boldsymbol{\beta}) = (\tau - 1)\gamma$ and $r_i(\boldsymbol{\beta}) = \tau\gamma$. The set of all such hyperplanes is denoted by $B_{\gamma,\tau}$:

$$B_{\gamma,\tau} = \left\{ \boldsymbol{\beta} \in \mathbf{R}^p \mid \exists\, i : r_i(\boldsymbol{\beta}) = (\tau - 1)\gamma \ \text{or}\ r_i(\boldsymbol{\beta}) = \tau\gamma \right\}$$
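The three branches of the smoothed check function join continuously at the breakpoints $(\tau - 1)\gamma$ and $\tau\gamma$. The following Python sketch evaluates the function next to the unsmoothed check function; it is illustrative only. The names `check_loss` and `smooth_check` are hypothetical, and `check_loss` uses the standard definition $\rho_\tau(t) = t(\tau - I(t < 0))$, which is assumed here because this section does not restate it.

```python
def check_loss(t, tau):
    """Standard check function rho_tau(t) = t * (tau - 1{t < 0}) (assumed definition)."""
    return t * (tau - (1.0 if t < 0 else 0.0))


def smooth_check(t, gamma, tau):
    """Smoothed check function: linear tails joined by a quadratic piece."""
    lower, upper = (tau - 1.0) * gamma, tau * gamma
    if t <= lower:
        return t * (tau - 1.0) - 0.5 * (tau - 1.0) ** 2 * gamma
    if t >= upper:
        return t * tau - 0.5 * tau ** 2 * gamma
    return t * t / (2.0 * gamma)


tau, gamma = 0.25, 2.0
lower, upper = (tau - 1.0) * gamma, tau * gamma     # -1.5 and 0.5

# Values of the linear branches exactly at the breakpoints ...
at_lower = smooth_check(lower, gamma, tau)
at_upper = smooth_check(upper, gamma, tau)
# ... and of the quadratic branch at the same points.
quad_lower = lower * lower / (2.0 * gamma)
quad_upper = upper * upper / (2.0 * gamma)

# Largest gap between the check function and its smooth version on a grid.
grid = [k / 10.0 for k in range(-40, 41)]
max_gap = max(check_loss(t, tau) - smooth_check(t, gamma, tau) for t in grid)
```

The gap between $\rho_\tau$ and $H_{\gamma,\tau}$ never exceeds $\frac{1}{2}\max(\tau, 1 - \tau)^2\gamma$, so the approximation tightens linearly as $\gamma$ decreases.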

Define the sign vector $\mathbf{s}_{\gamma,\tau}(\boldsymbol{\beta}) = (s_1(\boldsymbol{\beta}), \ldots, s_n(\boldsymbol{\beta}))'$ as

$$s_i = s_i(\boldsymbol{\beta}) = \begin{cases} -1 & \text{if } r_i(\boldsymbol{\beta}) \le (\tau - 1)\gamma \\ 0 & \text{if } (\tau - 1)\gamma \le r_i(\boldsymbol{\beta}) \le \tau\gamma \\ 1 & \text{if } r_i(\boldsymbol{\beta}) \ge \tau\gamma \end{cases}$$

and introduce

$$w_i = w_i(\boldsymbol{\beta}) = 1 - s_i^2(\boldsymbol{\beta})$$

Therefore,

$$H_{\gamma,\tau}(r_i(\boldsymbol{\beta})) = \frac{1}{2\gamma} w_i r_i^2(\boldsymbol{\beta}) + s_i\left[\frac{1}{2} r_i(\boldsymbol{\beta}) + \frac{1}{4}(1 - 2\tau)\gamma + s_i\left(r_i(\boldsymbol{\beta})\left(\tau - \frac{1}{2}\right) - \frac{1}{4}(1 - 2\tau + 2\tau^2)\gamma\right)\right]$$

This equation yields

$$D_{\gamma,\tau}(\boldsymbol{\beta}) = \frac{1}{2\gamma}\mathbf{r}'\mathbf{W}_{\gamma,\tau}\mathbf{r} + \mathbf{v}'(\mathbf{s})\mathbf{r} + c(\mathbf{s})$$

where $\mathbf{W}_{\gamma,\tau}$ is the diagonal matrix with diagonal elements $w_i(\boldsymbol{\beta})$, $\mathbf{v}'(\mathbf{s}) = \left(s_1((2\tau - 1)s_1 + 1)/2, \ldots, s_n((2\tau - 1)s_n + 1)/2\right)$, $c(\mathbf{s}) = \sum_{i=1}^{n}\left[\frac{1}{4}(1 - 2\tau)\gamma s_i - \frac{1}{4}s_i^2(1 - 2\tau + 2\tau^2)\gamma\right]$, and $\mathbf{r}(\boldsymbol{\beta}) = (r_1(\boldsymbol{\beta}), \ldots, r_n(\boldsymbol{\beta}))'$.
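The quadratic-plus-linear decomposition can be verified numerically against the piecewise definition. The following Python sketch is an illustration under stated assumptions: the function names and residual values are hypothetical, and the linear coefficient $v_i = s_i((2\tau - 1)s_i + 1)/2$ and constant $c(\mathbf{s}) = \sum_i \left[\frac{1}{4}(1 - 2\tau)\gamma s_i - \frac{1}{4}s_i^2(1 - 2\tau + 2\tau^2)\gamma\right]$ are the forms obtained by expanding the three branches of $H_{\gamma,\tau}$ with $s_i \in \{-1, 0, 1\}$.

```python
def smooth_check(t, gamma, tau):
    """Piecewise definition of the smoothed check function."""
    if t <= (tau - 1.0) * gamma:
        return t * (tau - 1.0) - 0.5 * (tau - 1.0) ** 2 * gamma
    if t >= tau * gamma:
        return t * tau - 0.5 * tau ** 2 * gamma
    return t * t / (2.0 * gamma)


def sign_vector(r, gamma, tau):
    """Classify each residual: -1 (lower tail), 0 (quadratic zone), +1 (upper tail)."""
    signs = []
    for ri in r:
        if ri <= (tau - 1.0) * gamma:
            signs.append(-1)
        elif ri >= tau * gamma:
            signs.append(1)
        else:
            signs.append(0)
    return signs


def decomposed_objective(r, gamma, tau):
    """Evaluate (1 / (2 gamma)) r'Wr + v(s)'r + c(s)."""
    s = sign_vector(r, gamma, tau)
    w = [1 - si * si for si in s]
    v = [si * ((2.0 * tau - 1.0) * si + 1.0) / 2.0 for si in s]
    c = sum(0.25 * (1.0 - 2.0 * tau) * gamma * si
            - 0.25 * si * si * (1.0 - 2.0 * tau + 2.0 * tau * tau) * gamma
            for si in s)
    quad = sum(wi * ri * ri for wi, ri in zip(w, r)) / (2.0 * gamma)
    lin = sum(vi * ri for vi, ri in zip(v, r))
    return quad + lin + c


# Hypothetical residuals for some fixed beta.
r = [0.1, 0.6, -1.9, 0.6, -0.3]
gamma, tau = 1.0, 0.25

signs = sign_vector(r, gamma, tau)
direct = sum(smooth_check(ri, gamma, tau) for ri in r)
decomposed = decomposed_objective(r, gamma, tau)
```

Only residuals with $s_i = 0$ contribute to the quadratic term, which is why the Hessian of the smoothed objective depends on the data solely through the observations currently in the quadratic zone.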

The gradient of $D_{\gamma,\tau}$ is given by

$$D_{\gamma,\tau}^{(1)}(\boldsymbol{\beta}) = -\mathbf{A}\left[\frac{1}{\gamma}\mathbf{W}_{\gamma,\tau}(\boldsymbol{\beta})\mathbf{r}(\boldsymbol{\beta}) + \mathbf{v}(\mathbf{s})\right]$$

For $\boldsymbol{\beta} \in \mathbf{R}^p \setminus B_{\gamma,\tau}$ the Hessian exists and is given by

$$D_{\gamma,\tau}^{(2)}(\boldsymbol{\beta}) = \frac{1}{\gamma}\mathbf{A}\mathbf{W}_{\gamma,\tau}(\boldsymbol{\beta})\mathbf{A}'$$

The gradient is a continuous function in $\mathbf{R}^p$, whereas the Hessian is piecewise constant.
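The closed-form gradient can be compared with a finite-difference approximation of the smoothed objective. The following Python sketch is illustrative only; the function names and data are hypothetical. It uses the fact that differentiating the three branches of $H_{\gamma,\tau}$ gives $\tau - 1$ in the lower tail, $r_i/\gamma$ in the quadratic zone, and $\tau$ in the upper tail, and that $r_i(\boldsymbol{\beta}) = y_i - \mathbf{x}_i'\boldsymbol{\beta}$ contributes a factor of $-\mathbf{x}_i$ by the chain rule.

```python
def smooth_check(t, gamma, tau):
    """Smoothed check function (piecewise definition)."""
    if t <= (tau - 1.0) * gamma:
        return t * (tau - 1.0) - 0.5 * (tau - 1.0) ** 2 * gamma
    if t >= tau * gamma:
        return t * tau - 0.5 * tau ** 2 * gamma
    return t * t / (2.0 * gamma)


def residuals(A, y, beta):
    """r_i = y_i - x_i'beta, with A stored p-by-n (row j holds regressor j)."""
    return [y[i] - sum(A[j][i] * beta[j] for j in range(len(beta)))
            for i in range(len(y))]


def objective(A, y, beta, gamma, tau):
    """Smoothed objective: sum of smoothed check values of the residuals."""
    return sum(smooth_check(r, gamma, tau) for r in residuals(A, y, beta))


def gradient_and_hessian(A, y, beta, gamma, tau):
    """Return -A[(1/gamma) W r + v(s)] and (1/gamma) A W A'."""
    p, n = len(A), len(y)
    g, w = [], []
    for ri in residuals(A, y, beta):
        if ri <= (tau - 1.0) * gamma:       # s_i = -1
            w.append(0.0)
            g.append(tau - 1.0)
        elif ri >= tau * gamma:             # s_i = +1
            w.append(0.0)
            g.append(tau)
        else:                               # s_i = 0
            w.append(1.0)
            g.append(ri / gamma)
    grad = [-sum(A[j][i] * g[i] for i in range(n)) for j in range(p)]
    hess = [[sum(A[j][i] * w[i] * A[k][i] for i in range(n)) / gamma
             for k in range(p)] for j in range(p)]
    return grad, hess


# Hypothetical toy data; beta is chosen away from the breakpoint hyperplanes.
A = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 2.0, 3.0]]
y = [1.0, 3.0, 2.0, 6.0]
beta, gamma, tau = [0.9, 1.5], 1.0, 0.25

grad, hess = gradient_and_hessian(A, y, beta, gamma, tau)

# Central finite differences of the objective, for comparison with grad.
h = 1e-6
fd = []
for j in range(len(beta)):
    up, dn = list(beta), list(beta)
    up[j] += h
    dn[j] -= h
    fd.append((objective(A, y, up, gamma, tau) - objective(A, y, dn, gamma, tau)) / (2 * h))
```

In this example only the first observation lies in the quadratic zone, so the Hessian is singular. A Newton-type solver has to allow for that case, for instance by working within the range of the Hessian.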

Following Madsen and Nielsen (1993), the vector $\mathbf{s}$ is referred to as a $\gamma$-feasible sign vector if there exists $\boldsymbol{\beta} \in \mathbf{R}^p \setminus B_{\gamma,\tau}$ with $\mathbf{s}_{\gamma,\tau}(\boldsymbol{\beta}) = \mathbf{s}$. If $\mathbf{s}$ is $\gamma$-feasible, then $Q_{\mathbf{s}}$ is defined as the quadratic function $Q_{\mathbf{s}}(\boldsymbol{\alpha})$ that is derived from $D_{\gamma,\tau}(\boldsymbol{\alpha})$ by substituting $\mathbf{s}$ for $\mathbf{s}_{\gamma,\tau}$. Thus, for any $\boldsymbol{\beta}$ with $\mathbf{s}_{\gamma,\tau}(\boldsymbol{\beta}) = \mathbf{s}$,

$$Q_{\mathbf{s}}(\boldsymbol{\alpha}) = \frac{1}{2}(\boldsymbol{\alpha} - \boldsymbol{\beta})' D_{\gamma,\tau}^{(2)}(\boldsymbol{\beta})(\boldsymbol{\alpha} - \boldsymbol{\beta}) + D_{\gamma,\tau}^{(1)}(\boldsymbol{\beta})(\boldsymbol{\alpha} - \boldsymbol{\beta}) + D_{\gamma,\tau}(\boldsymbol{\beta})$$

In the domain $C_{\mathbf{s}} = \{\boldsymbol{\alpha} \mid \mathbf{s}_{\gamma,\tau}(\boldsymbol{\alpha}) = \mathbf{s}\}$,

$$D_{\gamma,\tau}(\boldsymbol{\alpha}) = Q_{\mathbf{s}}(\boldsymbol{\alpha})$$

For each $\gamma > 0$ and $\boldsymbol{\beta} \in \mathbf{R}^p$, there can be one or several corresponding quadratics, $Q_{\mathbf{s}}$. If $\boldsymbol{\beta} \notin B_{\gamma,\tau}$, then $Q_{\mathbf{s}}$ is characterized by $\boldsymbol{\beta}$ and $\gamma$. However, for $\boldsymbol{\beta} \in B_{\gamma,\tau}$, the quadratic is not unique. Therefore, the following reference determines the quadratic:

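$$(\gamma, \boldsymbol{\theta}, \mathbf{s})$$

Again following Madsen and Nielsen (1993), let $(\gamma, \boldsymbol{\theta}, \mathbf{s})$ be a feasible reference if $\mathbf{s}$ is a $\gamma$-feasible sign vector, where $\boldsymbol{\theta} \in C_{\mathbf{s}}$, and let $(\gamma, \boldsymbol{\theta}, \mathbf{s})$ be a solution reference if it is feasible and $\boldsymbol{\theta}$ minimizes $D_{\gamma,\tau}$.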

The smoothing algorithm for minimizing $D_{\rho_\tau}$ is based on minimizing $D_{\gamma,\tau}$ for a set of decreasing $\gamma$. For each new value of $\gamma$, information from the previous solution is used. Finally, when $\gamma$ is small enough, a solution can be found by the following modified Newton-Raphson algorithm as stated by Madsen and Nielsen (1993):

Find an initial solution reference $(\gamma, \boldsymbol{\beta}_{\gamma}, \mathbf{s})$.
Repeat the following substeps until $\gamma = 0$.
1. Decrease $\gamma$.
2. Find a solution reference $(\gamma, \boldsymbol{\beta}_{\gamma}, \mathbf{s})$.
$\boldsymbol{\beta}_{0}$ is the solution.

By default, the initial solution reference is found by letting $\boldsymbol{\beta}_{\gamma}$ be the least squares solution. Alternatively, you can specify the initial solution reference with the INEST= option in the PROC QUANTREG statement. Then $\gamma$ and $\mathbf{s}$ are chosen according to these initial values.

There are several approaches for determining a decreasing sequence of values of $\gamma$. The QUANTREG procedure uses a strategy by Madsen and Nielsen (1993). The computation that it uses is not significant compared to the Newton-Raphson step. You can control the ratio of consecutive decreasing values of $\gamma$ by specifying the RRATIO= suboption in the ALGORITHM= option in the PROC QUANTREG statement. By default,

$$\mathrm{RRATIO} = \begin{cases} 0.1 & \text{if } n \ge 10{,}000 \text{ and } p \le 20 \\ 0.9 & \text{if } \dfrac{p}{n} \ge 0.1 \text{ or } \{n \le 5{,}000 \text{ and } p \ge 300\} \\ 0.5 & \text{otherwise} \end{cases}$$
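The default rule is easy to misread because the second condition mixes a ratio test with a size test, so the following Python sketch spells it out. It is illustrative only: the function names `default_rratio` and `gamma_sequence` are hypothetical, and the geometric sequence assumes that each threshold is the previous one multiplied by the ratio, which is one reading of "the ratio of consecutive decreasing values"; the procedure's actual strategy may adjust the values further.

```python
def default_rratio(n, p):
    """Default RRATIO as a function of observations n and regressors p."""
    if n >= 10000 and p <= 20:
        return 0.1
    if p / n >= 0.1 or (n <= 5000 and p >= 300):
        return 0.9
    return 0.5


def gamma_sequence(gamma0, rratio, steps):
    """Thresholds gamma0, gamma0*rratio, ... under the geometric assumption."""
    seq = [gamma0]
    for _ in range(steps):
        seq.append(seq[-1] * rratio)
    return seq


tall = default_rratio(20000, 10)      # many observations, few regressors
fat = default_rratio(1000, 200)       # p / n = 0.2
wide = default_rratio(4000, 300)      # p / n < 0.1, but n <= 5000 and p >= 300
other = default_rratio(8000, 100)     # none of the special cases
thresholds = gamma_sequence(1.0, other, 3)
```

A small ratio shrinks the threshold aggressively, which suits tall data where each Newton-Raphson step is cheap relative to $n$. A ratio near 1 moves cautiously when $p$ is large relative to $n$.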

For $L_1$ and quantile regression, the smoothing algorithm is very efficient and competitive, especially for a fat data set, namely, when $p/n > 0.05$ and $\mathbf{A}\mathbf{A}'$ is dense. See Chen (2007) for the complete smoothing algorithm and details.

Fast Quantile Process Regression

The QUANTILE=FQPR option in the MODEL statement implements a fast quantile process regression (FQPR) method. This method can efficiently fit multiple quantile regression models by using the divide-and-conquer strategy proposed by Yao (2017).

The FQPR method begins by fitting a quantile regression model for a selected quantile level in a specified grid of $q$ quantile levels. The selected quantile level is the one closest to 0.5 among all the quantile levels in the grid. Using this fit, FQPR defines two subsets of the data based on whether the observed values $y$ are above or below their linear predictors in this regression fit. Then FQPR proceeds to recursively perform separate quantile process regressions on the two subsets.

The successive quantile regression steps of FQPR are thus fit to smaller and smaller data sets. It is this sequence of reductions in problem size that provides the very significant reduction in computational cost that FQPR can achieve. In particular, FQPR can fit a quantile process regression model for $q$ equally spaced quantiles in approximately the time that it would take to fit just $\log_2 q$ quantile regression models to all the data.
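The cost argument can be made concrete with a toy accounting model. The following Python sketch is not the FQPR implementation and rests on loudly stated assumptions: the function `fqpr_cost` is hypothetical, the cost of one fit is taken to be proportional to the number of observations in the subset, a fit at level $\tau^*$ is assumed to send a fraction $\tau^*$ of the observations to the lower subset, and the remaining quantile levels are rescaled to the subset they fall in. Under those assumptions it totals the work of the recursion.

```python
def fqpr_cost(levels, n):
    """Total work of the divide-and-conquer recursion in a toy cost model.

    Assumptions (illustrative only): one fit costs n units, and a fit at
    level pivot splits the data into fractions pivot and 1 - pivot.
    """
    if not levels or n <= 0:
        return 0.0
    # Fit the level closest to 0.5 on the current subset.
    pivot = min(levels, key=lambda q: abs(q - 0.5))
    # Remaining levels, re-expressed relative to the subset they fall in.
    lower = [q / pivot for q in levels if q < pivot]
    upper = [(q - pivot) / (1.0 - pivot) for q in levels if q > pivot]
    return n + fqpr_cost(lower, n * pivot) + fqpr_cost(upper, n * (1.0 - pivot))


# Seven equally spaced levels 1/8, ..., 7/8 on 1,024 observations.
levels = [j / 8.0 for j in range(1, 8)]
n_obs = 1024

recursive_work = fqpr_cost(levels, n_obs)     # divide-and-conquer total
naive_work = len(levels) * n_obs              # one full-data fit per level
```

In this model each level of the recursion touches every observation about once, so the total work is roughly the recursion depth times $n$, compared with $q \cdot n$ for fitting every level on all the data.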

By default, the QUANTILE=FQPR option uses the ALGORITHM=IPM option in the PROC QUANTREG statement to specify the efficient interior point algorithm for fitting the single-level models in the quantile process model. For more information about the ALGORITHM=IPM option, see the section Efficient Interior Point Algorithm. You can also use the ALGORITHM=SMOOTH option in the PROC QUANTREG statement to specify a fast smoothing algorithm for fitting the single-level models in the quantile process model. The fast smoothing algorithm uses a smooth function different from the smoothing algorithm that is described in the section Smoothing Algorithm. The fast algorithm smooth function is defined as

where

The QUANTILE=FQPR(USEALLOBS) option requests that the FQPR algorithm use all the observations for fitting each of the single-level models in the quantile process model. Because of possible crossing, the USEALLOBS suboption can output a quantile-process model slightly different from the fitted FQPR model that does not use the USEALLOBS suboption.

Last updated: December 09, 2022