The QUANTREG Procedure

Leverage Point and Outlier Detection

The QUANTREG procedure uses robust multivariate location and scale estimates for leverage-point detection.

Mahalanobis distance is defined as

$$\mathrm{MD}(\mathbf{x}_i) = \left[ (\mathbf{x}_i - \bar{\mathbf{x}})' \, \bar{\mathbf{C}}(\mathbf{A})^{-1} (\mathbf{x}_i - \bar{\mathbf{x}}) \right]^{1/2}$$

where $\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i$ and $\bar{\mathbf{C}}(\mathbf{A}) = \frac{1}{n-1}\sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})'$ are the empirical multivariate location and scale, respectively. Here, $\mathbf{x}_i = (x_{i1}, \ldots, x_{i(p-1)})'$ does not include the intercept variable. The relationship between the Mahalanobis distance $\mathrm{MD}(\mathbf{x}_i)$ and the matrix $\mathbf{H} = (h_{ij}) = \mathbf{A}'(\mathbf{A}\mathbf{A}')^{-1}\mathbf{A}$ is

$$h_{ii} = \frac{1}{n-1}\,\mathrm{MD}(\mathbf{x}_i)^2 + \frac{1}{n}$$
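The following Python sketch computes $\mathrm{MD}(\mathbf{x}_i)$ from the empirical location and scale and then checks the relationship to $h_{ii}$ numerically. It is an illustration, not the procedure's implementation. The helper names (`solve`, `mahalanobis`, `hat_diagonal`) and the toy data are hypothetical. The sketch assumes that $\mathbf{A}$ is the $p \times n$ matrix whose $i$th column is $(1, \mathbf{x}_i')'$, with the intercept in the first row.

```python
import math

def solve(M, v):
    """Solve M y = v by Gaussian elimination with partial pivoting."""
    d = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(d)]  # augmented copy [M | v]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, d):
            f = aug[r][col] / aug[col][col]
            for c in range(col, d + 1):
                aug[r][c] -= f * aug[col][c]
    y = [0.0] * d
    for i in reversed(range(d)):  # back substitution
        s = aug[i][d] - sum(aug[i][c] * y[c] for c in range(i + 1, d))
        y[i] = s / aug[i][i]
    return y

def mahalanobis(X):
    """MD(x_i) with the empirical mean and the (n - 1)-divisor covariance.
    X holds n rows, each with the p - 1 non-intercept covariates."""
    n, d = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(d)]
    diffs = [[row[j] - xbar[j] for j in range(d)] for row in X]
    C = [[sum(u[a] * u[b] for u in diffs) / (n - 1) for b in range(d)]
         for a in range(d)]
    return [math.sqrt(sum(ui * zi for ui, zi in zip(u, solve(C, u))))
            for u in diffs]

def hat_diagonal(X):
    """h_ii = a_i' (A A')^{-1} a_i, assuming column i of A is (1, x_i')'."""
    cols = [[1.0] + list(row) for row in X]
    p = len(cols[0])
    AAt = [[sum(a[r] * a[c] for a in cols) for c in range(p)] for r in range(p)]
    return [sum(ai * zi for ai, zi in zip(a, solve(AAt, a))) for a in cols]

X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [10.0, 10.0]]
n = len(X)
md = mahalanobis(X)
h = hat_diagonal(X)
for m, hii in zip(md, h):
    # The two right-hand columns should agree: h_ii = MD(x_i)^2 / (n - 1) + 1 / n
    print(f"MD = {m:.4f}  h_ii = {hii:.4f}  MD^2/(n-1) + 1/n = {m * m / (n - 1) + 1 / n:.4f}")
```

The identity is exact in theory, so the columns differ only by rounding. The outlying row (10, 10) has both the largest MD and the largest $h_{ii}$.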

Robust distance is defined as

$$\mathrm{RD}(\mathbf{x}_i) = \left[ (\mathbf{x}_i - \mathbf{T}(\mathbf{A}))' \, \mathbf{C}(\mathbf{A})^{-1} (\mathbf{x}_i - \mathbf{T}(\mathbf{A})) \right]^{1/2}$$

where $\mathbf{T}(\mathbf{A})$ and $\mathbf{C}(\mathbf{A})$ are robust multivariate location and scale estimates that are computed according to the minimum covariance determinant (MCD) method of Rousseeuw and Van Driessen (1999).
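With a single covariate, the raw MCD can be computed exactly. The $h$-subset with the smallest variance is always a run of $h$ consecutive order statistics, so a scan over those runs finds it. The sketch below uses that shortcut in place of the FAST-MCD algorithm of Rousseeuw and Van Driessen (1999). It also omits any consistency correction or reweighting step, and the default $h = \lfloor (n + 2)/2 \rfloor$ is an assumption. All function names and data are hypothetical. The point of the example is to show how $\mathrm{RD}(\mathbf{x}_i)$ exposes a leverage point that the classical MD masks.

```python
import math

def univariate_mcd(x, h):
    """Raw MCD for one variable. The h-subset with the smallest variance is a run
    of h consecutive order statistics, so scanning those runs suffices."""
    xs = sorted(x)
    best = None
    for start in range(len(xs) - h + 1):
        sub = xs[start:start + h]
        m = sum(sub) / h
        var = sum((v - m) ** 2 for v in sub) / (h - 1)
        if best is None or var < best[1]:
            best = (m, var)
    return best  # (T, C): robust location and scale

def robust_distances(x, h=None):
    """RD(x_i) = |x_i - T| / sqrt(C) for one covariate (no intercept)."""
    n = len(x)
    if h is None:
        h = (n + 2) // 2  # assumption: h = floor((n + d + 1) / 2) with d = 1
    T, C = univariate_mcd(x, h)
    return [abs(v - T) / math.sqrt(C) for v in x]

def classical_distances(x):
    """MD(x_i) = |x_i - mean| / sd, the one-covariate case of MD."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    return [abs(v - m) / sd for v in x]

x = [1.0, 2.0, 2.5, 3.0, 5.0, 6.0, 100.0]
print([round(d, 2) for d in classical_distances(x)])
print([round(d, 2) for d in robust_distances(x)])
```

For these toy values, the point 100.0 inflates the classical mean and standard deviation enough that its MD stays below 2.3. Its RD, which is measured from the tight subset {1, 2, 2.5, 3}, exceeds 100.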

These distances are used to detect leverage points. You can use the LEVERAGE and DIAGNOSTICS options in the MODEL statement to request leverage-point and outlier diagnostics, respectively. These options create two new variables, Leverage and Outlier, which are saved in the output data set that you specify in the OUTPUT statement.

Let $C(p) = \sqrt{\chi^2_{p;1-\alpha}}$ be the cutoff value. The variable LEVERAGE is defined as

$$\mathrm{LEVERAGE} = \begin{cases} 0 & \text{if } \mathrm{RD}(\mathbf{x}_i) \le C(p) \\ 1 & \text{otherwise} \end{cases}$$

You can specify a cutoff value in the LEVERAGE option in the MODEL statement.
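The Python standard library has no chi-square quantile function. This sketch therefore evaluates the chi-square CDF as the regularized lower incomplete gamma function, using its power series, and inverts it by bisection to obtain $C(p)$. The function names and the example $\alpha$ are hypothetical. The degrees of freedom are passed in as `p` to mirror $C(p)$, and the sketch takes no position on how the procedure counts $p$.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for X ~ chi-square(df): the regularized lower incomplete gamma
    function P(df/2, x/2), summed from its power series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > 1e-16 * total:  # all terms are positive; stop once negligible
        term *= z / (a + k)
        total += term
        k += 1
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total

def chi2_quantile(q, df):
    """Invert chi2_cdf by bisection, for 0 < q < 1."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < q:  # expand the bracket until it contains q
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def leverage_flags(rd, p, alpha):
    """LEVERAGE_i = 0 if RD(x_i) <= C(p) = sqrt(chi2_{p; 1 - alpha}), else 1."""
    cutoff = math.sqrt(chi2_quantile(1.0 - alpha, p))
    return [0 if d <= cutoff else 1 for d in rd]

print(leverage_flags([0.5, 1.2, 2.0, 4.1], p=2, alpha=0.05))  # prints [0, 0, 0, 1]
```

With two degrees of freedom, the chi-square quantile has the closed form $-2\ln\alpha$. Here $C(2) = \sqrt{-2\ln 0.05} \approx 2.45$, so only the last robust distance is flagged.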

The residuals $r_i$, $i = 1, \ldots, n$, which are based on the quantile regression estimates, are used to detect vertical outliers. The variable OUTLIER is defined as

$$\mathrm{OUTLIER} = \begin{cases} 0 & \text{if } |r_i| \le k\sigma \\ 1 & \text{otherwise} \end{cases}$$

You can specify the multiplier $k$ of the cutoff value in the CUTOFF= option in the MODEL statement. You can specify the scale $\sigma$ in the SCALE= option in the MODEL statement. By default, $k = 3$, and the scale $\sigma$ is computed as the corrected median of the absolute residuals:

$$\sigma = \mathrm{median}\left\{ |r_i| / \beta_0,\ i = 1, \ldots, n \right\}$$

where $\beta_0 = \Phi^{-1}(0.75) \approx 0.6745$ is an adjustment constant for consistency at the normal distribution. If the residuals are normal with mean 0 and standard deviation $s$, the median of $|r_i|$ is approximately $0.6745\,s$, so $\sigma$ estimates $s$.
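The following is a minimal Python sketch of the default scale and the OUTLIER indicator. It assumes that the residuals from a quantile regression fit are already available; the fit itself is not shown. The names `default_scale` and `outlier_flags` and the residual values are hypothetical. Passing `sigma` explicitly stands in for a user-specified SCALE= value, and `k` stands in for CUTOFF=.

```python
import statistics
from statistics import NormalDist

BETA0 = NormalDist().inv_cdf(0.75)  # beta_0 = Phi^{-1}(0.75), about 0.6745

def default_scale(residuals):
    """sigma = median{ |r_i| / beta_0 : i = 1, ..., n }."""
    return statistics.median(abs(r) / BETA0 for r in residuals)

def outlier_flags(residuals, k=3.0, sigma=None):
    """OUTLIER_i = 0 if |r_i| <= k * sigma, else 1; sigma defaults to default_scale."""
    if sigma is None:
        sigma = default_scale(residuals)
    return [0 if abs(r) <= k * sigma else 1 for r in residuals]

residuals = [0.1, -0.2, 0.3, -0.4, 0.5, 5.0]
print(round(default_scale(residuals), 4))
print(outlier_flags(residuals))  # prints [0, 0, 0, 0, 0, 1]
```

Because $\sigma$ is based on a median, the single large residual barely moves it. The cutoff $3\sigma$ therefore stays near 1.56, and only that residual is flagged.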

An ODS table called DIAGNOSTICS contains the Leverage and Outlier variables.

Last updated: December 09, 2022