The ROBUSTREG Procedure

Leverage-Point and Outlier Detection

The regular variable LEVERAGE is defined as

$$
\mathrm{LEVERAGE} =
\begin{cases}
0 & \text{if } \mathrm{RD}(\mathbf{x}_i) \le C(p) \\
1 & \text{otherwise}
\end{cases}
$$

where $\mathrm{RD}(\mathbf{x}_i)$ is the robust distance of the $i$th observation and $C(p) = \sqrt{\chi^2_{p;\,1-\alpha}}$ is the cutoff value. You can set $C(p)$ directly by using the CUTOFF= suboption of the LEVERAGE option, and you can set $\alpha$ by using the CUTOFFALPHA= suboption.
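The rule above can be sketched in Python. This is an illustration only, not the procedure's implementation: the function names are hypothetical, the robust distances are assumed to have been computed already, and the chi-square quantile is approximated with the Wilson-Hilferty formula so that the example needs only the standard library. The value alpha = 0.025 and the distances in the usage lines are made up for illustration.

```python
import math
from statistics import NormalDist


def leverage_cutoff(p, alpha):
    """Approximate C(p), the square root of the chi-square quantile
    with p degrees of freedom at probability 1 - alpha.

    Assumption: the Wilson-Hilferty approximation stands in for an
    exact quantile function to avoid third-party dependencies.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * p)
    chi2_quantile = p * (1.0 - c + z * math.sqrt(c)) ** 3
    return math.sqrt(chi2_quantile)


def leverage_flags(robust_distances, cutoff):
    """Return 0 where RD(x_i) <= cutoff and 1 otherwise."""
    return [0 if rd <= cutoff else 1 for rd in robust_distances]


# Hypothetical robust distances for five observations, with p = 2.
cutoff = leverage_cutoff(p=2, alpha=0.025)
flags = leverage_flags([0.8, 1.9, 2.5, 3.4, 6.1], cutoff)
```

Passing the cutoff as an argument mirrors the two ways of setting it: supply a value directly, or derive it from alpha.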

If projected robust distances are computed for a data set that has a low-dimensional structure, the default cutoff value is $C(q) = \sqrt{\chi^2_{q;\,1-\alpha}}$, where $q$ is the dimensionality of the low-dimensional space. LEVERAGE is then defined as

$$
\mathrm{LEVERAGE} =
\begin{cases}
0 & \text{if } \mathrm{POD}(\mathbf{x}_i) = 0 \text{ and } \mathrm{PRD}(\mathbf{x}_i) \le C(q) \\
1 & \text{if } \mathrm{POD}(\mathbf{x}_i) = 0 \text{ and } \mathrm{PRD}(\mathbf{x}_i) > C(q) \quad \text{(called in-plane leverage)} \\
1 & \text{if } \mathrm{POD}(\mathbf{x}_i) > 0 \quad \text{(called off-plane leverage)}
\end{cases}
$$

where POD is the projected off-plane distance and PRD denotes the projected robust distance. You can specify a cutoff value by using the CUTOFF= or CUTOFFALPHA= suboption of the LEVERAGE option in the MODEL statement.
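The three cases can be written as a small Python function. This is a sketch under stated assumptions, not the procedure's code: the function name and the labels it returns are hypothetical, POD and PRD are assumed to be precomputed and nonnegative, and POD is compared with zero exactly, whereas a numerical implementation might use a tolerance.

```python
def projected_leverage(pod, prd, cutoff_q):
    """Classify one observation from its POD, PRD, and the cutoff C(q).

    Returns (LEVERAGE, label), where label is None, "in-plane",
    or "off-plane". The labels are illustrative names only.
    """
    if pod > 0:
        # Any positive off-plane distance marks an off-plane leverage point.
        return 1, "off-plane"
    if prd > cutoff_q:
        # On the plane, but far from the bulk of the data within it.
        return 1, "in-plane"
    return 0, None


# Hypothetical (POD, PRD) pairs and a made-up cutoff of 2.5.
examples = [(0.0, 1.2), (0.0, 4.0), (0.7, 0.5)]
results = [projected_leverage(pod, prd, cutoff_q=2.5) for pod, prd in examples]
```

The off-plane check comes first because the PRD comparison applies only to observations whose POD is zero.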

Residuals $r_i$, $i = 1, \ldots, n$, based on robust regression estimates, are used to detect vertical outliers. The variable OUTLIER is defined as

$$
\mathrm{OUTLIER} =
\begin{cases}
0 & \text{if } |r_i| \le k\hat{\sigma} \\
1 & \text{otherwise}
\end{cases}
$$

where $\hat{\sigma}$ is the estimated scale in the model and the multiplier $k$ of the cutoff value is specified by the CUTOFF= option in the MODEL statement. By default, $k = 3$.
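As a minimal Python sketch of this rule (the function name, residuals, and scale estimate below are hypothetical, and the residuals are assumed to come from an already fitted robust regression):

```python
def outlier_flags(residuals, sigma_hat, k=3.0):
    """Return 0 where |r_i| <= k * sigma_hat and 1 otherwise.

    k defaults to 3, matching the default multiplier stated above.
    """
    threshold = k * sigma_hat
    return [0 if abs(r) <= threshold else 1 for r in residuals]


# Hypothetical residuals with an estimated scale of 1.0.
outliers = outlier_flags([0.4, -1.1, 3.2, -7.5], sigma_hat=1.0)
```

With these made-up values the threshold is 3.0, so only the last two residuals are flagged.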

An ODS table called Diagnostics contains the LEVERAGE and OUTLIER variables.

Last updated: March 08, 2022