The GLMSELECT Procedure

Adaptive LASSO Selection

Adaptive LASSO selection is a modification of LASSO selection; in adaptive LASSO selection, weights are applied to each of the parameters in forming the LASSO constraint (Zou 2006). More precisely, suppose that the response $\mathbf{y}$ has mean zero and the regressors $\mathbf{x}$ are scaled to have mean zero and common standard deviation. Furthermore, suppose you can find a suitable estimator $\hat{\boldsymbol{\beta}}$ of the parameters in the true model and you define a weight vector by $\mathbf{w} = 1 / |\hat{\boldsymbol{\beta}}|^{\gamma}$, where $\gamma \ge 0$. Then the adaptive LASSO regression coefficients $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_m)$ are the solution to the constrained optimization problem

\[
\min \left\| \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \right\|^2
\quad \text{subject to} \quad \sum_{j=1}^{m} \left| w_j \beta_j \right| \le t
\]
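For computation, it can help to note (as in Zou 2006) that the weighted constraint turns the problem into an ordinary LASSO fit on rescaled regressors. The following display is a sketch of that reduction, assuming each weight $w_j > 0$ and writing $\mathbf{X}^{*}$ for the matrix of rescaled columns (notation introduced here for illustration only):

\[
\mathbf{x}_j^{*} = \mathbf{x}_j / w_j, \qquad \beta_j^{*} = w_j \beta_j
\quad\Longrightarrow\quad
\min \left\| \mathbf{y} - \mathbf{X}^{*}\boldsymbol{\beta}^{*} \right\|^2
\quad \text{subject to} \quad \sum_{j=1}^{m} \left| \beta_j^{*} \right| \le t
\]

Solving this ordinary LASSO problem and setting $\hat{\beta}_j = \hat{\beta}_j^{*} / w_j$ recovers the adaptive LASSO coefficients.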

You can specify $\hat{\boldsymbol{\beta}}$ by using the INEST= suboption of the SELECTION=LASSO option in the MODEL statement. The INEST= data set has the same structure as the OUTEST= data set that is produced by several SAS/STAT procedures, including the REG and LOGISTIC procedures. The INEST= data set must contain all explanatory variables in the MODEL statement. It must also contain an intercept variable named Intercept unless you specify the NOINT option in the MODEL statement. If BY processing is used, the INEST= data set must also include the BY variables, and there must be one observation for each BY group. If the INEST= data set also contains the _TYPE_ variable, only observations whose _TYPE_ value is PARMS are used.
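As an illustration, the following minimal sketch builds an INEST= data set in a DATA step and supplies it as described above. The data set names, variable names, parameter values, and the exact spelling of the suboptions (including ADAPTIVE) are placeholders to be checked against the MODEL statement syntax; they are not prescribed by this section.

   /* Sketch only: estimates for x1-x3 plus an intercept, stored in the
      OUTEST=-style layout that the INEST= data set requires.            */
   data work.inest0;
      _TYPE_ = 'PARMS';                   /* only PARMS observations are read */
      Intercept = 1.2;                    /* required unless NOINT is used    */
      x1 = 0.8;  x2 = -0.3;  x3 = 0.05;   /* illustrative estimates           */
   run;

   proc glmselect data=work.train;
      model y = x1-x3 / selection=lasso(adaptive inest=work.inest0);
   run;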

If you do not specify an INEST= data set, then PROC GLMSELECT uses the solution to the unconstrained least squares problem as the estimator $\hat{\boldsymbol{\beta}}$. This is appropriate unless collinearity is a concern. If the regressors are collinear or nearly collinear, then Zou (2006) suggests using a ridge regression estimate to form the adaptive weights.
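When collinearity is a concern, the ridge-based suggestion might be sketched as follows. PROC REG writes ridge estimates to its OUTEST= data set with _TYPE_='RIDGE', so the sketch recodes that observation to PARMS before passing it along; the ridge parameter, data set names, and suboption spelling are again illustrative assumptions rather than prescribed values.

   /* Sketch only: ridge estimates as the source of the adaptive weights. */
   proc reg data=work.train outest=work.ridge_est ridge=0.5 noprint;
      model y = x1-x3;
   run;

   data work.inest_ridge;
      set work.ridge_est;
      if _TYPE_ = 'RIDGE';    /* keep the ridge-estimate observation       */
      _TYPE_ = 'PARMS';       /* recode so it is read as parameter values  */
   run;

   proc glmselect data=work.train;
      model y = x1-x3 / selection=lasso(adaptive inest=work.inest_ridge);
   run;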
