The GLMSELECT Procedure

Criteria Used in Model Selection Methods

PROC GLMSELECT supports a variety of fit statistics that you can specify as criteria for the CHOOSE=, SELECT=, and STOP= options in the MODEL statement. The following statistics are available:

ADJRSQ

adjusted R-square statistic (Darlington 1968; Judge et al. 1985)

AIC

Akaike’s information criterion (Darlington 1968; Judge et al. 1985)

AICC

corrected Akaike’s information criterion (Hurvich and Tsai 1989)

BIC

Sawa Bayesian information criterion (Sawa 1978; Judge et al. 1985)

CP

Mallows’ $C_p$ statistic (Mallows 1973; Hocking 1976)

PRESS

predicted residual sum of squares statistic

SBC

Schwarz Bayesian information criterion (Schwarz 1978; Judge et al. 1985)

SL

significance level of the F statistic used to assess an effect’s contribution to the fit when it is added to or removed from a model

VALIDATE

average square error over the validation data
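As a rough illustration of the VALIDATE criterion, the average square error over a validation set is the mean of the squared differences between observed and predicted responses. The sketch below is not PROC GLMSELECT code; the function name `validation_ase` and the sample values are hypothetical and chosen only for illustration.

```python
def validation_ase(observed, predicted):
    """Average square error over validation observations.

    Hypothetical helper: `observed` and `predicted` are equal-length
    sequences of responses and model predictions for the validation data.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return sum(squared_errors) / len(observed)


# Made-up validation responses and predictions.
y_valid = [3.0, 5.0, 7.0]
y_hat = [2.5, 5.5, 8.0]
ase = validation_ase(y_valid, y_hat)
```

A smaller value indicates that the candidate model predicts the held-out observations more closely.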

Table 9 provides formulas and definitions for the fit statistics.

Table 9: Formulas and Definitions for Model Fit Summary Statistics

| Statistic | Definition or Formula |
|---|---|
| $n$ | Number of observations |
| $p$ | Number of parameters including the intercept |
| $\hat{\sigma}^2$ | Estimate of pure error variance from fitting the full model |
| SST | Total sum of squares corrected for the mean for the dependent variable |
| SSE | Error sum of squares |
| ASE | $\frac{\mathrm{SSE}}{n}$ |
| MSE | $\frac{\mathrm{SSE}}{n-p}$ |
| $R^2$ | $1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$ |
| ADJRSQ | $1 - \frac{(n-1)(1-R^2)}{n-p}$ |
| AIC | $n \log\left(\frac{\mathrm{SSE}}{n}\right) + 2p + n + 2$ |
| AICC | $n \log\left(\frac{\mathrm{SSE}}{n}\right) + \frac{n(n+p)}{n-p-2}$ |
| BIC | $n \log\left(\frac{\mathrm{SSE}}{n}\right) + 2(p+2)q - 2q^2$ where $q = \frac{n\hat{\sigma}^2}{\mathrm{SSE}}$ |
| CP ($C_p$) | $\frac{\mathrm{SSE}}{\hat{\sigma}^2} + 2p - n$ |
| PRESS | $\sum_{i=1}^{n} \frac{r_i^2}{(1-h_i)^2}$ where $r_i$ = residual at observation $i$ and $h_i$ = leverage of observation $i$ $= \mathbf{x}_i (\mathbf{X}'\mathbf{X})^{-} \mathbf{x}_i'$ |
| RMSE | $\sqrt{\mathrm{MSE}}$ |
| SBC | $n \log\left(\frac{\mathrm{SSE}}{n}\right) + p \log(n)$ |
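The formulas in Table 9 can be evaluated directly from a few summary quantities. The following Python sketch is an illustration only, not the procedure's implementation; the function names `fit_statistics` and `press`, the argument names, and the numeric inputs are all assumptions made for this example. It assumes that `log` in the table is the natural logarithm.

```python
import math


def fit_statistics(n, p, sse, sst, sigma2_full):
    """Evaluate the Table 9 formulas from summary quantities.

    n           -- number of observations
    p           -- number of parameters, including the intercept
    sse, sst    -- error and corrected total sums of squares
    sigma2_full -- pure error variance estimate from the full model
    """
    log_term = n * math.log(sse / n)  # n log(SSE/n), shared by AIC, AICC, BIC, SBC
    r_square = 1.0 - sse / sst
    mse = sse / (n - p)
    q = n * sigma2_full / sse         # q as defined in the BIC row
    return {
        "ASE": sse / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "RSQ": r_square,
        "ADJRSQ": 1.0 - (n - 1) * (1.0 - r_square) / (n - p),
        "AIC": log_term + 2 * p + n + 2,
        "AICC": log_term + n * (n + p) / (n - p - 2),
        "BIC": log_term + 2 * (p + 2) * q - 2 * q ** 2,
        "CP": sse / sigma2_full + 2 * p - n,
        "SBC": log_term + p * math.log(n),
    }


def press(residuals, leverages):
    """PRESS: sum of squared residuals, each scaled by 1 / (1 - h_i)^2."""
    return sum((r / (1.0 - h)) ** 2 for r, h in zip(residuals, leverages))


# Made-up inputs for illustration.
summary = fit_statistics(n=20, p=3, sse=40.0, sst=100.0, sigma2_full=2.0)
press_value = press([1.0, -2.0], [0.5, 0.0])
```

The residuals and leverages passed to `press` would come from the fitted model; here they are supplied by hand to keep the sketch free of matrix code.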


Formulas for AIC and AICC

In the context of linear regression, several different versions of the formulas for AIC and AICC appear in the statistics literature. However, for a fixed number of observations, these versions differ only by additive and positive multiplicative constants, so they rank candidate models identically.
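The effect of an additive constant can be checked numerically. The sketch below compares the AIC formula from Table 9 with a shorter variant that omits the term n + 2. The variant, the function names, and the candidate models are assumptions chosen for illustration, not output from the procedure.

```python
import math


def aic_table9(n, p, sse):
    """AIC as defined in Table 9."""
    return n * math.log(sse / n) + 2 * p + n + 2


def aic_short(n, p, sse):
    """Illustrative variant that drops the additive constant n + 2."""
    return n * math.log(sse / n) + 2 * p


# Hypothetical candidates fit to the same n observations: name -> (p, SSE).
n = 50
candidates = {"small": (2, 120.0), "medium": (4, 95.0), "large": (8, 93.0)}

best_table9 = min(candidates, key=lambda name: aic_table9(n, *candidates[name]))
best_short = min(candidates, key=lambda name: aic_short(n, *candidates[name]))

# For fixed n the two versions differ by the same constant for every model.
offsets = [aic_table9(n, p, sse) - aic_short(n, p, sse)
           for p, sse in candidates.values()]
```

Because every entry of `offsets` equals n + 2, both versions select the same model. The comparison holds only when all candidates are fit to the same number of observations.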

PROC GLMSELECT now uses the definitions of AIC and AICC found in Hurvich and Tsai (1989):

$$\mathrm{AIC} = n \log\left(\frac{\mathrm{SSE}}{n}\right) + 2p + n + 2$$

and

$$\mathrm{AICC} = \mathrm{AIC} + \frac{2(p+1)(p+2)}{n-p-2}$$

Hurvich and Tsai (1989) show that the formula for AICC can also be written as

$$\mathrm{AICC} = n \log\left(\frac{\mathrm{SSE}}{n}\right) + \frac{n(n+p)}{n-p-2}$$
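
The equivalence of the two AICC expressions can be verified by combining the non-logarithmic terms of AIC with the correction term over the common denominator $n - p - 2$. This worked step is an added check and is not taken from Hurvich and Tsai (1989):

$$
\begin{aligned}
(2p + n + 2) + \frac{2(p+1)(p+2)}{n-p-2}
&= \frac{(2p+n+2)(n-p-2) + 2(p+1)(p+2)}{n-p-2} \\
&= \frac{(n^2 + np - 2p^2 - 6p - 4) + (2p^2 + 6p + 4)}{n-p-2} \\
&= \frac{n(n+p)}{n-p-2}
\end{aligned}
$$

Adding $n \log\left(\frac{\mathrm{SSE}}{n}\right)$ to both sides gives the second form of AICC.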
Last updated: December 09, 2022