The REG Procedure

Computations for Ridge Regression and IPC Analysis

In ridge regression analysis, the crossproduct matrix for the independent variables is centered (the NOINT option is ignored if it is specified) and scaled so that its diagonal elements equal one. The ridge constant k (specified with the RIDGE= option) is then added to each diagonal element of the scaled crossproduct matrix. The ridge regression estimates are the least squares estimates obtained by using the new crossproduct matrix.

Let $\mathbf{X}$ be an $n \times p$ matrix of the independent variables after centering the data, and let $\mathbf{Y}$ be an $n \times 1$ vector corresponding to the dependent variable. Let $\mathbf{D}$ be a $p \times p$ diagonal matrix with diagonal elements as in $\mathbf{X}'\mathbf{X}$. The ridge regression estimate corresponding to the ridge constant $k$ can be computed as

$$\mathbf{D}^{-1/2} \left( \mathbf{Z}'\mathbf{Z} + k\,\mathbf{I}_p \right)^{-1} \mathbf{Z}'\mathbf{Y}$$

where $\mathbf{Z} = \mathbf{X}\mathbf{D}^{-1/2}$ and $\mathbf{I}_p$ is a $p \times p$ identity matrix.
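The ridge computation described above can be sketched in a few lines of NumPy. This is only an illustration of the formula, not the REG procedure's implementation: the function name `ridge_estimates`, the simulated data, and the choice to return only the slope estimates (no intercept) are assumptions made for the example.

```python
import numpy as np

def ridge_estimates(X_raw, y, k):
    """Slope estimates D^{-1/2} (Z'Z + k I_p)^{-1} Z'Y (illustrative sketch)."""
    X_raw = np.asarray(X_raw, dtype=float)
    y = np.asarray(y, dtype=float)
    X = X_raw - X_raw.mean(axis=0)      # center each independent variable
    d = np.sum(X * X, axis=0)           # diagonal elements of X'X
    Z = X / np.sqrt(d)                  # Z = X D^{-1/2}, so diag(Z'Z) = 1
    p = X.shape[1]
    # Add the ridge constant k to each diagonal element and solve.
    b_std = np.linalg.solve(Z.T @ Z + k * np.eye(p), Z.T @ y)
    return b_std / np.sqrt(d)           # premultiply by D^{-1/2}

# Hypothetical data, used only to exercise the function.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = 1.0 + X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

b_ols = ridge_estimates(X_demo, y_demo, k=0.0)    # k = 0: ordinary least squares slopes
b_ridge = ridge_estimates(X_demo, y_demo, k=0.1)  # k > 0: ridge slopes
```

With k = 0 the sketch reduces to the ordinary least squares slopes for a model with an intercept, which is a convenient check on the centering and scaling steps.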

For incomplete principal components (IPC) analysis, the smallest $m$ eigenvalues of $\mathbf{Z}'\mathbf{Z}$ (where $m$ is specified with the PCOMIT= option) are omitted to form the estimates.
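One way to read the IPC description is as a standard principal components regression on the centered and scaled matrix Z = X D^{-1/2}. The sketch below makes that assumption explicit: it takes the eigendecomposition of Z'Z, drops the m smallest eigenvalues with their eigenvectors, and rescales the result by D^{-1/2}. The function name `ipc_estimates` and the simulated data are hypothetical, and the source does not spell out these exact steps.

```python
import numpy as np

def ipc_estimates(X_raw, y, m):
    """Slope estimates after omitting the m smallest eigenvalues of Z'Z (sketch)."""
    X_raw = np.asarray(X_raw, dtype=float)
    y = np.asarray(y, dtype=float)
    X = X_raw - X_raw.mean(axis=0)      # center each independent variable
    d = np.sum(X * X, axis=0)           # diagonal elements of X'X
    Z = X / np.sqrt(d)                  # Z = X D^{-1/2}
    # eigh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(Z.T @ Z)
    V = eigvecs[:, m:]                  # keep all but the m smallest
    lam = eigvals[m:]
    # Assumed form: V diag(1/lam) V' Z'Y on the standardized scale.
    b_std = V @ ((V.T @ (Z.T @ y)) / lam)
    return b_std / np.sqrt(d)           # back to the original scale of X

# Hypothetical data, used only to exercise the function.
rng = np.random.default_rng(1)
X_ipc = rng.normal(size=(60, 3))
y_ipc = 0.5 + X_ipc @ np.array([1.0, 2.0, -1.5]) + rng.normal(scale=0.1, size=60)

b_full = ipc_estimates(X_ipc, y_ipc, m=0)   # nothing omitted: least squares slopes
b_ipc = ipc_estimates(X_ipc, y_ipc, m=1)    # smallest eigenvalue omitted
```

With m = 0 no component is dropped, so the sketch should agree with ordinary least squares. Each increase in m removes one more direction from the fit.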

For information about ridge regression and IPC standardized parameter estimates, parameter estimate standard errors, and variance inflation factors, see Rawlings, Pantula, and Dickey (1998); Neter, Wasserman, and Kutner (1990); Marquardt and Snee (1975). Unlike Rawlings, Pantula, and Dickey (1998), the REG procedure uses the mean squared errors of the submodels instead of the full model MSE to compute the standard errors of the parameter estimates.

Last updated: December 09, 2022