Introduction to Regression Procedures

Generalized Additive Models: The GAM and GAMPL Procedures

Generalized additive models are nonparametric models in which one or more regressor variables are present and can make different smooth contributions to the mean function. For example, if $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{ik}]$ is a vector of $k$ regressors for the $i$th observation, then an additive model represents the mean function as

$$\mathrm{E}[Y_i] = f_0 + f_1(x_{i1}) + f_2(x_{i2}) + \cdots + f_k(x_{ik})$$

The individual functions $f_j$ can have a parametric or nonparametric form. If all $f_j$ are parametric, the additive model is fully parametric. If some $f_j$ are parametric and others are nonparametric, the additive model is semiparametric. If all $f_j$ are nonparametric, the additive model is fully nonparametric.

The generalization of additive models is akin to the generalization of linear models: nonnormal data are accommodated by explicitly modeling the distribution of the data as a member of the exponential family and by applying a monotonic link function that provides a mapping between the predictor and the mean of the data.
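As a worked illustration of the preceding paragraph, the generalized additive model can be written with a link function applied to the mean. The symbols $g$, $\mu_i$, and $\eta_i$ are notation assumed here for illustration; they do not appear in the text above. The two special cases are standard choices of link, shown only as examples.

```latex
% Generalized additive model: a monotonic link g maps the mean to the additive predictor
g(\mu_i) = \eta_i = f_0 + f_1(x_{i1}) + f_2(x_{i2}) + \cdots + f_k(x_{ik}),
\qquad \mu_i = \mathrm{E}[Y_i]

% Example: binary response with the logit link
\log\!\left(\frac{\mu_i}{1-\mu_i}\right) = f_0 + \sum_{j=1}^{k} f_j(x_{ij})

% Example: count response (Poisson) with the log link
\log(\mu_i) = f_0 + \sum_{j=1}^{k} f_j(x_{ij})
```

With the identity link $g(\mu_i) = \mu_i$ and normally distributed data, this reduces to the additive model shown earlier.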

PROC GAM uses smoothing splines or local regression smoothers to represent each individual function $f_j$. Because of the large number of parameters that are involved, PROC GAM uses the backfitting algorithm (Hastie and Tibshirani 1990) to fit $f_j$ by using partial residuals $y - \sum_{i \ne j} \hat{f}_i$ while keeping the other fitted terms $\hat{f}_i$, $i \ne j$, temporarily fixed. It cycles over all smoothers until convergence.
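The backfitting cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the PROC GAM implementation: it assumes a normal response with identity link, and it swaps in a simple Gaussian kernel smoother in place of the smoothing splines or local regression smoothers that PROC GAM uses. The function names `kernel_smooth` and `backfit`, the bandwidth, and the simulated data are all hypothetical.

```python
import numpy as np

def kernel_smooth(x, r, bandwidth=0.3):
    """Gaussian kernel smoother (a stand-in, not a smoothing spline):
    returns a weighted average of r evaluated at each point of x."""
    d = (x[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)
    return (w @ r) / w.sum(axis=1)

def backfit(X, y, n_iter=50, tol=1e-8):
    """Fit y ~ f0 + sum_j f_j(X[:, j]) by cycling over the smoothers."""
    n, k = X.shape
    f0 = y.mean()
    f = np.zeros((n, k))  # column j holds the fitted f_j at the data points
    for _ in range(n_iter):
        f_old = f.copy()
        for j in range(k):
            # Partial residuals: remove the intercept and every fitted
            # term except f_j, which is the one being updated.
            partial = y - f0 - f.sum(axis=1) + f[:, j]
            f[:, j] = kernel_smooth(X[:, j], partial)
            f[:, j] -= f[:, j].mean()  # center each term for identifiability
        if np.max(np.abs(f - f_old)) < tol:
            break  # no term changed noticeably in this cycle
    return f0, f

# Simulated data (assumed for illustration): two smooth effects plus noise.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-2, 2, size=(n, 2))
y = 1.0 + np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, n)

f0, f = backfit(X, y)
fitted = f0 + f.sum(axis=1)
```

Each pass updates one term against the partial residuals while the others stay fixed, so only a one-dimensional smoothing problem is solved at a time.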

PROC GAMPL uses low-rank regression splines (Wood 2003) to represent each individual function $f_j$. It reduces the number of parameters from the number of unique data points to a number that defines the upper limit of the smoothing complexity. Thus the number of parameters for regression splines is much smaller than the number of parameters for smoothing splines. PROC GAMPL also assumes an individual smoothing penalty for each $f_j$ and then fits all additive terms simultaneously.
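The idea of a low-rank basis with one penalty per term, fitted in a single step, can be sketched as follows. This is an illustration under stated assumptions, not the PROC GAMPL method: it uses a truncated linear basis with a ridge penalty on the knot coefficients rather than the thin-plate regression splines of Wood (2003), it assumes a normal response with identity link, and it takes the smoothing parameters as given instead of estimating them. The names `truncated_linear_basis`, `fit_additive`, `n_knots`, and `lambdas` are hypothetical.

```python
import numpy as np

def truncated_linear_basis(x, n_knots=10):
    """Low-rank basis for one term: x and (x - knot)_+ at interior quantiles."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    return np.column_stack([x] + [np.clip(x - t, 0, None) for t in knots])

def fit_additive(X, y, n_knots=10, lambdas=None):
    """Penalized least squares over all terms at once."""
    n, k = X.shape
    lambdas = np.ones(k) if lambdas is None else np.asarray(lambdas, dtype=float)
    blocks = [truncated_linear_basis(X[:, j], n_knots) for j in range(k)]
    B = np.column_stack([np.ones(n)] + blocks)  # intercept plus one block per term
    # Diagonal penalty: none on the intercept or linear columns, and a
    # separate smoothing parameter lambda_j on term j's knot coefficients.
    penalty = [0.0]
    for lam in lambdas:
        penalty += [0.0] + [lam] * n_knots
    P = np.diag(penalty)
    # One linear solve fits every term simultaneously.
    beta = np.linalg.solve(B.T @ B + P, B.T @ y)
    return B, beta

# Simulated data (assumed for illustration).
rng = np.random.default_rng(1)
n = 500
X = rng.uniform(-2, 2, size=(n, 2))
y = 1.0 + np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, n)

B, beta = fit_additive(X, y, n_knots=10, lambdas=[1.0, 1.0])
fitted = B @ beta
```

With two terms and 10 knots each, the design matrix has 23 columns regardless of the number of observations, which is the sense in which the parameter count is bounded by the chosen smoothing complexity rather than by the number of unique data points.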

Because of their different smoothers and general model fitting approaches, PROC GAM and PROC GAMPL do not usually yield similar results. The methods that PROC GAM uses are limited to small data sets. In contrast, you can use PROC GAMPL to efficiently analyze larger data sets.

Last updated: December 09, 2022