The TRANSREG Procedure

Penalized B-Splines

You can use penalized B-splines (Eilers and Marx 1996) to fit a smooth curve through a scatter plot with an automatic selection of the smoothing parameter. See Example 126.3 for an example. With penalized B-splines, you can find a transformation that minimizes any of the following criteria: CV, GCV, AIC, AICC, or SBC. These criteria are all functions of lamda. For many problems, all of these criteria produce nearly identical results. However, for some problems, the choice of criterion can have a large effect. When the default results are not satisfactory, try the other criteria. Information criteria such as AIC and AICC are defined in different ways in the statistical literature, and these differences can be seen in different SAS procedures. Typically, the definitions differ only by a positive (additive or multiplicative) constant, so they are equivalent, and each of the definitions of the same criterion produces the same selection of lamda. The definitions that PROC TRANSREG uses match the definitions that PROC REG uses. The penalized B-spline matrices, statistics, and criteria are defined as follows:

n number of observations
bold y dependent variable
bold upper W diagonal matrix of observation weights
w Subscript i weight for the ith observation
bold upper B B-spline basis for the independent variable
lamda nonnegative smoothing parameter
bold upper D difference matrix, penalizes lack of smoothness
bold upper H equals bold upper B left-parenthesis bold upper B prime bold upper W bold upper B plus lamda bold upper D prime bold upper D right-parenthesis Superscript negative 1 Baseline bold upper B prime bold upper W hat matrix
h Subscript i i ith diagonal element of bold upper H
ModifyingAbove bold y With caret equals bold upper H bold y penalized B-spline transformation of bold y
SSE equals sigma-summation Underscript i equals 1 Overscript n Endscripts w Subscript i Baseline left-parenthesis y Subscript i Baseline minus ModifyingAbove y With caret Subscript i Baseline right-parenthesis squared error sum of squares
t equals sigma-summation Underscript i equals 1 Overscript n Endscripts h Subscript i i trace of bold upper H
sigma-summation Underscript i equals 1 Overscript n Endscripts w Subscript i Baseline left-parenthesis StartFraction y Subscript i Baseline minus ModifyingAbove y With caret Subscript i Baseline Over 1 minus h Subscript i i Baseline EndFraction right-parenthesis squared CV - cross validation criterion
sigma-summation Underscript i equals 1 Overscript n Endscripts w Subscript i Baseline left-parenthesis StartFraction y Subscript i Baseline minus ModifyingAbove y With caret Subscript i Baseline Over n minus t EndFraction right-parenthesis squared GCV - generalized cross validation criterion
n log left-parenthesis SSE slash n right-parenthesis plus 2 t AIC - Akaike’s information criterion
1 plus log left-parenthesis SSE slash n right-parenthesis plus StartFraction 2 left-parenthesis t plus 1 right-parenthesis Over n minus t minus 2 EndFraction AICC - corrected AIC (default)
n log left-parenthesis SSE slash n right-parenthesis plus t log left-parenthesis n right-parenthesis SBC - Schwarz’s Bayesian criterion

For more information about constructing the B-spline basis, see SPLINE and MSPLINE Transformations and the section Using Splines and Knots. The nonzero elements of bold upper D, order 1 are (1 –1), order 2 are (1 –2 1), order 3 (the default) are (1 –3 3 –1), order 4 are (1 –4 6 –4 1), and so on. The nonzero elements for each order are made from the nonzero elements from the preceding order by subtraction: bold d prime Subscript i plus 1 Baseline equals left-parenthesis bold d prime Subscript i Baseline 0 right-parenthesis minus left-parenthesis 0 bold d prime Subscript i right-parenthesis. Within an order, the first nonzero element of row i is in column i—that is, each row of bold upper D is made from the preceding row by shifting the nonzero elements to the right one position. For example, with k = 4 knots, order o = 3, and degree d = 3, bold upper D is the left-parenthesis left-parenthesis d plus 1 plus k minus o right-parenthesis times left-parenthesis d plus 1 plus k right-parenthesis right-parenthesis matrix:

Start 5 By 8 Matrix 1st Row 1st Column 1 2nd Column negative 3 3rd Column 3 4th Column negative 1 5th Column 0 6th Column 0 7th Column 0 8th Column 0 2nd Row 1st Column 0 2nd Column 1 3rd Column negative 3 4th Column 3 5th Column negative 1 6th Column 0 7th Column 0 8th Column 0 3rd Row 1st Column 0 2nd Column 0 3rd Column 1 4th Column negative 3 5th Column 3 6th Column negative 1 7th Column 0 8th Column 0 4th Row 1st Column 0 2nd Column 0 3rd Column 0 4th Column 1 5th Column negative 3 6th Column 3 7th Column negative 1 8th Column 0 5th Row 1st Column 0 2nd Column 0 3rd Column 0 4th Column 0 5th Column 1 6th Column negative 3 7th Column 3 8th Column negative 1 EndMatrix

where
1 –1 0
0 1 –1
1 –2 1 0
0 1 –2 1
1 –3 3 –1

The trace of the hat matrix, t equals sigma-summation Underscript i equals 1 Overscript n Endscripts h Subscript i i, provides an estimate of the number of parameters needed to find the transformation and is used in df calculations. Note, however, that in some cases, particularly with error-free or nearly error-free data, this value can be much larger than you might expect. You might be able to directly create a function by using SPLINE or BSPLINE with many fewer parameters that fits essentially just as well as the penalized B-spline function.

By default with PBSPLINE, a cubic spline is fit with 100 evenly spaced knots, three evenly spaced exterior knots, and a difference matrix of order three. Options are specified as follows: PBSPLINE(x / DEGREE=3 NKNOTS=100 EVENLY=3 PARAMETER=3). By default, PROC TRANSREG searches for an optimal lambda in the range 0 to 1E6 by using parabolic interpolation and Brent’s (Brent 1973; Press et al. 1989) method. Alternatively, you can specify a lambda range or a list of lambdas by using the LAMBDA= option. Be aware, however, LAMBDA=0 and values near zero might cause numerical problems including floating point errors. Also be aware that larger lambdas might cause numerical problems—for example, the error sum of squares for the model, normal upper Sigma left-parenthesis y minus ModifyingAbove y With caret right-parenthesis squared, might be greater than the total sum of squares, normal upper Sigma left-parenthesis y minus y overbar right-parenthesis squared—implying that the model with the transformation fits less well than simply predicting by using the mean. When this happens, you will see this message: ERROR: Degenerate transformation with PBSPLINE.

You can fit a single curve through a scatter plot (y times x) as follows:

model identity(y) = pbspline(x);

Alternatively, you can fit multiple curves through a scatter plot, one for each level of Group, as follows:

model identity(y) = class(group / zero=none) * pbspline(x);

There are several options for how the smoothing parameter, lamda, is chosen. Usually, you do not specify the smoothing parameter, lamda, and you let PROC TRANSREG choose lamda for you by minimizing one of the information or cross validation criteria. By default, PROC TRANSREG first considers ranges defined by lamda equals 0 and lamda equals 1 comma 10 comma 100 comma 1,000 comma 10,000 comma 100,000 comma 1,000,000. If it finds a range that includes the minimum, it stops and does not consider larger lamda values. Then it performs further searches in that range. For example, if the initial evaluations at lamda equals 1 and lamda equals 10 show that there is at least a local minimum in the range 0 to 10, then larger values are not considered. Note that the zero smoothing case, lamda equals 0, provides a boundary on the range even though the criterion is not evaluated at lamda equals 0. The criterion is not evaluated at lamda equals 0 unless LAMBDA=0 is the only value specified. Also note that the default approach is not the same as specifying the options LAMBDA=0 1E6 RANGE. When a range of values is specified, along with the RANGE t-option, PROC TRANSREG does not try to find smaller ranges based on powers of 10.

PROC TRANSREG avoids evaluating the criterion for LAMBDA= values at or near zero unless you force it to consider them. This is because zero smoothing is rarely interesting and the results are numerically unstable. Values of lamda at or near zero often result in predicted values that are far outside the range of the data, particularly with interpolation and x values that do not appear in the data set. Also, zero smoothing is prone to numerical problems including floating point errors. This is particularly true when there is a small number of observations, a large number of knots, a high degree, or a perfect or near perfect fit. If you force PROC TRANSREG to evaluate the criterion at or near lamda equals 0, you can easily get bad results.

Note that when some observations appear more than once, such as when you have the kind of data where you can use a FREQ statement, then you should consider directly specifying lambda based on a preliminary analysis, ignoring the frequencies. Alternatively, specify a range of lamda values, such as LAMBDA=0.1 1E6 RANGE, that steers lamda away from values near zero. With the default lambda list, a cross validation criterion does not perform well in choosing a smoothing parameter with replicated data. Leaving one observation out of the computations changes the frequency for that observation from one positive integer to the next smaller positive integer, so in some sense, the point corresponding to that observation is never really left out of any computations. The resulting fit will be undersmoothed unless you specify a larger lamda.

Last updated: December 09, 2022