The TRANSREG Procedure

SPLINE and MSPLINE Transformations

The missing portions of variables subjected to SPLINE or MSPLINE transformations are handled the same way as for OPSCORE, MONOTONE, UNTIE, and LINEAR transformations (see the previous section). The nonmissing partition is handled by first creating a B-spline basis of the specified degree with the specified knots for the nonmissing partition of the initial scaling vector and then regressing the target onto the basis. The optimally scaled vector is a linear combination of the B-spline basis vectors. Ordinary least squares regression coefficients are used. An algorithm for generating the B-spline basis is given in De Boor (1978, pp. 134–135). B-splines are a computationally accurate and efficient way of constructing a basis for piecewise polynomials; however, they are not the most natural way of describing splines.
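The regression step can be written compactly in matrix form. The notation here is assumed for illustration and does not come from the procedure's documentation: $\mathbf{B}$ is the B-spline basis for the nonmissing partition, $\mathbf{y}$ is the target, $\hat{\mathbf{b}}$ is the vector of ordinary least squares coefficients, and $\mathbf{z}$ is the optimally scaled vector. Assuming $\mathbf{B}$ has full column rank:

$$
\hat{\mathbf{b}} = (\mathbf{B}'\mathbf{B})^{-1}\mathbf{B}'\mathbf{y}, \qquad \mathbf{z} = \mathbf{B}\hat{\mathbf{b}}
$$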

Consider an initial scaling vector $\mathbf{x} = (1\ 2\ 3\ 4\ 5\ 6\ 7\ 8\ 9)'$ and a degree-three spline with interior knots at 3.5 and 6.5. The B-spline basis for the transformation is the left matrix, and the natural piecewise polynomial spline basis is the right matrix.

$$
\begin{array}{cc}
\text{B-Spline Basis} & \text{Piecewise Polynomial Splines} \\[4pt]
\begin{bmatrix}
1.000 & 0.000 & 0.000 & 0.000 & 0 & 0 \\
0.216 & 0.608 & 0.167 & 0.009 & 0 & 0 \\
0.008 & 0.458 & 0.461 & 0.073 & 0 & 0 \\
0 & 0.172 & 0.585 & 0.241 & 0.001 & 0 \\
0 & 0.037 & 0.463 & 0.463 & 0.037 & 0 \\
0 & 0.001 & 0.241 & 0.585 & 0.172 & 0 \\
0 & 0 & 0.073 & 0.461 & 0.458 & 0.008 \\
0 & 0 & 0.009 & 0.167 & 0.608 & 0.216 \\
0 & 0 & 0.000 & 0.000 & 0.000 & 1.000
\end{bmatrix}
&
\begin{bmatrix}
1 & 1 & 1 & 1 & 0 & 0 \\
1 & 2 & 4 & 8 & 0 & 0 \\
1 & 3 & 9 & 27 & 0 & 0 \\
1 & 4 & 16 & 64 & 0.125 & 0 \\
1 & 5 & 25 & 125 & 3.375 & 0 \\
1 & 6 & 36 & 216 & 15.625 & 0 \\
1 & 7 & 49 & 343 & 42.875 & 0.125 \\
1 & 8 & 64 & 512 & 91.125 & 3.375 \\
1 & 9 & 81 & 729 & 166.375 & 15.625
\end{bmatrix}
\end{array}
$$
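The B-spline basis for this example can be approximated with the standard Cox-de Boor recursion. The following Python sketch is an illustration only, not PROC TRANSREG's implementation and not necessarily the algorithm on the cited pages of De Boor (1978). It assumes boundary knots at the data minimum and maximum (1 and 9), each repeated degree + 1 times; the function name `bspline_basis_row` and its parameters are hypothetical.

```python
def bspline_basis_row(x, interior_knots, degree, lo, hi):
    """Evaluate every B-spline basis function at x (Cox-de Boor recursion)."""
    # Assumed knot vector: boundary knots lo and hi repeated degree + 1 times.
    t = [lo] * (degree + 1) + list(interior_knots) + [hi] * (degree + 1)

    # Degree-0 functions are indicators of the half-open spans [t[i], t[i+1]).
    # The last nonempty span is closed on the right so that x == hi is covered.
    b = []
    for i in range(len(t) - 1):
        in_span = t[i] <= x < t[i + 1]
        at_right_end = x == hi and t[i] < t[i + 1] == hi
        b.append(1.0 if in_span or at_right_end else 0.0)

    # Raise the degree one step at a time; terms with a zero-width
    # denominator are skipped (treated as zero).
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(t) - d - 1):
            left = right = 0.0
            if t[i + d] > t[i]:
                left = (x - t[i]) / (t[i + d] - t[i]) * b[i]
            if t[i + d + 1] > t[i + 1]:
                right = (t[i + d + 1] - x) / (t[i + d + 1] - t[i + 1]) * b[i + 1]
            nxt.append(left + right)
        b = nxt
    return b


# Rebuild the example: x = 1..9, degree 3, interior knots 3.5 and 6.5.
basis = [bspline_basis_row(x, [3.5, 6.5], 3, 1, 9) for x in range(1, 10)]
for row in basis:
    print(" ".join(f"{v:6.3f}" for v in row))
```

Under these assumptions each row has six entries, one per basis function, and the printed values can be compared with the left matrix above.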

The two matrices span the same column space. The natural basis has an intercept, a linear term, a quadratic term, a cubic term, and two more terms since there are two interior knots. These terms are generated (for knot $k$ and element $x$ of $\mathbf{x}$) by the formula $(x - k)^3 \times I_{(x > k)}$. The indicator variable $I_{(x > k)}$ evaluates to 1.0 if $x$ is greater than $k$ and to 0.0 otherwise. If knot $k$ had been repeated, there would be a $(x - k)^2 \times I_{(x > k)}$ term also. Notice that the fifth column makes no contribution to the curve before 3.5, makes zero contribution at 3.5 (the transformation is continuous), and makes an increasing contribution beyond 3.5. The same pattern of results holds for the last term with knot 6.5. The coefficient of the fifth column represents the change in the cubic portion of the curve after 3.5. The coefficient of the sixth column represents the change in the cubic portion of the curve after 6.5.
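The natural piecewise polynomial basis follows directly from the formula just described. The following Python sketch builds one row per element of the scaling vector; the function name `truncated_power_row` is hypothetical, and the sketch assumes no repeated knots.

```python
def truncated_power_row(x, interior_knots, degree):
    """Return one row of the natural piecewise polynomial basis at x."""
    # Intercept, linear, quadratic, ... terms up to the spline degree.
    row = [x ** p for p in range(degree + 1)]
    # One extra term per interior knot k: (x - k)^degree when x > k, else 0.
    row += [(x - k) ** degree if x > k else 0.0 for k in interior_knots]
    return row


# Rebuild the example: x = 1..9, degree 3, interior knots 3.5 and 6.5.
natural_basis = [truncated_power_row(x, [3.5, 6.5], 3) for x in range(1, 10)]
for row in natural_basis:
    print(row)
```

For example, the row for x = 4 contains 0.125 in the fifth position because (4 - 3.5) cubed is 0.125, and 0 in the sixth position because 4 does not exceed the knot at 6.5.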

The numbers in the B-spline basis do not have a simple interpretation like the numbers in the natural piecewise polynomial basis. The B-spline basis has a diagonally banded structure. The band shifts one column to the right after every knot. The number of entries in each row that can potentially be nonzero is one greater than the degree. The elements within a row always sum to one. The B-spline basis is numerically accurate because its entries are small and because it avoids the extreme collinearity inherent in the natural polynomial basis. B-splines are efficient because PROC TRANSREG can take advantage of the sparseness of the B-spline basis when it accumulates crossproducts. The number of multiplications and additions required to accumulate the crossproduct matrix does not increase with the number of knots but does increase with the degree of the spline, so it is much more computationally efficient to increase the number of knots than to increase the degree of the polynomial.
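The band structure and its effect on crossproduct accumulation can be illustrated with the rounded B-spline basis from the example. The following Python sketch is not PROC TRANSREG's internal algorithm; it only shows, under that assumption, that skipping zero entries bounds the work per row by the square of degree + 1, regardless of how many columns the knots add. The function name `sparse_crossproduct` is hypothetical.

```python
# Rounded B-spline basis from the example (degree 3, knots at 3.5 and 6.5).
B = [
    [1.000, 0.000, 0.000, 0.000, 0.000, 0.000],
    [0.216, 0.608, 0.167, 0.009, 0.000, 0.000],
    [0.008, 0.458, 0.461, 0.073, 0.000, 0.000],
    [0.000, 0.172, 0.585, 0.241, 0.001, 0.000],
    [0.000, 0.037, 0.463, 0.463, 0.037, 0.000],
    [0.000, 0.001, 0.241, 0.585, 0.172, 0.000],
    [0.000, 0.000, 0.073, 0.461, 0.458, 0.008],
    [0.000, 0.000, 0.009, 0.167, 0.608, 0.216],
    [0.000, 0.000, 0.000, 0.000, 0.000, 1.000],
]
DEGREE = 3


def sparse_crossproduct(rows, n_cols):
    """Accumulate B'B using only the nonzero entries of each row."""
    xtx = [[0.0] * n_cols for _ in range(n_cols)]
    multiplications = 0
    for row in rows:
        nonzero = [(j, v) for j, v in enumerate(row) if v != 0.0]
        for j, vj in nonzero:
            for k, vk in nonzero:
                xtx[j][k] += vj * vk
                multiplications += 1
    return xtx, multiplications


# Each row has at most degree + 1 nonzero entries and sums to about one
# (only approximately here, because the published values are rounded).
max_nonzero = max(sum(1 for v in row if v != 0.0) for row in B)
row_sums = [sum(row) for row in B]

xtx, sparse_mults = sparse_crossproduct(B, 6)
dense_mults = len(B) * 6 * 6  # a dense pass multiplies every pair of columns
print(max_nonzero, sparse_mults, dense_mults)
```

Adding knots adds columns to the basis, which raises the dense count, while the sparse count per row stays bounded by (degree + 1) squared.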

MSPLINE transformations are handled like SPLINE transformations except that constraints are placed on the coefficients to ensure monotonicity. When the coefficients of the B-spline basis are monotonically increasing, the transformation is monotonically increasing. When the polynomial degree is two or less, monotone coefficient splines, integrated splines (Winsberg and Ramsay 1980), and the general class of all monotone splines are equivalent.
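The link between monotone coefficients and a monotone transformation can be checked numerically on the example basis. The following Python sketch uses the rounded B-spline basis from the example and a hypothetical set of increasing coefficients chosen only for illustration; it does not show how PROC TRANSREG estimates coefficients under the monotonicity constraints.

```python
# Rounded B-spline basis from the example (degree 3, knots at 3.5 and 6.5).
B = [
    [1.000, 0.000, 0.000, 0.000, 0.000, 0.000],
    [0.216, 0.608, 0.167, 0.009, 0.000, 0.000],
    [0.008, 0.458, 0.461, 0.073, 0.000, 0.000],
    [0.000, 0.172, 0.585, 0.241, 0.001, 0.000],
    [0.000, 0.037, 0.463, 0.463, 0.037, 0.000],
    [0.000, 0.001, 0.241, 0.585, 0.172, 0.000],
    [0.000, 0.000, 0.073, 0.461, 0.458, 0.008],
    [0.000, 0.000, 0.009, 0.167, 0.608, 0.216],
    [0.000, 0.000, 0.000, 0.000, 0.000, 1.000],
]


def transform(basis, coefs):
    """Return the linear combination of basis columns, one value per row."""
    return [sum(b * c for b, c in zip(row, coefs)) for row in basis]


def is_nondecreasing(values):
    """True when no value is smaller than the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical coefficients, increasing from first to last.
coefs = [0.0, 1.0, 2.0, 5.0, 6.0, 10.0]
z = transform(B, coefs)
print(is_nondecreasing(coefs), is_nondecreasing(z))
```

Here the transformed values at x = 1, ..., 9 inherit the ordering of the coefficients, which is the property the constraints are designed to guarantee.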

Last updated: December 09, 2022