Statistical Graphics Using ODS

Understanding Splines and Knots

This section is optional; it shows some of the mathematical details of polynomial-spline models. As shown previously, the following is a cubic-polynomial regression model:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \epsilon$$

If you add $n_k$ knots, it becomes a polynomial-spline regression model:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \sum_{i=1}^{n_k} \beta_{3+i} (x - k_i)_+^3 + \epsilon$$
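The model above can be sketched in a few lines of Python. This is only an illustration, not how any SAS procedure builds its design matrix. It assumes the usual convention that the "plus" subscript means the positive part, so the truncated term equals the cube of (x minus the knot) to the right of the knot and 0 elsewhere. The function names `truncated_cubic`, `spline_basis_row`, and `spline_value` are hypothetical.

```python
def truncated_cubic(x, knot):
    """Return (x - knot)_+^3: zero at or left of the knot, cubic to the right."""
    d = x - knot
    return d ** 3 if d > 0 else 0.0


def spline_basis_row(x, knots):
    """One design-matrix row: 1, x, x^2, x^3, then one truncated cubic per knot."""
    return [1.0, x, x ** 2, x ** 3] + [truncated_cubic(x, k) for k in knots]


def spline_value(x, betas, knots):
    """Evaluate the spline mean function (no error term) at x."""
    row = spline_basis_row(x, knots)
    if len(betas) != len(row):
        raise ValueError("need 4 + len(knots) coefficients")
    return sum(b * v for b, v in zip(betas, row))


# Example with illustrative knots at -2, 0, and 2 (assumed values).
print(spline_basis_row(1.0, [-2, 0, 2]))
```

At x = 1 the column for the knot at 2 is still 0, because that knot has not yet been passed. A model with three knots therefore has seven coefficients, four for the cubic polynomial and one for each knot.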

Polynomial splines are easy to understand and describe. A curve has an overall intercept, linear portion, quadratic portion, and cubic portion. Then the cubic portion changes at each knot. Output 24.6.5 illustrates a spline that has knots at –2, 0, and 2.

The blue function, $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$, extends from –5 to 5.

The blue and red function, $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - (-2))_+^3$, extends from –5 to almost 2 as it heads toward $y = \infty$. The red component first contributes to the overall function when $x - (-2)$ is positive.

The blue, red, and green function, $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - (-2))_+^3 + \beta_5 (x - 0)_+^3$, extends from –5 to just beyond 3 as it heads toward $y = -\infty$. The green component first contributes to the overall function when $x - 0$ is positive.

The blue, red, green, and orange function, $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - (-2))_+^3 + \beta_5 (x - 0)_+^3 + \beta_6 (x - 2)_+^3$, extends from –5 to almost 5 as it heads toward $y = \infty$. The orange component first contributes to the overall function when $x - 2$ is positive.

Thus, $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - (-2))_+^3 + \beta_5 (x - 0)_+^3 + \beta_6 (x - 2)_+^3$, which is highlighted in yellow, is the spline function, and it is composed of four component functions. The coefficients $\beta_4$, $\beta_5$, and $\beta_6$ are the changes in the cubic portion of the spline at the three knots. The intercept does not change; this makes the spline continuous. The linear and quadratic terms do not change; this makes the spline smooth.
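The smoothness claim can be checked with a short derivation. It assumes the standard convention that the "plus" subscript denotes the positive part, that is, the term is 0 when x is at or below the knot; the text does not spell this out. Differentiating a single truncated cubic term with knot $k_i$ gives:

```latex
\begin{aligned}
\frac{d}{dx}\,(x - k_i)_+^3 &= 3\,(x - k_i)_+^2 \\
\frac{d^2}{dx^2}\,(x - k_i)_+^3 &= 6\,(x - k_i)_+ \\
\frac{d^3}{dx^3}\,(x - k_i)_+^3 &=
  \begin{cases}
    0, & x < k_i \\
    6, & x > k_i
  \end{cases}
\end{aligned}
```

The term and its first two derivatives are all 0 at $x = k_i$ from both sides, so adding it leaves the spline and its first and second derivatives continuous there. Only the third derivative jumps, by $6\beta_{3+i}$, which is the sense in which just the cubic portion changes at a knot.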

Output 24.6.5: Polynomial-Spline Components

Mathematically, the cubic-polynomial spline is continuous, as are its first and second derivatives. Computationally, cubic-polynomial splines might be problematic, particularly for large data sets or when there are many knots. This is because some terms might be highly correlated, resulting in an unstable model. In practice, B-splines are preferred over cubic-polynomial splines, although the two types of splines are equivalent. If $\mathbf{X}_p$ is a full-rank polynomial-spline basis and $\mathbf{X}_B$ is the corresponding full-rank B-spline basis, then there exists a matrix $\mathbf{T}$ such that $\mathbf{X}_B = \mathbf{X}_p \mathbf{T}$ and $\mathbf{X}_B \mathbf{T}^{-1} = \mathbf{X}_p$. For an illustration, see the section B-Spline Basis. The overall fit and R-square are the same, but because the basis columns of the $\mathbf{X}$ matrices are different, the regression coefficients are different. Regression coefficients for B-spline models are usually not interpretable.
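The remark about highly correlated terms can be illustrated with a small Python sketch. It is not a diagnostic from any SAS procedure. It builds the truncated cubic columns for knots at –2, 0, and 2 on an assumed grid of 101 evenly spaced points from –5 to 5 and computes the pairwise Pearson correlations between them. The grid and the helper names `truncated_cubic`, `pearson`, and `knot_column_correlations` are assumptions made for this example.

```python
from math import sqrt


def truncated_cubic(x, knot):
    """Return (x - knot)_+^3: zero at or left of the knot, cubic to the right."""
    d = x - knot
    return d ** 3 if d > 0 else 0.0


def pearson(u, v):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)


def knot_column_correlations(xs, knots):
    """Correlation between every pair of truncated cubic basis columns."""
    columns = [[truncated_cubic(x, k) for x in xs] for k in knots]
    return {
        (knots[i], knots[j]): pearson(columns[i], columns[j])
        for i in range(len(knots))
        for j in range(i + 1, len(knots))
    }


# Assumed evaluation grid: 101 evenly spaced points on [-5, 5].
xs = [-5 + 0.1 * i for i in range(101)]
correlations = knot_column_correlations(xs, [-2, 0, 2])
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

On this grid the columns for the knots at –2 and 0 have a correlation above 0.9. Near-collinearity of this kind is what can make coefficient estimates in the polynomial-spline parameterization unstable, even though the fitted curve is the same as for a B-spline basis.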

Last updated: December 09, 2022