The GLIMMIX Procedure

From Penalized Splines to Mixed Models

The connection between splines and mixed models arises from the similarity of the penalized spline fitting criterion to the minimization problem that yields the mixed model equations and solutions for $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$. This connection is made explicit in the following paragraphs. An important distinction between classical spline fitting and its mixed model smoothing variant, however, lies in the nature of the spline coefficients. Although both approaches address similar minimization criteria, the solutions for the spline coefficients in the GLIMMIX procedure are solutions for random effects, not fixed effects. Standard errors of predicted values, for example, account for this source of variation.

Consider the linearized mixed pseudo-model from the section The Pseudo-model, $\mathbf{P} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\epsilon}$. One derivation of the mixed model equations, whose solutions are $\widehat{\boldsymbol{\beta}}$ and $\widehat{\boldsymbol{\gamma}}$, is to maximize the joint density $f(\boldsymbol{\gamma},\boldsymbol{\epsilon})$ with respect to $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$. This is not a true likelihood problem, because $\boldsymbol{\gamma}$ is not a parameter but a random vector.

In the special case with $\mathrm{Var}[\boldsymbol{\epsilon}] = \phi\mathbf{I}$ and $\mathrm{Var}[\boldsymbol{\gamma}] = \sigma^2\mathbf{I}$, the maximization of $f(\boldsymbol{\gamma},\boldsymbol{\epsilon})$ is equivalent to the minimization of

\[
Q(\boldsymbol{\beta},\boldsymbol{\gamma}) = \phi^{-1}(\mathbf{p} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\boldsymbol{\gamma})'(\mathbf{p} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\boldsymbol{\gamma}) + \sigma^{-2}\boldsymbol{\gamma}'\boldsymbol{\gamma}
\]
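To see why, note that under the stated normality and independence assumptions the joint density factors into two Gaussian kernels, so its negative log is the criterion above up to an additive constant:

```latex
f(\boldsymbol{\gamma},\boldsymbol{\epsilon})
  \propto \exp\left\{ -\tfrac{1}{2\phi}\,\boldsymbol{\epsilon}'\boldsymbol{\epsilon}
                      - \tfrac{1}{2\sigma^{2}}\,\boldsymbol{\gamma}'\boldsymbol{\gamma} \right\}
\qquad\Longrightarrow\qquad
-2\log f(\boldsymbol{\gamma},\boldsymbol{\epsilon})
  = Q(\boldsymbol{\beta},\boldsymbol{\gamma}) + \text{const.}
```

after substituting $\boldsymbol{\epsilon} = \mathbf{p} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\boldsymbol{\gamma}$; maximizing the joint density is therefore the same as minimizing $Q(\boldsymbol{\beta},\boldsymbol{\gamma})$.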

Now consider a linear spline as in Ruppert, Wand, and Carroll (2003, p. 108),

\[
p_i = \beta_0 + \beta_1 x_i + \sum_{j=1}^{K}\gamma_j\,(x_i - t_j)_+
\]

where the $\gamma_j$ denote the spline coefficients at knots $t_1,\ldots,t_K$. The truncated line function is defined as

\[
(x - t)_+ = \begin{cases} x - t & x > t \\ 0 & \text{otherwise} \end{cases}
\]
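As a concrete illustration (plain Python, not PROC GLIMMIX code), the truncated line functions can be evaluated in a vectorized form; the data points and knot locations here are arbitrary examples:

```python
import numpy as np

def truncated_line_basis(x, knots):
    """Evaluate (x_i - t_j)_+ for every point and knot; returns an (n x K) matrix Z."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # Broadcasting builds all pairwise differences, then clips at zero.
    return np.maximum(x[:, None] - knots[None, :], 0.0)

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
knots = np.array([0.5, 1.5])          # example knots t_1, t_2
Z = truncated_line_basis(x, knots)
# Column j is zero to the left of knot t_j and rises with slope 1 past it.
```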

If you collect the intercept and regressor $x$ into the matrix $\mathbf{X}$, and if you collect the truncated line functions into the $(n \times K)$ matrix $\mathbf{Z}$, then fitting the linear spline amounts to minimization of the penalized spline criterion

\[
Q^*(\boldsymbol{\beta},\boldsymbol{\gamma}) = (\mathbf{p} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\boldsymbol{\gamma})'(\mathbf{p} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\boldsymbol{\gamma}) + \lambda^2\boldsymbol{\gamma}'\boldsymbol{\gamma}
\]

where $\lambda$ is the smoothing parameter.
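The minimizers of this criterion solve a ridge-type linear system in which only the $\boldsymbol{\gamma}$ block is penalized. A minimal numerical sketch (the data, knots, and value of $\lambda$ are arbitrary choices for illustration):

```python
import numpy as np

def fit_penalized_spline(x, p, knots, lam):
    """Minimize ||p - X beta - Z gamma||^2 + lam^2 * gamma'gamma.

    X holds the intercept and linear term; Z holds the truncated lines.
    Only the gamma block of the coefficient vector is penalized.
    """
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])             # intercept, slope
    Z = np.maximum(x[:, None] - np.asarray(knots)[None, :], 0.0)
    C = np.hstack([X, Z])                                 # full design [X Z]
    D = np.diag([0.0] * X.shape[1] + [1.0] * Z.shape[1])  # penalize gamma only
    coef = np.linalg.solve(C.T @ C + lam**2 * D, C.T @ p)
    beta, gamma = coef[:X.shape[1]], coef[X.shape[1]:]
    return beta, gamma, C @ coef                          # fitted values

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 40)
p = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
beta, gamma, fitted = fit_penalized_spline(x, p, knots=np.linspace(0.1, 0.9, 9), lam=0.5)
```

Larger values of `lam` shrink the knot coefficients toward zero, pulling the fit toward the ordinary linear regression.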

Because minimizing $Q(\boldsymbol{\beta},\boldsymbol{\gamma})$ with respect to $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ is equivalent to minimizing $\phi\,Q(\boldsymbol{\beta},\boldsymbol{\gamma})$, both problems lead to the same solution, with $\lambda^2 = \phi/\sigma^2$ as the smoothing parameter. The mixed model formulation of spline smoothing has the advantage that the smoothing parameter is selected "automatically." It is a function of the covariance parameter estimates, which, in turn, are estimated according to the method you specify with the METHOD= option in the PROC GLIMMIX statement.
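This equivalence is easy to verify numerically: solving the mixed-model normal equations with variance components $\phi$ and $\sigma^2$ reproduces the penalized spline solution with $\lambda^2 = \phi/\sigma^2$. The data and variance values below are arbitrary examples, not output of any GLIMMIX fit:

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 1, 30)
p = np.cos(3 * x) + rng.normal(scale=0.1, size=x.size)
knots = np.linspace(0.1, 0.9, 5)

X = np.column_stack([np.ones_like(x), x])
Z = np.maximum(x[:, None] - knots[None, :], 0.0)
C = np.hstack([X, Z])
D = np.diag([0.0, 0.0] + [1.0] * knots.size)    # penalty acts on gamma only

phi, sigma2 = 0.04, 0.25                        # example variance components
lam2 = phi / sigma2                             # implied smoothing parameter

# Minimizer of Q: (C'C/phi + D/sigma^2) theta = C'p/phi
theta_mixed = np.linalg.solve(C.T @ C / phi + D / sigma2, C.T @ p / phi)
# Minimizer of Q*: (C'C + lam2 * D) theta = C'p
theta_spline = np.linalg.solve(C.T @ C + lam2 * D, C.T @ p)
# Multiplying the first system through by phi shows the two are identical.
```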

To accommodate nonnormal responses and general link functions, the GLIMMIX procedure uses $\mathrm{Var}[\boldsymbol{\epsilon}] = \phi\widetilde{\boldsymbol{\Delta}}^{-1}\mathbf{A}\widetilde{\boldsymbol{\Delta}}^{-1}$, where $\mathbf{A}$ is the matrix of variance functions and $\widetilde{\boldsymbol{\Delta}}$ is the diagonal matrix of mean derivatives defined earlier. The correspondence between spline smoothing and mixed modeling is then one between a weighted linear mixed model and a weighted spline. In other words, the minimization criterion that yields the estimates $\widehat{\boldsymbol{\beta}}$ and solutions $\widehat{\boldsymbol{\gamma}}$ is then

\[
Q(\boldsymbol{\beta},\boldsymbol{\gamma}) = \phi^{-1}(\mathbf{p} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\boldsymbol{\gamma})'\,\widetilde{\boldsymbol{\Delta}}\mathbf{A}^{-1}\widetilde{\boldsymbol{\Delta}}\,(\mathbf{p} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\boldsymbol{\gamma}) + \sigma^{-2}\boldsymbol{\gamma}'\boldsymbol{\gamma}
\]
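The weighted criterion also has a closed-form minimizer. In the sketch below, a diagonal matrix `W` stands in for $\widetilde{\boldsymbol{\Delta}}\mathbf{A}^{-1}\widetilde{\boldsymbol{\Delta}}$; the weights and variance values are illustrative stand-ins, not quantities computed by PROC GLIMMIX:

```python
import numpy as np

# Weighted penalized least squares: minimize
#   (p - C theta)' W (p - C theta) / phi + gamma'gamma / sigma^2,
# where W plays the role of Delta~ A^{-1} Delta~ (here: arbitrary
# positive weights) and the penalty acts only on the gamma block.
rng = np.random.default_rng(3)
x = np.linspace(0, 1, 25)
p = np.exp(-x) + rng.normal(scale=0.05, size=x.size)
knots = np.linspace(0.2, 0.8, 4)

X = np.column_stack([np.ones_like(x), x])
Z = np.maximum(x[:, None] - knots[None, :], 0.0)
C = np.hstack([X, Z])
D = np.diag([0.0, 0.0] + [1.0] * knots.size)
W = np.diag(rng.uniform(0.5, 2.0, size=x.size))  # stand-in weight matrix

phi, sigma2 = 0.01, 0.1                          # example variance components
theta = np.linalg.solve(C.T @ W @ C / phi + D / sigma2, C.T @ W @ p / phi)
fitted = C @ theta
```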

If you choose the TYPE=RSMOOTH covariance structure, PROC GLIMMIX chooses radial basis functions as the spline basis and transforms them to approximate a thin-plate spline as in Chapter 13.4 of Ruppert, Wand, and Carroll (2003). For computational expediency, the number of knots is chosen to be less than the number of data points. Ruppert, Wand, and Carroll (2003) recommend one knot for every four unique regressor values for one-dimensional smoothers. In the multivariate case, general recommendations are more difficult, because the optimal number and placement of knots depend on the spatial configuration of samples. Their recommendation for a bivariate smoother is one knot per four samples, but at least 20 and no more than 150 knots (Ruppert, Wand, and Carroll 2003, p. 257).

The magnitude of the variance component sigma squared depends on the metric of the random effects. For example, if you apply radial smoothing in time, the variance changes if you measure time in days or minutes. If the solution for the variance component is near zero, then a rescaling of the random effect data can help the optimization problem by moving the solution for the variance component away from the boundary of the parameter space.

Last updated: December 09, 2022