The GLIMMIX Procedure

From Penalized Splines to Mixed Models

The connection between splines and mixed models arises from the similarity of the penalized spline fitting criterion to the minimization problem that yields the mixed model equations and solutions for $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$. This connection is made explicit in the following paragraphs. An important distinction between classical spline fitting and its mixed model smoothing variant, however, lies in the nature of the spline coefficients. Although both approaches address similar minimization criteria, the solutions for the spline coefficients in the GLIMMIX procedure are solutions for random effects, not fixed effects. Standard errors of predicted values, for example, account for this source of variation.

Consider the linearized mixed pseudo-model from the section The Pseudo-model, $\mathbf{P} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\epsilon}$. One derivation of the mixed model equations, whose solutions are $\widehat{\boldsymbol{\beta}}$ and $\widehat{\boldsymbol{\gamma}}$, is to maximize the joint density $f(\boldsymbol{\gamma},\boldsymbol{\epsilon})$ with respect to $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$. This is not a true likelihood problem, because $\boldsymbol{\gamma}$ is not a parameter but a random vector.

In the special case with $\mathrm{Var}[\boldsymbol{\epsilon}] = \phi\mathbf{I}$ and $\mathrm{Var}[\boldsymbol{\gamma}] = \sigma^2\mathbf{I}$, the maximization of $f(\boldsymbol{\gamma},\boldsymbol{\epsilon})$ is equivalent to the minimization of

\[
Q(\boldsymbol{\beta},\boldsymbol{\gamma}) = \phi^{-1}(\mathbf{p} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\boldsymbol{\gamma})'(\mathbf{p} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\boldsymbol{\gamma}) + \sigma^{-2}\boldsymbol{\gamma}'\boldsymbol{\gamma}
\]
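To see why, note that under the stated normality and independence assumptions the joint density factors into two Gaussian kernels, so its negative log is the criterion above up to an additive constant:

```latex
f(\boldsymbol{\gamma},\boldsymbol{\epsilon})
  \propto \exp\left\{ -\tfrac{1}{2\phi}\,\boldsymbol{\epsilon}'\boldsymbol{\epsilon}
                      - \tfrac{1}{2\sigma^{2}}\,\boldsymbol{\gamma}'\boldsymbol{\gamma} \right\}
\qquad\Longrightarrow\qquad
-2\log f(\boldsymbol{\gamma},\boldsymbol{\epsilon})
  = Q(\boldsymbol{\beta},\boldsymbol{\gamma}) + \text{const.}
```

after substituting $\boldsymbol{\epsilon} = \mathbf{p} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\boldsymbol{\gamma}$; maximizing the joint density is therefore the same as minimizing $Q(\boldsymbol{\beta},\boldsymbol{\gamma})$.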

Now consider a linear spline as in Ruppert, Wand, and Carroll (2003, p. 108),

\[
p_i = \beta_0 + \beta_1 x_i + \sum_{j=1}^{K}\gamma_j\,(x_i - t_j)_+
\]

where the $\gamma_j$ denote the spline coefficients at knots $t_1,\ldots,t_K$. The truncated line function is defined as

\[
(x - t)_+ = \begin{cases} x - t & x > t \\ 0 & \text{otherwise} \end{cases}
\]
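As a concrete illustration (plain Python, not PROC GLIMMIX code), the truncated line functions can be evaluated in a vectorized form; the data points and knot locations here are arbitrary examples:

```python
import numpy as np

def truncated_line_basis(x, knots):
    """Evaluate (x_i - t_j)_+ for every point and knot; returns an (n x K) matrix Z."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # Broadcasting builds all pairwise differences, then clips at zero.
    return np.maximum(x[:, None] - knots[None, :], 0.0)

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
knots = np.array([0.5, 1.5])          # example knots t_1, t_2
Z = truncated_line_basis(x, knots)
# Column j is zero to the left of knot t_j and rises with slope 1 past it.
```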

If you collect the intercept and regressor $x$ into the matrix $\mathbf{X}$, and if you collect the truncated line functions into the $(n \times K)$ matrix $\mathbf{Z}$, then fitting the linear spline amounts to minimization of the penalized spline criterion

\[
Q^*(\boldsymbol{\beta},\boldsymbol{\gamma}) = (\mathbf{p} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\boldsymbol{\gamma})'(\mathbf{p} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\boldsymbol{\gamma}) + \lambda^2\boldsymbol{\gamma}'\boldsymbol{\gamma}
\]

where $\lambda$ is the smoothing parameter.
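The minimizers of this criterion solve a ridge-type linear system in which only the $\boldsymbol{\gamma}$ block is penalized. A minimal numerical sketch (the data, knots, and value of $\lambda$ are arbitrary choices for illustration):

```python
import numpy as np

def fit_penalized_spline(x, p, knots, lam):
    """Minimize ||p - X beta - Z gamma||^2 + lam^2 * gamma'gamma.

    X holds the intercept and linear term; Z holds the truncated lines.
    Only the gamma block of the coefficient vector is penalized.
    """
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])             # intercept, slope
    Z = np.maximum(x[:, None] - np.asarray(knots)[None, :], 0.0)
    C = np.hstack([X, Z])                                 # full design [X Z]
    D = np.diag([0.0] * X.shape[1] + [1.0] * Z.shape[1])  # penalize gamma only
    coef = np.linalg.solve(C.T @ C + lam**2 * D, C.T @ p)
    beta, gamma = coef[:X.shape[1]], coef[X.shape[1]:]
    return beta, gamma, C @ coef                          # fitted values

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 40)
p = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
beta, gamma, fitted = fit_penalized_spline(x, p, knots=np.linspace(0.1, 0.9, 9), lam=0.5)
```

Larger values of `lam` shrink the knot coefficients toward zero, pulling the fit toward the ordinary linear regression.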

Because minimizing $Q(\boldsymbol{\beta},\boldsymbol{\gamma})$ with respect to $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ is equivalent to minimizing $\phi\,Q(\boldsymbol{\beta},\boldsymbol{\gamma})$, both problems lead to the same solution, with $\lambda^2 = \phi/\sigma^2$ as the smoothing parameter. The mixed model formulation of spline smoothing has the advantage that the smoothing parameter is selected "automatically." It is a function of the covariance parameter estimates, which, in turn, are estimated according to the method you specify with the METHOD= option in the PROC GLIMMIX statement.
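This equivalence is easy to verify numerically: solving the mixed-model normal equations with variance components $\phi$ and $\sigma^2$ reproduces the penalized spline solution with $\lambda^2 = \phi/\sigma^2$. The data and variance values below are arbitrary examples, not output of any GLIMMIX fit:

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 1, 30)
p = np.cos(3 * x) + rng.normal(scale=0.1, size=x.size)
knots = np.linspace(0.1, 0.9, 5)

X = np.column_stack([np.ones_like(x), x])
Z = np.maximum(x[:, None] - knots[None, :], 0.0)
C = np.hstack([X, Z])
D = np.diag([0.0, 0.0] + [1.0] * knots.size)    # penalty acts on gamma only

phi, sigma2 = 0.04, 0.25                        # example variance components
lam2 = phi / sigma2                             # implied smoothing parameter

# Minimizer of Q: (C'C/phi + D/sigma^2) theta = C'p/phi
theta_mixed = np.linalg.solve(C.T @ C / phi + D / sigma2, C.T @ p / phi)
# Minimizer of Q*: (C'C + lam2 * D) theta = C'p
theta_spline = np.linalg.solve(C.T @ C + lam2 * D, C.T @ p)
# Multiplying the first system through by phi shows the two are identical.
```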

To accommodate nonnormal responses and general link functions, the GLIMMIX procedure uses $\mathrm{Var}[\boldsymbol{\epsilon}] = \phi\widetilde{\boldsymbol{\Delta}}^{-1}\mathbf{A}\widetilde{\boldsymbol{\Delta}}^{-1}$, where $\mathbf{A}$ is the matrix of variance functions and $\widetilde{\boldsymbol{\Delta}}$ is the diagonal matrix of mean derivatives defined earlier. The correspondence between spline smoothing and mixed modeling is then one between a weighted linear mixed model and a weighted spline. In other words, the minimization criterion that yields the estimates $\widehat{\boldsymbol{\beta}}$ and solutions $\widehat{\boldsymbol{\gamma}}$ is then

\[
Q(\boldsymbol{\beta},\boldsymbol{\gamma}) = \phi^{-1}(\mathbf{p} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\boldsymbol{\gamma})'\,\widetilde{\boldsymbol{\Delta}}\mathbf{A}^{-1}\widetilde{\boldsymbol{\Delta}}\,(\mathbf{p} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\boldsymbol{\gamma}) + \sigma^{-2}\boldsymbol{\gamma}'\boldsymbol{\gamma}
\]
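The weighted criterion also has a closed-form minimizer. In the sketch below, a diagonal matrix `W` stands in for $\widetilde{\boldsymbol{\Delta}}\mathbf{A}^{-1}\widetilde{\boldsymbol{\Delta}}$; the weights and variance values are illustrative stand-ins, not quantities computed by PROC GLIMMIX:

```python
import numpy as np

# Weighted penalized least squares: minimize
#   (p - C theta)' W (p - C theta) / phi + gamma'gamma / sigma^2,
# where W plays the role of Delta~ A^{-1} Delta~ (here: arbitrary
# positive weights) and the penalty acts only on the gamma block.
rng = np.random.default_rng(3)
x = np.linspace(0, 1, 25)
p = np.exp(-x) + rng.normal(scale=0.05, size=x.size)
knots = np.linspace(0.2, 0.8, 4)

X = np.column_stack([np.ones_like(x), x])
Z = np.maximum(x[:, None] - knots[None, :], 0.0)
C = np.hstack([X, Z])
D = np.diag([0.0, 0.0] + [1.0] * knots.size)
W = np.diag(rng.uniform(0.5, 2.0, size=x.size))  # stand-in weight matrix

phi, sigma2 = 0.01, 0.1                          # example variance components
theta = np.linalg.solve(C.T @ W @ C / phi + D / sigma2, C.T @ W @ p / phi)
fitted = C @ theta
```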

If you choose the TYPE=RSMOOTH covariance structure, PROC GLIMMIX chooses radial basis functions as the spline basis and transforms them to approximate a thin-plate spline as in Chapter 13.4 of Ruppert, Wand, and Carroll (2003). For computational expediency, the number of knots is chosen to be less than the number of data points. Ruppert, Wand, and Carroll (2003) recommend one knot for every four unique regressor values for one-dimensional smoothers. In the multivariate case, general recommendations are more difficult, because the optimal number and placement of knots depend on the spatial configuration of samples. Their recommendation for a bivariate smoother is one knot per four samples, but at least 20 and no more than 150 knots (Ruppert, Wand, and Carroll 2003, p. 257).

The magnitude of the variance component sigma squared depends on the metric of the random effects. For example, if you apply radial smoothing in time, the variance changes if you measure time in days or minutes. If the solution for the variance component is near zero, then a rescaling of the random effect data can help the optimization problem by moving the solution for the variance component away from the boundary of the parameter space.

Last updated: December 09, 2022