The NLIN Procedure

Computational Methods

Nonlinear Least Squares

Recall the basic notation for the nonlinear regression model from the section Notation for Nonlinear Regression Models. The parameter vector $\boldsymbol{\beta}$ belongs to $\boldsymbol{\Omega}$, a subset of $R^p$. Two points of this set are of particular interest: the true value $\widetilde{\boldsymbol{\beta}}$ and the least squares estimate $\widehat{\boldsymbol{\beta}}$. The general nonlinear model fit with the NLIN procedure is represented by the equation

$$
\mathbf{Y} = \mathbf{f}(\widetilde{\beta}_0, \widetilde{\beta}_1, \ldots, \widetilde{\beta}_p;\, \mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_k) + \boldsymbol{\epsilon}
= \mathbf{f}(\widetilde{\boldsymbol{\beta}};\, \mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_k) + \boldsymbol{\epsilon}
$$

where $\mathbf{z}_j$ denotes the $(n \times 1)$ vector of the $j$th regressor (independent) variable, $\widetilde{\boldsymbol{\beta}}$ is the true value of the parameter vector, and $\boldsymbol{\epsilon}$ is the $(n \times 1)$ vector of homoscedastic and uncorrelated model errors with zero mean.

To write the model for the $i$th observation, the $i$th elements of $\mathbf{z}_1, \ldots, \mathbf{z}_k$ are collected in the row vector $\mathbf{z}_i'$, and the model equation becomes

$$
Y_i = f(\boldsymbol{\beta};\, \mathbf{z}_i') + \epsilon_i
$$

The shorthand $f_i(\boldsymbol{\beta})$ will also be used throughout to denote the mean of the $i$th observation.

For any given value $\boldsymbol{\beta}$ we can compute the residual sum of squares

$$
\begin{aligned}
\mathrm{SSE}(\boldsymbol{\beta}) &= \sum_{i=1}^{n} \bigl(y_i - f(\boldsymbol{\beta};\, \mathbf{z}_i')\bigr)^2 \\
&= \sum_{i=1}^{n} \bigl(y_i - f_i(\boldsymbol{\beta})\bigr)^2 = \mathbf{r}(\boldsymbol{\beta})'\,\mathbf{r}(\boldsymbol{\beta})
\end{aligned}
$$
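To make the notation concrete, the following Python sketch evaluates $\mathrm{SSE}(\boldsymbol{\beta})$ for a hypothetical exponential mean function $f_i(\boldsymbol{\beta}) = \beta_0 e^{\beta_1 z_i}$. The model and data are illustrative assumptions, not part of PROC NLIN.

```python
import numpy as np

# Hypothetical mean function (illustration only): f_i(beta) = b0 * exp(b1 * z_i)
def f(beta, z):
    return beta[0] * np.exp(beta[1] * z)

def sse(beta, z, y):
    """Residual sum of squares SSE(beta) = r(beta)' r(beta)."""
    r = y - f(beta, z)                   # residual vector r(beta)
    return r @ r

z = np.array([0.0, 1.0, 2.0, 3.0])       # single regressor
y = np.array([1.0, 2.6, 7.5, 20.0])      # observed responses

print(sse(np.array([1.0, 1.0]), z, y))   # SSE at beta = (1, 1)
```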

The aim of nonlinear least squares estimation is to find the value $\widehat{\boldsymbol{\beta}}$ that minimizes $\mathrm{SSE}(\boldsymbol{\beta})$. Because $\mathbf{f}$ is a nonlinear function of $\boldsymbol{\beta}$, a closed-form solution does not exist for this minimization problem. An iterative process is used instead. The iterative techniques that PROC NLIN uses are similar to a series of linear regressions involving the matrix $\mathbf{X}$ and the residual vector $\mathbf{r} = \mathbf{y} - \mathbf{f}(\boldsymbol{\beta})$, evaluated at the current values of $\boldsymbol{\beta}$.

It is more insightful, however, to describe the algorithms in terms of their approach to minimizing the residual sum of squares and in terms of their updating formulas. If $\widehat{\boldsymbol{\beta}}^{(u)}$ denotes the value of the parameter estimates at the $u$th iteration and $\widehat{\boldsymbol{\beta}}^{(0)}$ denotes the starting values, then the NLIN procedure attempts to find a step size $k$ and an update vector $\boldsymbol{\Delta}$ such that

$$
\widehat{\boldsymbol{\beta}}^{(u+1)} = \widehat{\boldsymbol{\beta}}^{(u)} + k\,\boldsymbol{\Delta}
$$

and

$$
\mathrm{SSE}\bigl(\widehat{\boldsymbol{\beta}}^{(u+1)}\bigr) < \mathrm{SSE}\bigl(\widehat{\boldsymbol{\beta}}^{(u)}\bigr)
$$

The various methods to fit a nonlinear regression model, which you can select with the METHOD= option in the PROC NLIN statement, differ in the calculation of the update vector $\boldsymbol{\Delta}$.

The gradient and Hessian of the residual sum of squares with respect to individual parameters and pairs of parameters are, respectively,

$$
\begin{aligned}
g(\beta_j) &= \frac{\partial\,\mathrm{SSE}(\boldsymbol{\beta})}{\partial \beta_j}
  = -2\sum_{i=1}^{n}\bigl(y_i - f_i(\boldsymbol{\beta})\bigr)\,\frac{\partial f_i(\boldsymbol{\beta})}{\partial \beta_j} \\
\bigl[\mathbf{H}(\boldsymbol{\beta})\bigr]_{jk} &= \frac{\partial^2\,\mathrm{SSE}(\boldsymbol{\beta})}{\partial \beta_j\,\partial \beta_k}
  = 2\sum_{i=1}^{n}\left\{\frac{\partial f_i(\boldsymbol{\beta})}{\partial \beta_j}\,\frac{\partial f_i(\boldsymbol{\beta})}{\partial \beta_k}
  - \bigl(y_i - f_i(\boldsymbol{\beta})\bigr)\,\frac{\partial^2 f_i(\boldsymbol{\beta})}{\partial \beta_j\,\partial \beta_k}\right\}
\end{aligned}
$$

Denote by $\mathbf{H}_i^{*}(\boldsymbol{\beta})$ the Hessian matrix of the mean function,

$$
\bigl[\mathbf{H}_i^{*}(\boldsymbol{\beta})\bigr]_{jk} = \left[\frac{\partial^2 f_i(\boldsymbol{\beta})}{\partial \beta_j\,\partial \beta_k}\right]_{jk}
$$

Collecting the derivatives across all parameters, where $\mathbf{X}$ denotes the $(n \times p)$ matrix of first partial derivatives with elements $[\partial f_i(\boldsymbol{\beta})/\partial \beta_j]$, leads to the expressions

$$
\begin{aligned}
\mathbf{g}(\boldsymbol{\beta}) &= \frac{\partial\,\mathrm{SSE}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = -2\,\mathbf{X}'\mathbf{r}(\boldsymbol{\beta}) \\
\mathbf{H}(\boldsymbol{\beta}) &= \frac{\partial^2\,\mathrm{SSE}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}'}
  = 2\left(\mathbf{X}'\mathbf{X} - \sum_{i=1}^{n} r_i(\boldsymbol{\beta})\,\mathbf{H}_i^{*}(\boldsymbol{\beta})\right)
\end{aligned}
$$
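As a sanity check on the matrix form of the gradient, the sketch below compares $-2\,\mathbf{X}'\mathbf{r}(\boldsymbol{\beta})$ against central finite differences of $\mathrm{SSE}$. The exponential model, data, and step size are illustrative assumptions, not taken from the documentation.

```python
import numpy as np

# Hypothetical model for illustration: f_i(beta) = b0 * exp(b1 * z_i)
z = np.array([0.0, 1.0, 2.0])
y = np.array([1.2, 2.5, 7.0])
beta = np.array([1.0, 0.9])

def sse(b):
    r = y - b[0] * np.exp(b[1] * z)
    return r @ r

e = np.exp(beta[1] * z)
X = np.column_stack([e, beta[0] * z * e])   # Jacobian: columns are df/db0, df/db1
r = y - beta[0] * e
g_analytic = -2 * X.T @ r                   # g(beta) = -2 X'r(beta)

# Central finite differences of SSE, one parameter at a time
h = 1e-6
g_numeric = np.array([
    (sse(beta + h * np.eye(2)[j]) - sse(beta - h * np.eye(2)[j])) / (2 * h)
    for j in range(2)
])
print(g_analytic, g_numeric)
```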

The change in the vector of parameter estimates is computed as follows, depending on the estimation method:

$$
\begin{aligned}
\text{Gauss-Newton:} \quad \boldsymbol{\Delta} &= \bigl(-\mathrm{E}[\mathbf{H}(\boldsymbol{\beta})]\bigr)^{-}\,\mathbf{g}(\boldsymbol{\beta}) = (\mathbf{X}'\mathbf{X})^{-}\,\mathbf{X}'\mathbf{r} \\
\text{Marquardt:} \quad \boldsymbol{\Delta} &= \bigl(\mathbf{X}'\mathbf{X} + \lambda\,\mathrm{diag}(\mathbf{X}'\mathbf{X})\bigr)^{-}\,\mathbf{X}'\mathbf{r} \\
\text{Newton:} \quad \boldsymbol{\Delta} &= -\mathbf{H}(\boldsymbol{\beta})^{-}\,\mathbf{g}(\boldsymbol{\beta})
  = \Bigl(\mathbf{X}'\mathbf{X} - \sum_{i=1}^{n} r_i(\boldsymbol{\beta})\,\mathbf{H}_i^{*}(\boldsymbol{\beta})\Bigr)^{-}\,\mathbf{X}'\mathbf{r} \\
\text{Steepest descent:} \quad \boldsymbol{\Delta} &= -\frac{1}{2}\,\frac{\partial\,\mathrm{SSE}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \mathbf{X}'\mathbf{r}
\end{aligned}
$$

The Gauss-Newton and Marquardt iterative methods regress the residuals onto the partial derivatives of the model with respect to the parameters until the estimates converge. You can view the Marquardt algorithm as a Gauss-Newton algorithm with a ridging penalty. The Newton iterative method regresses the residuals onto a function of the first and second derivatives of the model with respect to the parameters until the estimates converge. Analytical first- and second-order derivatives are automatically computed as needed.

The default method used to compute $(\mathbf{X}'\mathbf{X})^{-}$ is the sweep operator (Goodnight 1979). It produces a reflexive generalized inverse (a $g_2$-inverse; Pringle and Rayner 1971). In some cases it might be preferable to use a Moore-Penrose inverse (a $g_4$-inverse) instead. If you specify the G4 option in the PROC NLIN statement, a $g_4$-inverse is used to calculate $\boldsymbol{\Delta}$ on each iteration.
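The four update vectors can be compared numerically. The sketch below computes each $\boldsymbol{\Delta}$ at one parameter value for a hypothetical exponential model $f_i = \beta_0 e^{\beta_1 z_i}$; the model, data, and ridge value are illustrative assumptions, and NumPy's solver stands in for the sweep-based generalized inverse.

```python
import numpy as np

# Hypothetical model for illustration: f_i(beta) = b0 * exp(b1 * z_i)
z = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.6, 7.5, 20.0])
beta = np.array([1.0, 1.0])

e = np.exp(beta[1] * z)
r = y - beta[0] * e                              # residual vector r(beta)
X = np.column_stack([e, beta[0] * z * e])        # Jacobian of the mean function
XtX, Xtr = X.T @ X, X.T @ r

# Per-observation Hessians H_i* of the mean function
H_star = np.zeros((len(z), 2, 2))
H_star[:, 0, 1] = H_star[:, 1, 0] = z * e        # d2f / db0 db1
H_star[:, 1, 1] = beta[0] * z**2 * e             # d2f / db1^2

S = sum(ri * Hi for ri, Hi in zip(r, H_star))    # sum_i r_i H_i*
lam = 1e-7

delta_gn   = np.linalg.solve(XtX, Xtr)                                # Gauss-Newton
delta_marq = np.linalg.solve(XtX + lam * np.diag(np.diag(XtX)), Xtr)  # Marquardt
delta_newt = np.linalg.solve(XtX - S, Xtr)                            # Newton
delta_sd   = Xtr                                                      # steepest descent
```

With the tiny default ridge value, the Marquardt step is essentially the Gauss-Newton step, as the text notes.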

The four algorithms are now described in greater detail.

Algorithmic Details

Gauss-Newton and Newton Methods

From the preceding set of equations you can see that the Marquardt method is a ridged version of the Gauss-Newton method. If the ridge parameter $\lambda$ equals zero, the Marquardt step is identical to the Gauss-Newton step. An important difference between the Newton method and the Gauss-Newton-type algorithms lies in the use of second derivatives. To motivate this distinction, focus first on the objective function in nonlinear least squares. To numerically find the minimum of

$$
\mathrm{SSE}(\boldsymbol{\beta}) = \mathbf{r}(\boldsymbol{\beta})'\,\mathbf{r}(\boldsymbol{\beta})
$$

you can approximate the sum of squares criterion by a criterion for which a closed-form solution exists. Following Seber and Wild (1989, Sect. 2.1.3), this can be achieved in one of two ways:

  • approximating the model and substituting the approximation into $\mathrm{SSE}(\boldsymbol{\beta})$

  • approximating $\mathrm{SSE}(\boldsymbol{\beta})$ directly

The first method, approximating the nonlinear model with a first-order Taylor series, is the purview of the Gauss-Newton method. Approximating the residual sum of squares directly is the realm of the Newton method.

The first-order Taylor series of the residual $\mathbf{r}(\boldsymbol{\beta})$ at the point $\widehat{\boldsymbol{\beta}}$ is

$$
\mathbf{r}(\boldsymbol{\beta}) \approx \mathbf{r}(\widehat{\boldsymbol{\beta}}) - \widehat{\mathbf{X}}\,(\boldsymbol{\beta} - \widehat{\boldsymbol{\beta}})
$$

Substitution into $\mathrm{SSE}(\boldsymbol{\beta})$ leads to the objective function for the Gauss-Newton step:

$$
\mathrm{SSE}(\boldsymbol{\beta}) \approx S_G(\boldsymbol{\beta})
= \mathbf{r}(\widehat{\boldsymbol{\beta}})'\,\mathbf{r}(\widehat{\boldsymbol{\beta}})
- 2\,\mathbf{r}(\widehat{\boldsymbol{\beta}})'\,\widehat{\mathbf{X}}\,(\boldsymbol{\beta} - \widehat{\boldsymbol{\beta}})
+ (\boldsymbol{\beta} - \widehat{\boldsymbol{\beta}})'\,\bigl(\widehat{\mathbf{X}}'\widehat{\mathbf{X}}\bigr)\,(\boldsymbol{\beta} - \widehat{\boldsymbol{\beta}})
$$

"Hat" notation is used here to indicate that the quantity in question is evaluated at $\widehat{\boldsymbol{\beta}}$.

To motivate the Newton method, take a second-order Taylor series of $\mathrm{SSE}(\boldsymbol{\beta})$ around the value $\widehat{\boldsymbol{\beta}}$:

$$
\mathrm{SSE}(\boldsymbol{\beta}) \approx S_N(\boldsymbol{\beta})
= \mathrm{SSE}(\widehat{\boldsymbol{\beta}})
+ \mathbf{g}(\widehat{\boldsymbol{\beta}})'\,(\boldsymbol{\beta} - \widehat{\boldsymbol{\beta}})
+ \tfrac{1}{2}\,(\boldsymbol{\beta} - \widehat{\boldsymbol{\beta}})'\,\mathbf{H}(\widehat{\boldsymbol{\beta}})\,(\boldsymbol{\beta} - \widehat{\boldsymbol{\beta}})
$$

Both $S_G(\boldsymbol{\beta})$ and $S_N(\boldsymbol{\beta})$ are quadratic functions in $\boldsymbol{\beta}$ and are easily minimized. The minima occur when

$$
\begin{aligned}
\text{Gauss-Newton:} \quad \boldsymbol{\beta} - \widehat{\boldsymbol{\beta}} &= \bigl(\widehat{\mathbf{X}}'\widehat{\mathbf{X}}\bigr)^{-}\,\widehat{\mathbf{X}}'\,\mathbf{r}(\widehat{\boldsymbol{\beta}}) \\
\text{Newton:} \quad \boldsymbol{\beta} - \widehat{\boldsymbol{\beta}} &= -\mathbf{H}(\widehat{\boldsymbol{\beta}})^{-}\,\mathbf{g}(\widehat{\boldsymbol{\beta}})
\end{aligned}
$$

and these terms define the preceding $\boldsymbol{\Delta}$ update vectors.

Gauss-Newton Method

Since the Gauss-Newton method is based on an approximation of the model, you can also derive the update vector by first considering the "normal" equations of the nonlinear model

$$
\mathbf{X}'\,\mathbf{f}(\boldsymbol{\beta}) = \mathbf{X}'\,\mathbf{Y}
$$

and then substituting the first-order Taylor approximation

$$
\mathbf{f}(\boldsymbol{\beta}) \approx \mathbf{f}(\widehat{\boldsymbol{\beta}}) + \mathbf{X}\,(\boldsymbol{\beta} - \widehat{\boldsymbol{\beta}})
$$

for $\mathbf{f}(\boldsymbol{\beta})$. This leads to

$$
\begin{aligned}
\mathbf{X}'\bigl(\mathbf{f}(\widehat{\boldsymbol{\beta}}) + \mathbf{X}(\boldsymbol{\beta} - \widehat{\boldsymbol{\beta}})\bigr) &= \mathbf{X}'\mathbf{Y} \\
(\mathbf{X}'\mathbf{X})\,(\boldsymbol{\beta} - \widehat{\boldsymbol{\beta}}) &= \mathbf{X}'\mathbf{Y} - \mathbf{X}'\,\mathbf{f}(\widehat{\boldsymbol{\beta}}) \\
(\mathbf{X}'\mathbf{X})\,\boldsymbol{\Delta} &= \mathbf{X}'\,\mathbf{r}(\widehat{\boldsymbol{\beta}})
\end{aligned}
$$

and the update vector becomes

$$
\boldsymbol{\Delta} = (\mathbf{X}'\mathbf{X})^{-}\,\mathbf{X}'\,\mathbf{r}(\widehat{\boldsymbol{\beta}})
$$

Note: If $\mathbf{X}'\mathbf{X}$ is singular or becomes singular, PROC NLIN computes $\boldsymbol{\Delta}$ by using a generalized inverse for the iterations after singularity occurs. If $\mathbf{X}'\mathbf{X}$ is still singular for the last iteration, the solution should be examined.
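Putting the pieces together, a minimal Gauss-Newton iteration might look as follows. The exponential model, data, starting values, and tolerance are illustrative assumptions, and `lstsq` plays the role of the generalized inverse.

```python
import numpy as np

def gauss_newton(z, y, beta, tol=1e-10, max_iter=50):
    """Gauss-Newton sketch for the hypothetical model f_i = b0 * exp(b1 * z_i)."""
    for _ in range(max_iter):
        e = np.exp(beta[1] * z)
        r = y - beta[0] * e                        # residual r(beta)
        X = np.column_stack([e, beta[0] * z * e])  # Jacobian
        # Solve (X'X) Delta = X'r; lstsq stands in for the generalized inverse
        delta, *_ = np.linalg.lstsq(X, r, rcond=None)
        beta = beta + delta
        if np.max(np.abs(delta)) < tol:            # converged
            break
    return beta

z = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * np.exp(0.5 * z)                # exact data generated with b0=2, b1=0.5
print(gauss_newton(z, y, np.array([1.9, 0.55])))
```

Starting close to the solution, the iteration recovers the generating parameters; in practice the step-size search described later guards against overshooting from poor starting values.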

Newton Method

The Newton method uses the second derivatives and solves the equation

$$
\boldsymbol{\Delta} = \mathbf{H}^{-}\,\mathbf{X}'\mathbf{r}
$$

If the automatic variables _WEIGHT_, _WGTJPJ_, and _RESID_ are used, then

$$
\boldsymbol{\Delta} = \mathbf{H}^{-}\,\mathbf{X}'\,\mathbf{W}^{\mathrm{SSE}}\,\mathbf{r}^{*}
$$

is the direction, where

$$
\mathbf{H} = \mathbf{X}'\,\mathbf{W}^{\mathrm{XPX}}\,\mathbf{X} - \sum_{i=1}^{n} \mathbf{H}_i^{*}(\boldsymbol{\beta})\, w_i^{\mathrm{XPX}}\, r_i^{*}
$$

and

$\mathbf{W}^{\mathrm{SSE}}$ is an $n \times n$ diagonal matrix whose $i$th diagonal element $w_i^{\mathrm{SSE}}$ contains the value of the _WEIGHT_ variable for the $i$th observation.

$\mathbf{W}^{\mathrm{XPX}}$ is an $n \times n$ diagonal matrix whose $i$th diagonal element $w_i^{\mathrm{XPX}}$ contains the value of the _WGTJPJ_ variable for the $i$th observation.

$\mathbf{r}^{*}$ is a vector whose $i$th element $r_i^{*}$ contains the value of the _RESID_ variable evaluated for the $i$th observation.

Marquardt Method

The updating formula for the Marquardt method is as follows:

$$
\boldsymbol{\Delta} = \bigl(\mathbf{X}'\mathbf{X} + \lambda\,\mathrm{diag}(\mathbf{X}'\mathbf{X})\bigr)^{-}\,\mathbf{X}'\mathbf{r}
$$

The Marquardt method is a compromise between the Gauss-Newton and steepest descent methods (Marquardt 1963). As $\lambda \to 0$, the direction approaches the Gauss-Newton direction. As $\lambda \to \infty$, the direction approaches the steepest descent direction.

Marquardt’s studies indicate that the average angle between the Gauss-Newton and steepest descent directions is about $90^{\circ}$. A choice of $\lambda$ between 0 and infinity produces a compromise direction.

By default, PROC NLIN chooses $\lambda = 10^{-7}$ to start and computes $\boldsymbol{\Delta}$. If $\mathrm{SSE}(\boldsymbol{\beta}_0 + \boldsymbol{\Delta}) < \mathrm{SSE}(\boldsymbol{\beta}_0)$, then $\lambda = \lambda/10$ for the next iteration. Each time $\mathrm{SSE}(\boldsymbol{\beta}_0 + \boldsymbol{\Delta}) > \mathrm{SSE}(\boldsymbol{\beta}_0)$, then $\lambda = 10\lambda$.

Note: If the SSE decreases on each iteration, then $\lambda \to 0$, and you are essentially using the Gauss-Newton method. If the SSE does not improve, then $\lambda$ is increased until you are moving in the steepest descent direction.

Marquardt’s method is equivalent to performing a series of ridge regressions, and it is useful when the parameter estimates are highly correlated or the objective function is not well approximated by a quadratic.
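The $\lambda$-adaptation just described can be sketched as follows; the exponential model, data, and stopping rule are illustrative assumptions, not PROC NLIN's internal implementation.

```python
import numpy as np

def marquardt(z, y, beta, lam=1e-7, max_iter=200):
    """Marquardt sketch for the hypothetical model f_i = b0 * exp(b1 * z_i)."""
    def resid(b):
        return y - b[0] * np.exp(b[1] * z)
    for _ in range(max_iter):
        r = resid(beta)
        e = np.exp(beta[1] * z)
        X = np.column_stack([e, beta[0] * z * e])
        XtX = X.T @ X
        delta = np.linalg.solve(XtX + lam * np.diag(np.diag(XtX)), X.T @ r)
        if resid(beta + delta) @ resid(beta + delta) < r @ r:
            beta, lam = beta + delta, lam / 10   # success: move toward Gauss-Newton
        else:
            lam *= 10                            # failure: move toward steepest descent
        if np.max(np.abs(delta)) < 1e-12:        # negligible step: stop
            break
    return beta

z = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * np.exp(0.5 * z)                # exact data generated with b0=2, b1=0.5
print(marquardt(z, y, np.array([1.0, 1.0])))
```

Rejected steps only inflate $\lambda$ and shrink the step, so the iteration makes progress even from starting values where a pure Gauss-Newton step would overshoot.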

Steepest Descent (Gradient) Method

The steepest descent method is based directly on the gradient of $\tfrac{1}{2}\,\mathbf{r}(\boldsymbol{\beta})'\,\mathbf{r}(\boldsymbol{\beta})$:

$$
\frac{1}{2}\,\frac{\partial\,\mathrm{SSE}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = -\mathbf{X}'\mathbf{r}
$$

The quantity $-\mathbf{X}'\mathbf{r}$ is the direction along which $\mathbf{r}(\boldsymbol{\beta})'\,\mathbf{r}(\boldsymbol{\beta})$ increases. Thus $\boldsymbol{\Delta} = \mathbf{X}'\mathbf{r}$ is the direction of steepest descent.

If the automatic variables _WEIGHT_ and _RESID_ are used, then

$$
\boldsymbol{\Delta} = \mathbf{X}'\,\mathbf{W}^{\mathrm{SSE}}\,\mathbf{r}^{*}
$$

is the direction, where $\mathbf{W}^{\mathrm{SSE}}$ is an $n \times n$ diagonal matrix whose $i$th diagonal element $w_i^{\mathrm{SSE}}$ contains the value of the _WEIGHT_ variable for the $i$th observation, and $\mathbf{r}^{*}$ is a vector whose $i$th element $r_i^{*}$ contains the value of the _RESID_ variable evaluated for the $i$th observation.

Using the method of steepest descent, let

$$
\boldsymbol{\beta}^{(k+1)} = \boldsymbol{\beta}^{(k)} + \alpha\,\boldsymbol{\Delta}
$$

where the scalar alpha is chosen such that

$$
\mathrm{SSE}\bigl(\boldsymbol{\beta}^{(k)} + \alpha\,\boldsymbol{\Delta}\bigr) < \mathrm{SSE}\bigl(\boldsymbol{\beta}^{(k)}\bigr)
$$

Note: The steepest-descent method can converge very slowly and is therefore not generally recommended. It is sometimes useful when the initial values are poor.
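For completeness, a steepest descent sketch with a simple halving choice of $\alpha$ illustrates the method; the model, data, iteration count, and halving rule are all assumptions made for this example.

```python
import numpy as np

# Hypothetical model for illustration: f_i(beta) = b0 * exp(b1 * z_i)
z = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * np.exp(0.5 * z)

def sse(b):
    r = y - b[0] * np.exp(b[1] * z)
    return r @ r

beta = np.array([1.8, 0.6])
for _ in range(500):
    e = np.exp(beta[1] * z)
    r = y - beta[0] * e
    X = np.column_stack([e, beta[0] * z * e])
    delta = X.T @ r                      # steepest descent direction X'r
    alpha = 1.0                          # shrink alpha until SSE decreases
    while sse(beta + alpha * delta) >= sse(beta) and alpha > 1e-12:
        alpha *= 0.5
    beta = beta + alpha * delta
print(beta, sse(beta))
```

Even after many iterations the estimates creep toward the solution, which is the slow convergence the note warns about.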

Step-Size Search

The default method of finding the step size $k$ is step halving (SMETHOD=HALVE). If $\mathrm{SSE}(\boldsymbol{\beta}^{(u)} + \boldsymbol{\Delta}) > \mathrm{SSE}(\boldsymbol{\beta}^{(u)})$, the procedure computes $\mathrm{SSE}(\boldsymbol{\beta}^{(u)} + 0.5\,\boldsymbol{\Delta})$, $\mathrm{SSE}(\boldsymbol{\beta}^{(u)} + 0.25\,\boldsymbol{\Delta}), \ldots,$ until a smaller SSE is found.
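The halving rule can be sketched generically; the quadratic objective below is a hypothetical stand-in for $\mathrm{SSE}$ along the search direction.

```python
def step_halving(sse, beta, delta, max_halvings=20):
    """Return a step size k with sse(beta + k*delta) < sse(beta), halving from 1."""
    base = sse(beta)
    k = 1.0
    for _ in range(max_halvings):
        if sse(beta + k * delta) < base:
            return k
        k *= 0.5
    return 0.0   # no improving step found within max_halvings

# Hypothetical 1-parameter objective, minimized at beta = 3
quad = lambda b: (b - 3.0) ** 2
k = step_halving(quad, beta=2.0, delta=4.0)   # full step overshoots the minimum
print(k)
```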

If you specify SMETHOD=GOLDEN, the step size $k$ is determined by a golden section search. The TAU parameter determines the length of the initial interval to be searched, with the interval having length TAU or $2 \times \mathrm{TAU}$, depending on $\mathrm{SSE}(\boldsymbol{\beta}^{(u)} + \boldsymbol{\Delta})$. The RHO parameter specifies how fine the search is to be. The SSE at each endpoint of the interval is evaluated, and a new subinterval is chosen. The size of the interval is reduced until its length is less than RHO. One pass through the data is required each time the interval is reduced. Hence, if RHO is very small relative to TAU, a large amount of time can be spent determining a step size. For more information about the golden section search, see Kennedy and Gentle (1980).
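A generic golden section search over $[0, \mathrm{TAU}]$ can be sketched as follows. The parameter names mirror TAU and RHO above, but this is a textbook implementation with a hypothetical objective, not PROC NLIN's internal one.

```python
def golden_section(phi, tau, rho):
    """Minimize phi over [0, tau]; stop when the interval is shorter than rho."""
    g = (5 ** 0.5 - 1) / 2            # inverse golden ratio, about 0.618
    a, b = 0.0, tau
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > rho:
        if phi(c) < phi(d):           # minimum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                         # minimum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

# Hypothetical SSE along the search direction, minimized at k = 0.3
phi = lambda k: (k - 0.3) ** 2
k_star = golden_section(phi, tau=1.0, rho=1e-6)
print(k_star)
```

For clarity this sketch re-evaluates the objective at both interior points on every pass; the production algorithm reuses one of the two evaluations per reduction, which is why only one pass through the data is needed each time the interval shrinks.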

If you specify SMETHOD=CUBIC, the NLIN procedure performs a cubic interpolation to estimate the step size. If the estimated step size does not result in a decrease in SSE, step halving is used.

Last updated: December 09, 2022