Iterative Algorithms for Model Fitting

This section describes the two iterative maximum likelihood algorithms that are available in PROC LOGISTIC for fitting an unconditional logistic regression. For information about available optimization techniques for conditional logistic regression and models that specify the EQUALSLOPES or UNEQUALSLOPES options, see the section NLOPTIONS Statement. Exact logistic regression uses a special algorithm, which is described in the section Exact Conditional Logistic Regression.

The default maximum likelihood algorithm is the Fisher scoring method, which is equivalent to fitting by iteratively reweighted least squares. The alternative algorithm is the Newton-Raphson method. For generalized logit models, adjacent-category logit models, and models that specify the EQUALSLOPES or UNEQUALSLOPES options, only the Newton-Raphson technique is available. Both algorithms produce the same parameter estimates. However, the estimated covariance matrix of the parameter estimators can differ slightly because Fisher scoring is based on the expected information matrix whereas the Newton-Raphson method is based on the observed information matrix. For a binary logit model, the observed and expected information matrices are identical, resulting in identical estimated covariance matrices for both algorithms. You can specify the TECHNIQUE= option to select a fitting algorithm, and you can specify the FIRTH option to perform a bias-reducing penalized maximum likelihood fit.
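
For example, the following hypothetical statements (the data set MyData and the variables Y, X1, and X2 are placeholders) select the Newton-Raphson algorithm and request Firth's penalized maximum likelihood fit; omitting the TECHNIQUE= option gives the default Fisher scoring fit:

   proc logistic data=MyData;
      model Y(event='1') = X1 X2 / technique=newton firth;
   run;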

Iteratively Reweighted Least Squares Algorithm (Fisher Scoring)

Consider the multinomial variable $\mathbf{Z}_j = (Z_{1j}, \ldots, Z_{k+1,j})'$ such that

$$ Z_{ij} = \begin{cases} 1 & \text{if } Y_j = i \\ 0 & \text{otherwise} \end{cases} $$

With $\pi_{ij}$ denoting the probability that the $j$th observation has response value $i$, the expected value of $\mathbf{Z}_j$ is $\boldsymbol{\pi}_j = (\pi_{1j}, \ldots, \pi_{k+1,j})'$, where $\pi_{k+1,j} = 1 - \sum_{i=1}^{k} \pi_{ij}$. The covariance matrix of $\mathbf{Z}_j$ is $\mathbf{V}_j$, which is the covariance matrix of a multinomial random variable for one trial with parameter vector $\boldsymbol{\pi}_j$. Let $\boldsymbol{\beta}$ be the vector of regression parameters; in other words, $\boldsymbol{\beta} = (\alpha_1, \ldots, \alpha_k, \beta_1, \ldots, \beta_s)'$. Let $\mathbf{D}_j$ be the matrix of partial derivatives of $\boldsymbol{\pi}_j$ with respect to $\boldsymbol{\beta}$. The estimating equation for the regression parameters is

$$ \sum_j \mathbf{D}_j' \mathbf{W}_j (\mathbf{Z}_j - \boldsymbol{\pi}_j) = \mathbf{0} $$

where $\mathbf{W}_j = w_j f_j \mathbf{V}_j^{-}$; $w_j$ and $f_j$ are the weight and frequency of the $j$th observation; and $\mathbf{V}_j^{-}$ is a generalized inverse of $\mathbf{V}_j$. PROC LOGISTIC chooses $\mathbf{V}_j^{-}$ as the inverse of the diagonal matrix that has $\boldsymbol{\pi}_j$ as the diagonal.
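
To make the general notation concrete, consider the binary logit case ($k = 1$), a reduction worked out here for illustration (the symbol $\tilde{\mathbf{x}}_j$ is introduced only for this example). Here $\boldsymbol{\pi}_j = (\pi_{1j}, 1 - \pi_{1j})'$, $\mathbf{V}_j^{-} = \mathrm{diag}(1/\pi_{1j}, 1/(1 - \pi_{1j}))$, and, for the logit link, $\partial \pi_{1j} / \partial \boldsymbol{\beta} = \pi_{1j}(1 - \pi_{1j})\,\tilde{\mathbf{x}}_j$, where $\tilde{\mathbf{x}}_j$ is the $j$th covariate vector augmented with a leading 1 for the intercept. Substituting these quantities, the factors of $\pi_{1j}(1 - \pi_{1j})$ cancel, and the estimating equation reduces to the familiar logistic score equations

$$ \sum_j w_j f_j \, \tilde{\mathbf{x}}_j (Z_{1j} - \pi_{1j}) = \mathbf{0} $$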

With a starting value of $\boldsymbol{\beta}^{(0)}$, the maximum likelihood estimate of $\boldsymbol{\beta}$ is obtained iteratively as

$$ \boldsymbol{\beta}^{(m+1)} = \boldsymbol{\beta}^{(m)} + \left( \sum_j \mathbf{D}_j' \mathbf{W}_j \mathbf{D}_j \right)^{-1} \sum_j \mathbf{D}_j' \mathbf{W}_j (\mathbf{Z}_j - \boldsymbol{\pi}_j) $$

where $\mathbf{D}_j$, $\mathbf{W}_j$, and $\boldsymbol{\pi}_j$ are evaluated at $\boldsymbol{\beta}^{(m)}$. The expression after the plus sign is the step size. If the likelihood evaluated at $\boldsymbol{\beta}^{(m+1)}$ is less than that evaluated at $\boldsymbol{\beta}^{(m)}$, then $\boldsymbol{\beta}^{(m+1)}$ is recomputed by step-halving or ridging, as determined by the value of the RIDGING= option. The iterative scheme continues until convergence is obtained, that is, until $\boldsymbol{\beta}^{(m+1)}$ is sufficiently close to $\boldsymbol{\beta}^{(m)}$. Then the maximum likelihood estimate of $\boldsymbol{\beta}$ is $\hat{\boldsymbol{\beta}} = \boldsymbol{\beta}^{(m+1)}$.
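
As an illustration, the following PROC IML program sketches this iteration for a binary logit model, for which Fisher scoring coincides with iteratively reweighted least squares and, because the observed and expected information matrices are identical, also with the Newton-Raphson update of the next section. The data set MyData and the variables Y, X1, and X2 are placeholders, all weights and frequencies are taken to be 1, and the step-halving and ridging safeguards are omitted; the final lines compute the covariance estimate described next:

   proc iml;
      /* Fisher scoring (IRLS) sketch for a binary logit model.
         MyData, Y, X1, and X2 are placeholder names; weights and
         frequencies are 1; step-halving/ridging are omitted.      */
      use MyData;
      read all var {X1 X2} into covars;
      read all var {Y} into y;
      close MyData;
      x = j(nrow(covars), 1, 1) || covars;   /* add intercept column      */
      beta = j(ncol(x), 1, 0);               /* zero starting values      */
      do m = 1 to 25;
         p    = 1 / (1 + exp(-(x * beta)));  /* pi_j at beta^(m)          */
         w    = p # (1 - p);                 /* binomial variance weights */
         info = x` * (w # x);                /* sum_j D_j' W_j D_j        */
         step = solve(info, x` * (y - p));   /* the step-size term        */
         beta = beta + step;
         if max(abs(step)) < 1e-8 then leave;   /* convergence            */
      end;
      /* covariance estimate: information evaluated at beta-hat */
      p   = 1 / (1 + exp(-(x * beta)));
      cov = inv(x` * ((p # (1 - p)) # x));
      print beta cov;
   quit;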

The covariance matrix of $\hat{\boldsymbol{\beta}}$ is estimated by

$$ \widehat{\mathrm{Cov}}(\hat{\boldsymbol{\beta}}) = \left( \sum_j \hat{\mathbf{D}}_j' \hat{\mathbf{W}}_j \hat{\mathbf{D}}_j \right)^{-1} = \hat{\mathbf{I}}^{-1} $$

where $\hat{\mathbf{D}}_j$ and $\hat{\mathbf{W}}_j$ are, respectively, $\mathbf{D}_j$ and $\mathbf{W}_j$ evaluated at $\hat{\boldsymbol{\beta}}$, and $\hat{\mathbf{I}}$ is the information matrix (the negative expected Hessian matrix) evaluated at $\hat{\boldsymbol{\beta}}$.

By default, the starting values are zero for the slope parameters; for the intercept parameters, the starting values are the observed cumulative logits (that is, the logits of the observed cumulative proportions of response). Alternatively, you can specify starting values with the INEST= option.
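
For example, the following hypothetical statements (again with placeholder data set and variable names) create an INEST= data set and use it to supply starting values; this sketch assumes the usual convention that the INEST= variables are named for the intercept and the model effects:

   data InitVals;
      Intercept = -0.5;  X1 = 0;  X2 = 0;
   run;

   proc logistic data=MyData inest=InitVals;
      model Y(event='1') = X1 X2;
   run;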

Newton-Raphson Algorithm

For cumulative models, let the parameter vector be $\boldsymbol{\beta} = (\alpha_1, \ldots, \alpha_k, \beta_1, \ldots, \beta_s)'$, and for the generalized logit model let $\boldsymbol{\beta} = (\alpha_1, \ldots, \alpha_k, \boldsymbol{\beta}_1', \ldots, \boldsymbol{\beta}_k')'$. The gradient vector and the Hessian matrix are given, respectively, by

$$ \begin{aligned} \mathbf{g} &= \sum_j w_j f_j \, \frac{\partial l_j}{\partial \boldsymbol{\beta}} \\ \mathbf{H} &= \sum_j w_j f_j \, \frac{\partial^2 l_j}{\partial \boldsymbol{\beta}^2} \end{aligned} $$

where $l_j = \log L_j$ is the log likelihood for the $j$th observation. With a starting value of $\boldsymbol{\beta}^{(0)}$, the maximum likelihood estimate $\hat{\boldsymbol{\beta}}$ of $\boldsymbol{\beta}$ is obtained by iterating until convergence:

$$ \boldsymbol{\beta}^{(m+1)} = \boldsymbol{\beta}^{(m)} - \mathbf{H}^{-1} \mathbf{g} $$

where $\mathbf{H}$ and $\mathbf{g}$ are evaluated at $\boldsymbol{\beta}^{(m)}$. If the likelihood evaluated at $\boldsymbol{\beta}^{(m+1)}$ is less than that evaluated at $\boldsymbol{\beta}^{(m)}$, then $\boldsymbol{\beta}^{(m+1)}$ is recomputed by step-halving or ridging.
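
The following PROC IML program is a minimal sketch of this update with a simple step-halving safeguard, again for a binary logit model with placeholder inputs (MyData, Y, X1, X2) and unit weights and frequencies; the ridging alternative is omitted:

   proc iml;
      /* Newton-Raphson with step-halving for a binary logit model.
         MyData, Y, X1, and X2 are placeholder names.               */
      start loglik(beta) global(x, y);
         eta = x * beta;
         return( sum(y # eta - log(1 + exp(eta))) );   /* sum_j l_j */
      finish;

      use MyData;
      read all var {X1 X2} into covars;
      read all var {Y} into y;
      close MyData;
      x = j(nrow(covars), 1, 1) || covars;       /* add intercept column */
      beta = j(ncol(x), 1, 0);
      do m = 1 to 25;
         p = 1 / (1 + exp(-(x * beta)));
         g = x` * (y - p);                       /* gradient g           */
         H = -(x` * ((p # (1 - p)) # x));        /* Hessian H            */
         step = -solve(H, g);                    /* -H^{-1} g            */
         do half = 1 to 10 while(loglik(beta + step) < loglik(beta));
            step = step / 2;                     /* up to 10 halvings    */
         end;
         beta = beta + step;
         if max(abs(step)) < 1e-8 then leave;    /* convergence          */
      end;
      p = 1 / (1 + exp(-(x * beta)));            /* at beta-hat          */
      cov = inv(x` * ((p # (1 - p)) # x));       /* (-H)^{-1} = I-hat^{-1} */
      print beta cov;
   quit;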

The covariance matrix of $\hat{\boldsymbol{\beta}}$ is estimated by

$$ \widehat{\mathrm{Cov}}(\hat{\boldsymbol{\beta}}) = \hat{\mathbf{I}}^{-1} $$

where the observed information matrix $\hat{\mathbf{I}} = -\hat{\mathbf{H}}$ is computed by evaluating $\mathbf{H}$ at $\hat{\boldsymbol{\beta}}$.

Firth’s Bias-Reducing Penalized Likelihood

Firth’s bias-reducing penalized likelihood method is available for binary response models. The log likelihood $l(\boldsymbol{\beta})$ is penalized by one-half the log of the determinant of the information matrix $\mathbf{I}(\boldsymbol{\beta})$,

$$ l(\boldsymbol{\beta}) + \frac{1}{2} \log |\mathbf{I}(\boldsymbol{\beta})| $$

and the components of the gradient $\mathbf{g}(\boldsymbol{\beta})$ are computed as

$$ g(\beta_j) + \frac{1}{2} \, \mathrm{trace} \left( \mathbf{I}(\boldsymbol{\beta})^{-1} \, \frac{\partial \mathbf{I}(\boldsymbol{\beta})}{\partial \beta_j} \right) $$

The Hessian matrix is not modified by this penalty, and the optimization is performed in the usual manner.
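
For the binary logit link, a well-known simplification (used here only for illustration, not necessarily as PROC LOGISTIC's internal computation) expresses the trace adjustment through the hat-matrix diagonals: writing $\pi_j$ for the event probability of observation $j$, the $j$th score contribution becomes $y_j - \pi_j + h_j(1/2 - \pi_j)$, where $h_j$ is the $j$th diagonal element of $\mathbf{W}^{1/2}\mathbf{X}(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}^{1/2}$ with $\mathbf{W} = \mathrm{diag}\{\pi_j(1 - \pi_j)\}$ and design matrix $\mathbf{X}$. The following PROC IML sketch applies Newton steps with this adjusted gradient and the unmodified Hessian, using the same placeholder inputs as before:

   proc iml;
      /* Firth-penalized fit for a binary logit model, using the
         closed-form score adjustment h_j(1/2 - p_j) that holds for
         the canonical logit link. MyData, Y, X1, X2 are placeholders. */
      use MyData;
      read all var {X1 X2} into covars;
      read all var {Y} into y;
      close MyData;
      x = j(nrow(covars), 1, 1) || covars;          /* intercept column   */
      beta = j(ncol(x), 1, 0);
      do m = 1 to 25;
         p    = 1 / (1 + exp(-(x * beta)));
         w    = p # (1 - p);
         info = x` * (w # x);                       /* I(beta) = X'WX     */
         h    = w # vecdiag(x * solve(info, x`));   /* hat diagonals      */
         gstar = x` * (y - p + h # (0.5 - p));      /* penalized gradient */
         step  = solve(info, gstar);                /* Hessian unmodified */
         beta  = beta + step;
         if max(abs(step)) < 1e-8 then leave;       /* convergence        */
      end;
      print beta;
   quit;

Note that the hat-diagonal line forms an n x n matrix, which is acceptable for a small illustrative data set but would be computed column by column in production code.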

Note that for the probit and complementary log-log links, the penalty for the Fisher scoring method is based on the expected information matrix, but the penalty for the Newton-Raphson algorithm is based on the observed information matrix. Because these are noncanonical links, the resulting parameter estimates will be different.
