The SURVEYLOGISTIC Procedure

Model Fitting

Determining Observations for Likelihood Contributions

If you use the events/trials syntax, each observation is split into two observations. One has the response value 1 with a frequency equal to the value of the events variable. The other observation has the response value 2 and a frequency equal to the value of (trials – events). These two observations have the same explanatory variable values and the same WEIGHT values as the original observation.

For either the single-trial or the events/trials syntax, let j index all observations. In other words, for the single-trial syntax, j indexes the actual observations. And, for the events/trials syntax, j indexes the observations after splitting (as described previously). If your data set has 30 observations and you use the single-trial syntax, j has values from 1 to 30; if you use the events/trials syntax, j has values from 1 to 60.

Suppose the response variable in a cumulative response model can take on the ordered values $1, \ldots, k, k+1$, where $k$ is an integer $\ge 1$. The likelihood for the $j$th observation with ordered response value $y_j$ and row vector of explanatory variables $\mathbf{x}_j$ is given by

$$
L_j = \begin{cases}
F(\alpha_1 + \mathbf{x}_j\boldsymbol{\beta}) & y_j = 1 \\
F(\alpha_i + \mathbf{x}_j\boldsymbol{\beta}) - F(\alpha_{i-1} + \mathbf{x}_j\boldsymbol{\beta}) & 1 < y_j = i \le k \\
1 - F(\alpha_k + \mathbf{x}_j\boldsymbol{\beta}) & y_j = k+1
\end{cases}
$$

where $F(\cdot)$ is the logistic, normal, or extreme-value distribution function; $\alpha_1, \ldots, \alpha_k$ are ordered intercept parameters; and $\boldsymbol{\beta}$ is the slope parameter vector.
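For the logit link, the likelihood contribution for one observation can be sketched directly in Python; the function names and parameter values below are illustrative, not part of the procedure:

```python
import math

def logistic_cdf(t):
    """Logistic distribution function F(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def cumulative_likelihood(y, eta, alphas):
    """Likelihood contribution L_j for ordered response y in {1, ..., k+1}.

    eta    -- linear predictor x_j * beta (a scalar here)
    alphas -- ordered intercepts [alpha_1, ..., alpha_k]
    """
    k = len(alphas)
    if y == 1:
        return logistic_cdf(alphas[0] + eta)
    if y == k + 1:
        return 1.0 - logistic_cdf(alphas[k - 1] + eta)
    # Middle case: 1 < y = i <= k, a difference of adjacent CDF values.
    return logistic_cdf(alphas[y - 1] + eta) - logistic_cdf(alphas[y - 2] + eta)

# The contributions over all k+1 response levels telescope to 1:
alphas = [-1.0, 0.5]          # k = 2 ordered intercepts (illustrative)
eta = 0.3                     # x_j * beta for one observation
total = sum(cumulative_likelihood(y, eta, alphas) for y in (1, 2, 3))
print(round(total, 10))       # 1.0
```

The three branches correspond exactly to the three cases of the likelihood: the lowest level, a middle level, and the highest level.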

For the generalized logit model, let the $(k+1)$st level be the reference level. The intercepts $\alpha_1, \ldots, \alpha_k$ are unordered, and the slope vector $\boldsymbol{\beta}_i$ varies with each logit. The likelihood for the $j$th observation with response value $y_j$ and row vector of explanatory variables $\mathbf{x}_j$ is given by

$$
L_j = \Pr(Y = y_j \mid \mathbf{x}_j) = \begin{cases}
\dfrac{e^{\alpha_i + \mathbf{x}_j\boldsymbol{\beta}_i}}{1 + \sum_{i=1}^{k} e^{\alpha_i + \mathbf{x}_j\boldsymbol{\beta}_i}} & 1 \le y_j = i \le k \\[2ex]
\dfrac{1}{1 + \sum_{i=1}^{k} e^{\alpha_i + \mathbf{x}_j\boldsymbol{\beta}_i}} & y_j = k+1
\end{cases}
$$
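The generalized logit probabilities form a softmax over the $k$ non-reference linear predictors, with the $(k+1)$st level as the reference. A minimal sketch with illustrative names and values:

```python
import math

def generalized_logit_probs(etas):
    """Response probabilities under the generalized logit model.

    etas -- [alpha_i + x_j * beta_i for i = 1..k]; level k+1 is the
            reference level, with linear predictor fixed at 0.
    """
    denom = 1.0 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    probs.append(1.0 / denom)          # reference level k+1
    return probs

# Three non-reference levels plus the reference level (k = 3):
p = generalized_logit_probs([0.2, -0.5, 1.1])
print([round(v, 4) for v in p])
print(round(sum(p), 10))               # 1.0
```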

Iterative Algorithms for Model Fitting

Two iterative maximum likelihood algorithms are available in PROC SURVEYLOGISTIC to obtain the pseudo-estimate $\hat{\boldsymbol{\theta}}$ of the model parameter $\boldsymbol{\theta}$. The default is the Fisher scoring method, which is equivalent to fitting by iteratively reweighted least squares. The alternative algorithm is the Newton-Raphson method. Both algorithms give the same parameter estimates; the covariance matrix of $\hat{\boldsymbol{\theta}}$ is estimated as described in the section Variance Estimation. For a generalized logit model, only the Newton-Raphson technique is available. You can use the TECHNIQUE= option in the MODEL statement to select a fitting algorithm.

Iteratively Reweighted Least Squares Algorithm (Fisher Scoring)

Let $Y$ be the response variable that takes values $1, \ldots, k, k+1$ ($k \ge 1$). Let $j$ index all observations, and let $Y_j$ be the response value for the $j$th observation. Consider the multinomial variable $\mathbf{Z}_j = (Z_{1j}, \ldots, Z_{kj})'$ such that

$$
Z_{ij} = \begin{cases} 1 & \text{if } Y_j = i \\ 0 & \text{otherwise} \end{cases}
$$

and $Z_{(k+1)j} = 1 - \sum_{i=1}^{k} Z_{ij}$. With $\pi_{ij}$ denoting the probability that the $j$th observation has response value $i$, the expected value of $\mathbf{Z}_j$ is $\boldsymbol{\pi}_j = (\pi_{1j}, \ldots, \pi_{kj})'$, and $\pi_{(k+1)j} = 1 - \sum_{i=1}^{k} \pi_{ij}$. The covariance matrix of $\mathbf{Z}_j$ is $\mathbf{V}_j$, the covariance matrix of a multinomial random variable for one trial with parameter vector $\boldsymbol{\pi}_j$. Let $\boldsymbol{\theta}$ be the vector of regression parameters; for example, $\boldsymbol{\theta} = (\alpha_1, \ldots, \alpha_k, \boldsymbol{\beta}')'$ for the cumulative logit model. Let $\mathbf{D}_j$ be the matrix of partial derivatives of $\boldsymbol{\pi}_j$ with respect to $\boldsymbol{\theta}$. The estimating equation for the regression parameters is

$$
\sum_j \mathbf{D}_j' \mathbf{W}_j (\mathbf{Z}_j - \boldsymbol{\pi}_j) = \mathbf{0}
$$

where $\mathbf{W}_j = w_j f_j \mathbf{V}_j^{-1}$, and $w_j$ and $f_j$ are the WEIGHT and FREQ values of the $j$th observation.
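For a one-trial multinomial, $\mathbf{V}_j$ has the closed form $\mathrm{diag}(\boldsymbol{\pi}_j) - \boldsymbol{\pi}_j \boldsymbol{\pi}_j'$, which $\mathbf{W}_j$ scales and inverts. A minimal sketch with illustrative values:

```python
def multinomial_cov(pi):
    """Covariance matrix V_j of a one-trial multinomial with parameter pi.

    pi -- [pi_1j, ..., pi_kj], the first k cell probabilities.
    Returns diag(pi) - pi * pi' as a list of lists.
    """
    k = len(pi)
    return [[(pi[a] if a == b else 0.0) - pi[a] * pi[b]
             for b in range(k)] for a in range(k)]

pi = [0.2, 0.3]                  # k = 2; the implied pi_(k+1)j is 0.5
V = multinomial_cov(pi)
print(round(V[0][0], 10))        # 0.2 * (1 - 0.2) = 0.16
print(round(V[0][1], 10))        # -0.2 * 0.3 = -0.06
```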

With a starting value of $\boldsymbol{\theta}^{(0)}$, the pseudo-estimate of $\boldsymbol{\theta}$ is obtained iteratively as

$$
\boldsymbol{\theta}^{(i+1)} = \boldsymbol{\theta}^{(i)} + \left( \sum_j \mathbf{D}_j' \mathbf{W}_j \mathbf{D}_j \right)^{-1} \sum_j \mathbf{D}_j' \mathbf{W}_j (\mathbf{Z}_j - \boldsymbol{\pi}_j)
$$

where $\mathbf{D}_j$, $\mathbf{W}_j$, and $\boldsymbol{\pi}_j$ are evaluated at the $i$th iterate $\boldsymbol{\theta}^{(i)}$. The expression after the plus sign is the step size. If the log likelihood evaluated at $\boldsymbol{\theta}^{(i+1)}$ is less than that evaluated at $\boldsymbol{\theta}^{(i)}$, then $\boldsymbol{\theta}^{(i+1)}$ is recomputed by step-halving or ridging. The iterative scheme continues until convergence is obtained, that is, until $\boldsymbol{\theta}^{(i+1)}$ is sufficiently close to $\boldsymbol{\theta}^{(i)}$. Then the maximum likelihood estimate of $\boldsymbol{\theta}$ is $\hat{\boldsymbol{\theta}} = \boldsymbol{\theta}^{(i+1)}$.

By default, starting values are zero for the slope parameters, and starting values are the observed cumulative logits (that is, logits of the observed cumulative proportions of response) for the intercept parameters. Alternatively, the starting values can be specified with the INEST= option in the PROC SURVEYLOGISTIC statement.
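For the binary case ($k = 1$), $\mathbf{V}_j$ is the scalar $\pi_j(1-\pi_j)$ and the update reduces to the familiar weighted IRLS recursion. A minimal self-contained sketch in Python; the data and function name are illustrative, both parameters start at zero for simplicity (the procedure seeds intercepts at observed cumulative logits), and the step-halving/ridging safeguard is omitted:

```python
import math

def irls_binary_logistic(xs, ys, weights=None, iters=25):
    """Fisher scoring (IRLS) sketch for logit P(Y=1) = a + b*x
    with optional survey weights w_j. Implements the update
      theta <- theta + (sum D'WD)^(-1) sum D'W(z - pi)
    specialized to one covariate plus an intercept."""
    n = len(xs)
    w = weights or [1.0] * n
    a, b = 0.0, 0.0
    for _ in range(iters):
        # Accumulate the 2x2 information matrix and the score vector.
        i00 = i01 = i11 = s0 = s1 = 0.0
        for xj, yj, wj in zip(xs, ys, w):
            pi = 1.0 / (1.0 + math.exp(-(a + b * xj)))
            v = wj * pi * (1.0 - pi)      # weight * Var(Z_j)
            i00 += v
            i01 += v * xj
            i11 += v * xj * xj
            r = wj * (yj - pi)            # weighted residual z_j - pi_j
            s0 += r
            s1 += r * xj
        # Solve the 2x2 system by hand and take the Newton/Fisher step.
        det = i00 * i11 - i01 * i01
        a += (i11 * s0 - i01 * s1) / det
        b += (-i01 * s0 + i00 * s1) / det
    return a, b

# Overlapping (non-separated) illustrative data, so the MLE is finite:
xs = [0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = irls_binary_logistic(xs, ys)
print(round(a, 3), round(b, 3))
```

At convergence the score equations are satisfied, which the assertions below check by verifying that the weighted residuals are orthogonal to the intercept and covariate columns.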

Newton-Raphson Algorithm

Let

$$
\mathbf{g} = \sum_j w_j f_j \frac{\partial l_j}{\partial \boldsymbol{\theta}}, \qquad
\mathbf{H} = -\sum_j w_j f_j \frac{\partial^2 l_j}{\partial \boldsymbol{\theta}^2}
$$

be the gradient vector and the Hessian matrix, where $l_j = \log L_j$ is the log likelihood for the $j$th observation. With a starting value of $\boldsymbol{\theta}^{(0)}$, the pseudo-estimate $\hat{\boldsymbol{\theta}}$ of $\boldsymbol{\theta}$ is obtained iteratively until convergence is obtained:

$$
\boldsymbol{\theta}^{(i+1)} = \boldsymbol{\theta}^{(i)} + \mathbf{H}^{-1} \mathbf{g}
$$

where $\mathbf{H}$ and $\mathbf{g}$ are evaluated at the $i$th iterate $\boldsymbol{\theta}^{(i)}$. If the log likelihood evaluated at $\boldsymbol{\theta}^{(i+1)}$ is less than that evaluated at $\boldsymbol{\theta}^{(i)}$, then $\boldsymbol{\theta}^{(i+1)}$ is recomputed by step-halving or ridging. The iterative scheme continues until convergence is obtained, that is, until $\boldsymbol{\theta}^{(i+1)}$ is sufficiently close to $\boldsymbol{\theta}^{(i)}$. Then the maximum likelihood estimate of $\boldsymbol{\theta}$ is $\hat{\boldsymbol{\theta}} = \boldsymbol{\theta}^{(i+1)}$.
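For intuition, the recursion with the step-halving safeguard can be sketched for the simplest case, an intercept-only binary model, where the gradient and Hessian have closed forms. The helper name and data are illustrative:

```python
import math

def newton_with_step_halving(ys, iters=20):
    """Newton-Raphson sketch for the intercept-only binary logistic
    model, with the step-halving safeguard: if a full step decreases
    the log likelihood, the step is halved until it improves."""
    n = len(ys)
    events = sum(ys)

    def loglik(a):
        p = 1.0 / (1.0 + math.exp(-a))
        return events * math.log(p) + (n - events) * math.log(1.0 - p)

    a = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + math.exp(-a))
        g = events - n * p                # gradient of the log likelihood
        h = n * p * (1.0 - p)             # negative Hessian
        step = g / h
        # Step-halving: shrink the step until the log likelihood improves.
        while loglik(a + step) < loglik(a) and abs(step) > 1e-12:
            step *= 0.5
        a += step
        if abs(step) < 1e-10:
            break
    return a

ys = [1, 1, 1, 0, 0, 0, 0, 0]             # 3 events in 8 trials
a_hat = newton_with_step_halving(ys)
print(round(a_hat, 6))                     # log(3/5), the logit of 3/8
```

The closed-form MLE here is the observed logit, so the iterate converges to $\log(3/5) \approx -0.510826$.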

Convergence Criteria

Four convergence criteria are allowed: ABSFCONV=, FCONV=, GCONV=, and XCONV=. If you specify more than one convergence criterion, the optimization is terminated as soon as one of the criteria is satisfied. If none of the criteria is specified, the default is GCONV=1E-8.

Existence of Maximum Likelihood Estimates

The likelihood equation for a logistic regression model does not always have a finite solution. Sometimes there is a nonunique maximum on the boundary of the parameter space, at infinity. The existence, finiteness, and uniqueness of pseudo-estimates for the logistic regression model depend on the patterns of data points in the observation space (Albert and Anderson 1984; Santner and Duffy 1986).

Consider a binary response model. Let $Y_j$ be the response of the $j$th subject, and let $\mathbf{x}_j$ be the row vector of explanatory variables (including the constant 1 associated with the intercept). There are three mutually exclusive and exhaustive types of data configurations: complete separation, quasi-complete separation, and overlap.

Complete separation

There is a complete separation of data points if there exists a vector $\mathbf{b}$ that correctly allocates all observations to their response groups; that is,

$$
\begin{cases}
\mathbf{x}_j \mathbf{b} > 0 & Y_j = 1 \\
\mathbf{x}_j \mathbf{b} < 0 & Y_j = 2
\end{cases}
$$

This configuration gives nonunique infinite estimates. If the iterative process of maximizing the likelihood function is allowed to continue, the log likelihood diminishes to zero, and the dispersion matrix becomes unbounded.
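This behavior is easy to observe numerically. The sketch below uses plain gradient ascent on the log likelihood (not the procedure's algorithm) with a trivially separated two-point data set; the slope estimate keeps growing without bound while the log likelihood climbs toward its supremum of zero:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_by_gradient_ascent(xs, ys, lr=0.5, iters=5000):
    """Schematic gradient ascent on the binary logistic log likelihood
    for logit P(Y=1) = a + b*x. Under complete separation the iterates
    drift toward infinity instead of converging."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        ga = sum(y - sigmoid(a + b * x) for x, y in zip(xs, ys))
        gb = sum((y - sigmoid(a + b * x)) * x for x, y in zip(xs, ys))
        a += lr * ga
        b += lr * gb
    return a, b

# x = 0 always gives Y = 0 and x = 1 always gives Y = 1: complete separation.
xs, ys = [0.0, 1.0], [0, 1]
a, b = fit_by_gradient_ascent(xs, ys)
loglik = sum(math.log(sigmoid(a + b * x)) if y == 1
             else math.log(1.0 - sigmoid(a + b * x))
             for x, y in zip(xs, ys))
print(b > 8, loglik > -0.05)   # slope diverges, log likelihood -> 0
```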

Quasi-complete separation

The data are not completely separable, but there is a vector $\mathbf{b}$ such that

$$
\begin{cases}
\mathbf{x}_j \mathbf{b} \ge 0 & Y_j = 1 \\
\mathbf{x}_j \mathbf{b} \le 0 & Y_j = 2
\end{cases}
$$

and equality holds for at least one subject in each response group. This configuration also yields nonunique infinite estimates. If the iterative process of maximizing the likelihood function is allowed to continue, the dispersion matrix becomes unbounded and the log likelihood diminishes to a nonzero constant.

Overlap

If neither complete nor quasi-complete separation exists in the sample points, there is an overlap of sample points. In this configuration, the pseudo-estimates exist and are unique.

Complete separation and quasi-complete separation are problems typically encountered with small data sets. Although complete separation can occur with any type of data, quasi-complete separation is not likely with truly continuous explanatory variables.

The SURVEYLOGISTIC procedure uses a simple empirical approach to recognize the data configurations that lead to infinite parameter estimates. The basis of this approach is that any convergent method of maximizing the log likelihood must yield a solution that gives complete separation, if such a solution exists. In maximizing the log likelihood, there is no checking for complete or quasi-complete separation if convergence is attained in eight or fewer iterations. Subsequent to the eighth iteration, the probability of the observed response is computed for each observation. If the probability of the observed response is one for all observations, there is a complete separation of data points and the iteration process is stopped. If complete separation has not been determined and an observation is identified to have an extremely large probability ($\ge 0.95$) of the observed response, there are two possible situations. First, there is overlap in the data set, and the observation is an atypical observation of its own group. The iterative process, if allowed to continue, stops when a maximum is reached. Second, there is quasi-complete separation in the data set, and the asymptotic dispersion matrix is unbounded. If any of the diagonal elements of the dispersion matrix for the standardized observation vectors (all explanatory variables standardized to zero mean and unit variance) exceeds 5,000, quasi-complete separation is declared and the iterative process is stopped. If either complete separation or quasi-complete separation is detected, a warning message is displayed in the procedure output.

Checking for quasi-complete separation is less foolproof than checking for complete separation. The NOCHECK option in the MODEL statement turns off the process of checking for infinite parameter estimates. In cases of complete or quasi-complete separation, turning off the checking process typically results in the procedure failing to converge.

Model Fitting Statistics

Suppose the model contains $s$ explanatory effects. For the $j$th observation, let $\hat{\pi}_j$ be the estimated probability of the observed response. The three criteria displayed by the SURVEYLOGISTIC procedure are calculated as follows:

  • –2 log likelihood:

    $$-2\,\mathrm{Log}\,L = -2 \sum_j w_j f_j \log(\hat{\pi}_j)$$

    where $w_j$ and $f_j$ are the weight and frequency values, respectively, of the $j$th observation. For binary response models that use the events/trials syntax, this is equivalent to

    $$-2\,\mathrm{Log}\,L = -2 \sum_j w_j f_j \left\{ r_j \log(\hat{\pi}_j) + (n_j - r_j) \log(1 - \hat{\pi}_j) \right\}$$

    where $r_j$ is the number of events, $n_j$ is the number of trials, and $\hat{\pi}_j$ is the estimated event probability.

  • Akaike information criterion:

    $$\mathrm{AIC} = -2\,\mathrm{Log}\,L + 2p$$

    where $p$ is the number of parameters in the model. For cumulative response models, $p = k + s$, where $k$ is the total number of response levels minus one and $s$ is the number of explanatory effects. For the generalized logit model, $p = k(s+1)$.

  • Schwarz criterion:

    $$\mathrm{SC} = -2\,\mathrm{Log}\,L + p \log\left( \sum_j w_j f_j \right)$$

    where $p$ is the number of parameters in the model. For cumulative response models, $p = k + s$, where $k$ is the total number of response levels minus one and $s$ is the number of explanatory effects. For the generalized logit model, $p = k(s+1)$.
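A quick numeric check of the AIC and SC formulas, assuming a binary model (so $k = 1$) with three explanatory effects and 100 unweighted observations; the helper name and values are illustrative:

```python
import math

def aic_sc(minus2_loglik, k, s, total_wf, generalized=False):
    """AIC and SC from -2 Log L.

    k        -- number of response levels minus one
    s        -- number of explanatory effects
    total_wf -- sum over observations of weight * frequency
    """
    p = k * (s + 1) if generalized else k + s
    aic = minus2_loglik + 2 * p
    sc = minus2_loglik + p * math.log(total_wf)
    return aic, sc

# Binary model (k = 1) with 3 effects, 100 unweighted observations:
aic, sc = aic_sc(minus2_loglik=120.0, k=1, s=3, total_wf=100)
print(aic)                     # 120 + 2*4 = 128.0
print(round(sc, 3))            # 120 + 4*log(100) = 138.421
```

Because $\log(100) \approx 4.6 > 2$, SC penalizes each extra parameter more heavily than AIC here, which is typical once the effective sample size exceeds about 7.4.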

The –2 log likelihood statistic has a chi-square distribution under the null hypothesis (that all the explanatory effects in the model are zero), and the procedure produces a p-value for this statistic. The AIC and SC statistics give two different ways of adjusting the –2 log likelihood statistic for the number of terms in the model and the number of observations used.

Generalized Coefficient of Determination

Cox and Snell (1989, pp. 208–209) propose the following generalization of the coefficient of determination to a more general linear model:

$$
R^2 = 1 - \left\{ \frac{L(\mathbf{0})}{L(\hat{\boldsymbol{\theta}})} \right\}^{2/N}
$$

where $L(\mathbf{0})$ is the likelihood of the intercept-only model, $L(\hat{\boldsymbol{\theta}})$ is the likelihood of the specified model, and $N$ is the population size. The quantity $R^2$ achieves a maximum of less than 1 for discrete models, where the maximum is given by

$$
R^2_{\max} = 1 - \left\{ L(\mathbf{0}) \right\}^{2/N}
$$

Nagelkerke (1991) proposes the following adjusted coefficient, which can achieve a maximum value of 1:

$$
\tilde{R}^2 = \frac{R^2}{R^2_{\max}}
$$

Properties and interpretation of $R^2$ and $\tilde{R}^2$ are provided in Nagelkerke (1991). In the "Testing Global Null Hypothesis: BETA=0" table, $R^2$ is labeled "RSquare" and $\tilde{R}^2$ is labeled "Max-rescaled RSquare." Use the RSQUARE option to request $R^2$ and $\tilde{R}^2$.
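Both coefficients are straightforward to compute from the two log likelihoods; working on the log scale avoids under- or overflow in the likelihood ratio. The numeric values below are illustrative, not from any real fit:

```python
import math

def generalized_rsquare(loglik_null, loglik_model, n):
    """Cox-Snell generalized R^2 and the Nagelkerke rescaled version,
    computed from log likelihoods:
      R^2     = 1 - exp((2/n) * (l_0 - l_model))
      R^2_max = 1 - exp((2/n) * l_0)
    """
    r2 = 1.0 - math.exp((2.0 / n) * (loglik_null - loglik_model))
    r2_max = 1.0 - math.exp((2.0 / n) * loglik_null)
    return r2, r2 / r2_max

# Illustrative null and fitted log likelihoods for n = 100:
r2, r2_tilde = generalized_rsquare(-69.3, -52.1, 100)
print(round(r2, 4), round(r2_tilde, 4))
```

The rescaled coefficient is always at least as large as $R^2$ and can reach 1, while $R^2$ itself cannot for discrete models.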

INEST= Data Set

You can specify starting values for the iterative algorithm in the INEST= data set.

The INEST= data set must contain the intercept variables (named Intercept for binary response models and Intercept, Intercept2, Intercept3, and so forth, for ordinal response models) and all explanatory variables in the MODEL statement. If BY processing is used, the INEST= data set should also include the BY variables, and there must be one observation for each BY group. If the INEST= data set also contains the _TYPE_ variable, only observations with the _TYPE_ value 'PARMS' are used as starting values.

Last updated: December 09, 2022