The SURVEYLOGISTIC Procedure

Logistic Regression Models and Parameters

The SURVEYLOGISTIC procedure fits a logistic regression model and estimates the corresponding regression parameters. Each model uses the link function that you specify in the LINK= option in the MODEL statement. You can use four types of models with the procedure: the cumulative logit model, the complementary log-log model, the probit model, and the generalized logit model.

Notation

Let Y be the response variable with categories $1, 2, \ldots, D, D+1$. The p covariates are denoted by a p-dimensional row vector $\mathbf{x}$.

For a stratified clustered sample design, each observation is represented by a row vector, $(w_{hij}, \mathbf{y}_{hij}', y_{hij(D+1)}, \mathbf{x}_{hij})$, where

$h = 1, 2, \ldots, H$ is the stratum index
$i = 1, 2, \ldots, n_h$ is the cluster index within stratum h
$j = 1, 2, \ldots, m_{hi}$ is the unit index within cluster i of stratum h
$w_{hij}$ denotes the sampling weight
$\mathbf{y}_{hij}$ is a D-dimensional column vector whose elements are indicator variables for the first D categories for variable Y. If the response of the jth unit of the ith cluster in stratum h falls in category d, the dth element of the vector is one, and the remaining elements of the vector are zero, where $d = 1, 2, \ldots, D$.
$y_{hij(D+1)}$ is the indicator variable for category $D+1$ of variable Y
$\mathbf{x}_{hij}$ denotes the k-dimensional row vector of explanatory variables for the jth unit of the ith cluster in stratum h. If there is an intercept, then $x_{hij1} \equiv 1$.
$\tilde{n} = \sum_{h=1}^{H} n_h$ is the total number of clusters in the sample
$n = \sum_{h=1}^{H} \sum_{i=1}^{n_h} m_{hi}$ is the total sample size

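The following notation is also used:

$f_h$ denotes the sampling rate for stratum h
$\boldsymbol{\pi}_{hij}$ is the expected vector of the response variable:

$$\boldsymbol{\pi}_{hij} = E(\mathbf{y}_{hij} \mid \mathbf{x}_{hij}) = (\pi_{hij1}, \pi_{hij2}, \ldots, \pi_{hijD})'$$

$$\pi_{hij(D+1)} = E(y_{hij(D+1)} \mid \mathbf{x}_{hij})$$

Note that $\pi_{hij(D+1)} = 1 - \mathbf{1}'\boldsymbol{\pi}_{hij}$, where $\mathbf{1}$ is a D-dimensional column vector whose elements are 1.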
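The response coding in this notation can be illustrated with a minimal Python sketch. It is not part of the procedure; the function names `indicator_vector` and `last_category_probability` are hypothetical and chosen only for this illustration. It builds the D-dimensional indicator vector for a response in a given category, together with the indicator for category D+1, and recovers the expected proportion of category D+1 from the other D expected proportions.

```python
def indicator_vector(category, D):
    """Return (y, y_last) for a response in `category` (1-based, 1..D+1).

    y is the D-element indicator vector for the first D categories;
    y_last is the indicator for category D+1.
    """
    if not 1 <= category <= D + 1:
        raise ValueError("category must be in 1..D+1")
    y = [1 if d == category else 0 for d in range(1, D + 1)]
    y_last = 1 - sum(y)  # equals 1 only when the response is in category D+1
    return y, y_last


def last_category_probability(pi):
    """Expected proportion of category D+1: one minus the sum of the first D."""
    return 1.0 - sum(pi)


y, y_last = indicator_vector(2, D=3)       # response falls in category 2 of 4
y_ref, y_ref_last = indicator_vector(4, D=3)  # response falls in category D+1
```

When the response falls in category D+1, every element of the indicator vector is zero and only the separate indicator is one.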

Logistic Regression Models

If the response categories of the response variable Y can be restricted to a number of ordinal values, you can fit cumulative probabilities of the response categories with a cumulative logit model, a complementary log-log model, or a probit model. Details of cumulative logit models (or proportional odds models) can be found in McCullagh and Nelder (1989). If the response categories of Y are nominal responses without natural ordering, you can fit the response probabilities with a generalized logit model. The formulation of generalized logit models for nominal response variables can be found in Agresti (2002). For each model, the procedure estimates the model parameters by using a pseudo-log-likelihood function. The procedure obtains the pseudo-maximum likelihood estimator by using the iterations described in the section Iterative Algorithms for Model Fitting, and it estimates the variance of that estimator as described in the section Variance Estimation.

Cumulative Logit Model

A cumulative logit model uses the logit function

$$g(t) = \log\left(\frac{t}{1 - t}\right)$$

as the link function.

Denote the cumulative sum of the expected proportions for the first d categories of variable Y by

$$F_{hijd} = \sum_{r=1}^{d} \pi_{hijr}$$

for $d = 1, 2, \ldots, D$. Then the cumulative logit model can be written as

$$\log\left(\frac{F_{hijd}}{1 - F_{hijd}}\right) = \alpha_d + \mathbf{x}_{hij}\boldsymbol{\beta}$$

with the model parameters

$$\begin{aligned}
\boldsymbol{\beta} &= (\beta_1, \beta_2, \ldots, \beta_k)' \\
\boldsymbol{\alpha} &= (\alpha_1, \alpha_2, \ldots, \alpha_D)', \quad \alpha_1 < \alpha_2 < \cdots < \alpha_D \\
\boldsymbol{\theta} &= (\boldsymbol{\alpha}', \boldsymbol{\beta}')'
\end{aligned}$$
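As an illustration of how the cumulative logit model determines the individual category probabilities, the following minimal Python sketch inverts the logit link to obtain the cumulative proportions and then takes successive differences. It is a sketch under stated assumptions, not the procedure's implementation: the function names and the numeric values of the intercepts, slope, and covariate are hypothetical.

```python
import math


def expit(eta):
    """Inverse of the logit link g(t) = log(t / (1 - t))."""
    return 1.0 / (1.0 + math.exp(-eta))


def cumulative_logit_probs(alpha, beta, x):
    """Category probabilities for categories 1..D+1 under a cumulative logit model.

    alpha: the D strictly increasing intercepts.
    beta, x: coefficient vector and covariate row vector of equal length k.
    """
    if any(a >= b for a, b in zip(alpha, alpha[1:])):
        raise ValueError("alpha must be strictly increasing")
    xb = sum(xi * bi for xi, bi in zip(x, beta))
    # Cumulative proportions for d = 1..D; the cumulative proportion
    # through category D+1 is 1 by definition.
    F = [expit(a + xb) for a in alpha] + [1.0]
    # Differences of consecutive cumulative proportions give each category.
    return [F[0]] + [F[d] - F[d - 1] for d in range(1, len(F))]


# Hypothetical values: D = 2 intercepts, k = 1 covariate.
probs = cumulative_logit_probs(alpha=[-1.0, 0.5], beta=[0.8], x=[0.25])
```

The ordering constraint on the intercepts is what keeps every difference of cumulative proportions positive, which is why the sketch rejects intercepts that are not strictly increasing.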

Complementary Log-Log Model

A complementary log-log model uses the complementary log-log function

$$g(t) = \log(-\log(1 - t))$$

as the link function. Denote the cumulative sum of the expected proportions for the first d categories of variable Y by

$$F_{hijd} = \sum_{r=1}^{d} \pi_{hijr}$$

for $d = 1, 2, \ldots, D$. Then the complementary log-log model can be written as

$$\log(-\log(1 - F_{hijd})) = \alpha_d + \mathbf{x}_{hij}\boldsymbol{\beta}$$

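with the model parameters

$$\begin{aligned}
\boldsymbol{\beta} &= (\beta_1, \beta_2, \ldots, \beta_k)' \\
\boldsymbol{\alpha} &= (\alpha_1, \alpha_2, \ldots, \alpha_D)', \quad \alpha_1 < \alpha_2 < \cdots < \alpha_D \\
\boldsymbol{\theta} &= (\boldsymbol{\alpha}', \boldsymbol{\beta}')'
\end{aligned}$$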
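The complementary log-log link and its inverse can be sketched in Python as follows. This is an illustration only, with hypothetical function names and parameter values; it shows that solving the model equation for the cumulative proportion gives 1 - exp(-exp(eta)), where eta is the linear predictor.

```python
import math


def cloglog(t):
    """Complementary log-log link g(t) = log(-log(1 - t)), for 0 < t < 1."""
    return math.log(-math.log(1.0 - t))


def inv_cloglog(eta):
    """Inverse link: the t that satisfies log(-log(1 - t)) = eta."""
    return 1.0 - math.exp(-math.exp(eta))


def cumulative_cloglog(alpha, beta, x):
    """Cumulative proportions for d = 1..D under a complementary log-log model."""
    xb = sum(xi * bi for xi, bi in zip(x, beta))
    return [inv_cloglog(a + xb) for a in alpha]


# Hypothetical values: D = 2 intercepts, k = 1 covariate.
F = cumulative_cloglog(alpha=[-1.0, 0.5], beta=[0.8], x=[0.25])
```

Unlike the logit link, this link is not symmetric about one-half: `cloglog(0.5)` equals log(log 2), which is negative rather than zero.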

Probit Model

A probit model uses the probit (or normit) function, which is the inverse of the cumulative standard normal distribution function,

$$g(t) = \Phi^{-1}(t)$$

as the link function, where

$$\Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-\frac{1}{2}z^2} \, dz$$

Denote the cumulative sum of the expected proportions for the first d categories of variable Y by

$$F_{hijd} = \sum_{r=1}^{d} \pi_{hijr}$$

for $d = 1, 2, \ldots, D$. Then the probit model can be written as

$$F_{hijd} = \Phi(\alpha_d + \mathbf{x}_{hij}\boldsymbol{\beta})$$

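with the model parameters

$$\begin{aligned}
\boldsymbol{\beta} &= (\beta_1, \beta_2, \ldots, \beta_k)' \\
\boldsymbol{\alpha} &= (\alpha_1, \alpha_2, \ldots, \alpha_D)', \quad \alpha_1 < \alpha_2 < \cdots < \alpha_D \\
\boldsymbol{\theta} &= (\boldsymbol{\alpha}', \boldsymbol{\beta}')'
\end{aligned}$$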
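For illustration, the probit link and the cumulative proportions it implies can be evaluated with the `NormalDist` class from Python's standard `statistics` module. This is a minimal sketch with hypothetical function names and parameter values, not the procedure's implementation.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def probit(t):
    """Probit link: inverse of the standard normal distribution function."""
    return _STD_NORMAL.inv_cdf(t)


def cumulative_probit(alpha, beta, x):
    """Cumulative proportions for d = 1..D under a probit model."""
    xb = sum(xi * bi for xi, bi in zip(x, beta))
    return [_STD_NORMAL.cdf(a + xb) for a in alpha]


# Hypothetical values chosen so that the first linear predictor is exactly 0.
F = cumulative_probit(alpha=[-0.5, 1.0], beta=[2.0], x=[0.25])
```

With these values the first linear predictor is zero, so the first cumulative proportion is one-half; applying `probit` to either cumulative proportion recovers its linear predictor.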

Generalized Logit Model

For a nominal response, a generalized logit model fits the ratio of the expected proportion for each response category to the expected proportion of a reference category with a logit link function.

Without loss of generality, let category $D+1$ be the reference category for the response variable Y. Denote the expected proportion for the dth category by $\pi_{hijd}$ as in the section Notation. Then the generalized logit model can be written as

$$\log\left(\frac{\pi_{hijd}}{\pi_{hij(D+1)}}\right) = \mathbf{x}_{hij}\boldsymbol{\beta}_d$$

for $d = 1, 2, \ldots, D$, with the model parameters

$$\begin{aligned}
\boldsymbol{\beta}_d &= (\beta_{d1}, \beta_{d2}, \ldots, \beta_{dk})' \\
\boldsymbol{\theta} &= (\boldsymbol{\beta}_1', \boldsymbol{\beta}_2', \ldots, \boldsymbol{\beta}_D')'
\end{aligned}$$
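Solving the D generalized logit equations together with the constraint that the D+1 expected proportions sum to one gives each proportion as exp(eta_d) divided by 1 plus the sum of exp(eta_r) over the D non-reference categories, where eta_d is the linear predictor for category d and the reference category has linear predictor 0. The following Python sketch illustrates that inversion. The function name and the coefficient values are hypothetical, and subtracting the largest linear predictor is only a numerical safeguard added for this sketch.

```python
import math


def generalized_logit_probs(betas, x):
    """Expected proportions for categories 1..D+1 under a generalized logit model.

    betas: list of D coefficient vectors, one per non-reference category.
    x: covariate row vector of length k.
    Category D+1 is the reference category, with linear predictor 0.
    """
    etas = [sum(xi * bi for xi, bi in zip(x, beta_d)) for beta_d in betas]
    etas.append(0.0)                      # reference category D+1
    m = max(etas)                         # shift to avoid overflow in exp
    expo = [math.exp(e - m) for e in etas]
    total = sum(expo)
    return [e / total for e in expo]


# Hypothetical values: D = 2 non-reference categories, k = 2 covariates.
probs = generalized_logit_probs(betas=[[0.5, -1.0], [-0.2, 0.3]], x=[1.0, 2.0])
```

The log ratio of each returned proportion to the last one reproduces the corresponding linear predictor, here -1.5 for category 1 and 0.4 for category 2.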

Likelihood Function

Let $\mathbf{g}(\cdot)$ be a link function such that

$$\boldsymbol{\pi} = \mathbf{g}(\mathbf{x}, \boldsymbol{\theta})$$

where $\boldsymbol{\theta}$ is a column vector of regression coefficients. The pseudo-log likelihood is

$$l(\boldsymbol{\theta}) = \sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij} \left( (\log(\boldsymbol{\pi}_{hij}))' \mathbf{y}_{hij} + \log(\pi_{hij(D+1)}) \, y_{hij(D+1)} \right)$$

Denote the pseudo-estimator as $\hat{\boldsymbol{\theta}}$, which is a solution to the estimating equations:

$$\sum_{h=1}^{H} \sum_{i=1}^{n_h} \sum_{j=1}^{m_{hi}} w_{hij} \mathbf{D}_{hij} \left( \mathrm{diag}(\boldsymbol{\pi}_{hij}) - \boldsymbol{\pi}_{hij} \boldsymbol{\pi}_{hij}' \right)^{-1} (\mathbf{y}_{hij} - \boldsymbol{\pi}_{hij}) = \mathbf{0}$$

where $\mathbf{D}_{hij}$ is the matrix of partial derivatives of the link function $\mathbf{g}$ with respect to $\boldsymbol{\theta}$.

To obtain the pseudo-estimator $\hat{\boldsymbol{\theta}}$, the procedure uses iterations with a starting value $\boldsymbol{\theta}^{(0)}$ for $\boldsymbol{\theta}$. See the section Iterative Algorithms for Model Fitting for more details.
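Because the response vector contains indicator variables, each unit's term in the pseudo-log likelihood above reduces to its sampling weight times the log of the expected proportion for the category that the unit actually falls in. The stratum and cluster indices only organize the sum, so the triple sum can be evaluated as a single pass over units. The following Python sketch illustrates this under stated assumptions: the function name, the flat `(weight, category, x)` record layout, and the sample values are hypothetical, and the sketch only evaluates the function at given proportions rather than performing the iterative fitting.

```python
import math


def pseudo_log_likelihood(observations, prob_fn):
    """Weighted pseudo-log likelihood for a categorical response.

    observations: iterable of (w, category, x), where w is the sampling
        weight, category is the observed category (1-based, 1..D+1), and
        x is the covariate row vector.
    prob_fn: function mapping x to the list of expected proportions for
        categories 1..D+1 under the model being evaluated.
    """
    total = 0.0
    for w, category, x in observations:
        probs = prob_fn(x)
        # The indicator coding selects only the observed category's term.
        total += w * math.log(probs[category - 1])
    return total


# Hypothetical sample: binary response (D = 1), intercept-only covariate.
sample = [(2.0, 1, [1.0]), (1.0, 2, [1.0]), (3.0, 1, [1.0])]

ll_a = pseudo_log_likelihood(sample, lambda x: [0.75, 0.25])
# The weighted share of category 1 is (2 + 3) / 6, which maximizes the
# pseudo-log likelihood among constant-proportion models.
ll_b = pseudo_log_likelihood(sample, lambda x: [5.0 / 6.0, 1.0 / 6.0])
```

In this intercept-only example, the value at the weighted sample proportion (`ll_b`) is larger than the value at the other candidate (`ll_a`), which shows how the sampling weights enter the point estimate.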

Last updated: December 09, 2022