The LOGISTIC Procedure

Exact Conditional Logistic Regression

The theory of exact logistic regression, also known as exact conditional logistic regression, was originally laid out by Cox (1970), and the computational methods employed in PROC LOGISTIC are described in Hirji, Mehta, and Patel (1987); Hirji (1992); Mehta, Patel, and Senchaudhuri (1992). Other useful references for the derivations include Cox and Snell (1989); Agresti (1990); Mehta and Patel (1995).

Exact conditional inference is based on generating the conditional distribution for the sufficient statistics of the parameters of interest. This distribution is called the permutation or exact conditional distribution. Using the notation in the section Computational Details, follow Mehta and Patel (1995) and first note that the sufficient statistics $\mathbf{T} = (T_1, \ldots, T_p)$ for the parameter vector of intercepts and slopes, $\boldsymbol{\beta}$, are

$$T_j = \sum_{i=1}^{n} y_i x_{ij}, \quad j = 1, \ldots, p$$

Denote a vector of observable sufficient statistics as $\mathbf{t} = (t_1, \ldots, t_p)'$.

The probability density function (PDF) for $\mathbf{T}$ can be created by summing over all binary sequences $\mathbf{y}$ that generate an observable $\mathbf{t}$ and letting $C(\mathbf{t}) = \|\{\mathbf{y} : \mathbf{y}'\mathbf{X} = \mathbf{t}'\}\|$ denote the number of sequences $\mathbf{y}$ that generate $\mathbf{t}$:

$$\Pr(\mathbf{T} = \mathbf{t}) = \frac{C(\mathbf{t}) \exp(\mathbf{t}'\boldsymbol{\beta})}{\prod_{i=1}^{n} \left[1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})\right]}$$
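As a brief sketch of where this expression comes from, assume (as in the binary logit model) independent responses with $\Pr(y_i = 1) = \exp(\mathbf{x}_i'\boldsymbol{\beta}) / [1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})]$. The probability of one particular sequence $\mathbf{y}$ then depends on $\mathbf{y}$ only through $\mathbf{y}'\mathbf{X}$:

```latex
\Pr(\mathbf{Y} = \mathbf{y})
  = \prod_{i=1}^{n} \frac{\exp(y_i \mathbf{x}_i'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})}
  = \frac{\exp(\mathbf{y}'\mathbf{X}\boldsymbol{\beta})}{\prod_{i=1}^{n} \left[1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})\right]}
```

Every sequence with $\mathbf{y}'\mathbf{X} = \mathbf{t}'$ therefore has the same probability, and adding up the $C(\mathbf{t})$ such sequences gives the PDF of $\mathbf{T}$.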

In order to condition out the nuisance parameters, partition the parameter vector $\boldsymbol{\beta} = (\boldsymbol{\beta}_{\mathrm{N}}', \boldsymbol{\beta}_{\mathrm{I}}')'$, where $\boldsymbol{\beta}_{\mathrm{N}}$ is a $p_{\mathrm{N}} \times 1$ vector of the nuisance parameters, and $\boldsymbol{\beta}_{\mathrm{I}}$ is the parameter vector for the remaining $p_{\mathrm{I}} = p - p_{\mathrm{N}}$ parameters of interest. Likewise, partition $\mathbf{X}$ into $\mathbf{X}_{\mathrm{N}}$ and $\mathbf{X}_{\mathrm{I}}$, $\mathbf{T}$ into $\mathbf{T}_{\mathrm{N}}$ and $\mathbf{T}_{\mathrm{I}}$, and $\mathbf{t}$ into $\mathbf{t}_{\mathrm{N}}$ and $\mathbf{t}_{\mathrm{I}}$. The nuisance parameters can be removed from the analysis by conditioning on their sufficient statistics to create the conditional likelihood of $\mathbf{T}_{\mathrm{I}}$ given $\mathbf{T}_{\mathrm{N}} = \mathbf{t}_{\mathrm{N}}$,

$$\begin{aligned}
\Pr(\mathbf{T}_{\mathrm{I}} = \mathbf{t}_{\mathrm{I}} \mid \mathbf{T}_{\mathrm{N}} = \mathbf{t}_{\mathrm{N}}) &= \frac{\Pr(\mathbf{T} = \mathbf{t})}{\Pr(\mathbf{T}_{\mathrm{N}} = \mathbf{t}_{\mathrm{N}})} \\
&= f_{\boldsymbol{\beta}_{\mathrm{I}}}(\mathbf{t}_{\mathrm{I}} \mid \mathbf{t}_{\mathrm{N}}) = \frac{C(\mathbf{t}_{\mathrm{N}}, \mathbf{t}_{\mathrm{I}}) \exp(\mathbf{t}_{\mathrm{I}}'\boldsymbol{\beta}_{\mathrm{I}})}{\sum_{\mathbf{u}} C(\mathbf{t}_{\mathrm{N}}, \mathbf{u}) \exp(\mathbf{u}'\boldsymbol{\beta}_{\mathrm{I}})}
\end{aligned}$$

where $C(\mathbf{t}_{\mathrm{N}}, \mathbf{u})$ is the number of vectors $\mathbf{y}$ such that $\mathbf{y}'\mathbf{X}_{\mathrm{N}} = \mathbf{t}_{\mathrm{N}}$ and $\mathbf{y}'\mathbf{X}_{\mathrm{I}} = \mathbf{u}$. Note that the nuisance parameters have factored out of this equation, and that $C(\mathbf{t}_{\mathrm{N}}, \mathbf{t}_{\mathrm{I}})$ is a constant.
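To see the cancellation explicitly, the following short derivation uses only the joint PDF of the sufficient statistics. The symbol $D(\boldsymbol{\beta})$ is a shorthand introduced here for the product in the denominator of that PDF; it is not notation from the references.

```latex
\begin{aligned}
\Pr(\mathbf{T}_{\mathrm{N}} = \mathbf{t}_{\mathrm{N}})
  &= \sum_{\mathbf{u}} \Pr(\mathbf{T}_{\mathrm{N}} = \mathbf{t}_{\mathrm{N}},\ \mathbf{T}_{\mathrm{I}} = \mathbf{u})
   = \frac{\exp(\mathbf{t}_{\mathrm{N}}'\boldsymbol{\beta}_{\mathrm{N}})
           \sum_{\mathbf{u}} C(\mathbf{t}_{\mathrm{N}}, \mathbf{u}) \exp(\mathbf{u}'\boldsymbol{\beta}_{\mathrm{I}})}
          {D(\boldsymbol{\beta})} \\
\frac{\Pr(\mathbf{T} = \mathbf{t})}{\Pr(\mathbf{T}_{\mathrm{N}} = \mathbf{t}_{\mathrm{N}})}
  &= \frac{C(\mathbf{t}_{\mathrm{N}}, \mathbf{t}_{\mathrm{I}})
           \exp(\mathbf{t}_{\mathrm{N}}'\boldsymbol{\beta}_{\mathrm{N}})
           \exp(\mathbf{t}_{\mathrm{I}}'\boldsymbol{\beta}_{\mathrm{I}})}
          {\exp(\mathbf{t}_{\mathrm{N}}'\boldsymbol{\beta}_{\mathrm{N}})
           \sum_{\mathbf{u}} C(\mathbf{t}_{\mathrm{N}}, \mathbf{u}) \exp(\mathbf{u}'\boldsymbol{\beta}_{\mathrm{I}})}
\end{aligned}
```

Both $D(\boldsymbol{\beta})$ and the factor $\exp(\mathbf{t}_{\mathrm{N}}'\boldsymbol{\beta}_{\mathrm{N}})$ appear in the numerator and the denominator, so they cancel and only $\boldsymbol{\beta}_{\mathrm{I}}$ remains.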

The goal of the exact conditional analysis is to determine how likely the observed response $\mathbf{y}_0$ is with respect to all $2^n$ possible responses $\mathbf{y} = (y_1, \ldots, y_n)'$. One way to proceed is to generate every $\mathbf{y}$ vector for which $\mathbf{y}'\mathbf{X}_{\mathrm{N}} = \mathbf{t}_{\mathrm{N}}$, and count the number of vectors $\mathbf{y}$ for which $\mathbf{y}'\mathbf{X}_{\mathrm{I}}$ is equal to each unique $\mathbf{t}_{\mathrm{I}}$. Generating the conditional distribution from complete enumeration of the joint distribution is conceptually simple; however, this method becomes computationally infeasible very quickly. For example, if you had only 30 observations, you would have to scan through $2^{30}$ different $\mathbf{y}$ vectors.
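The complete-enumeration approach can be sketched in a few lines of Python. This is only an illustration of the counting idea, not how PROC LOGISTIC is implemented; the function name `enumerate_conditional_counts` and the toy design (an intercept as the nuisance column and one binary covariate of interest) are assumptions made for the example.

```python
from itertools import product
from collections import Counter

def enumerate_conditional_counts(X_N, X_I, t_N):
    """Count C(t_N, u) by scanning all 2^n binary response vectors y.

    X_N, X_I: lists of n rows (tuples) holding the nuisance and the
    interest columns of the design matrix.
    t_N: observed sufficient statistics for the nuisance parameters.
    """
    n = len(X_N)
    counts = Counter()
    for y in product((0, 1), repeat=n):
        # Nuisance sufficient statistics y'X_N for this candidate y.
        s_N = tuple(sum(yi * row[j] for yi, row in zip(y, X_N))
                    for j in range(len(t_N)))
        if s_N != tuple(t_N):
            continue  # y is outside the conditional reference set
        # Sufficient statistics y'X_I for the parameters of interest.
        u = tuple(sum(yi * row[j] for yi, row in zip(y, X_I))
                  for j in range(len(X_I[0])))
        counts[u] += 1
    return dict(counts)

# Hypothetical toy data: four observations, two events in total.
X_N = [(1,), (1,), (1,), (1,)]
X_I = [(0,), (0,), (1,), (1,)]
counts = enumerate_conditional_counts(X_N, X_I, t_N=(2,))
print(counts)
```

The loop visits all $2^n$ vectors, which is why this approach stops being practical well before $n = 30$.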

Several algorithms are available in PROC LOGISTIC to generate the exact distribution. All of the algorithms are based on the following observation. Given any $\mathbf{y} = (y_1, \ldots, y_n)'$ and a design $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)'$, let $\mathbf{y}_{(i)} = (y_1, \ldots, y_i)'$ and $\mathbf{X}_{(i)} = (\mathbf{x}_1, \ldots, \mathbf{x}_i)'$ be the first $i$ rows of each matrix. Write the sufficient statistic based on these $i$ rows as $\mathbf{t}_{(i)}' = \mathbf{y}_{(i)}'\mathbf{X}_{(i)}$. A recursion relation results: $\mathbf{t}_{(i+1)} = \mathbf{t}_{(i)} + y_{i+1}\mathbf{x}_{i+1}$.

The following methods are available:

  • The multivariate shift algorithm developed by Hirji, Mehta, and Patel (1987), which steps through the recursion relation by adding one observation at a time and building an intermediate distribution at each step. If it determines that $\mathbf{t}_{(i)}$ for the nuisance parameters could eventually equal $\mathbf{t}$, then $\mathbf{t}_{(i)}$ is added to the intermediate distribution.

  • An extension of the multivariate shift algorithm to generalized logit models by Hirji (1992). Because the generalized logit model fits a new set of parameters to each logit, the number of parameters in the model can easily get too large for this algorithm to handle. Note for these models that the hypothesis tests for each effect are computed across the logit functions, while individual parameters are estimated for each logit function.

  • A network algorithm described in Mehta, Patel, and Senchaudhuri (1992), which builds a network for each parameter that you are conditioning out in order to identify feasible $y_i$ for the $\mathbf{y}$ vector. These networks are combined and the set of feasible $y_i$ is further reduced, and then the multivariate shift algorithm uses this knowledge to build the exact distribution without adding as many intermediate $\mathbf{t}_{(i+1)}$ as the multivariate shift algorithm does on its own.

  • A hybrid Monte Carlo and network algorithm described by Mehta, Patel, and Senchaudhuri (2000), which extends their 1992 algorithm by sampling from the combined network to build the exact distribution.

  • A Markov chain Monte Carlo algorithm described by Forster, McDonald, and Smith (2003), which generates the exact distribution by repeatedly perturbing the response vector to obtain a new response vector while maintaining the sufficient statistics for the nuisance parameters; the resulting $\mathbf{t}$ are added to the exact distribution.
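The recursion that all of these methods build on can be illustrated with a small Python sketch. It adds one observation at a time and keeps a partial sum only when the observed nuisance statistics are still reachable. The reachability check used here (simple lower and upper bounds on what the remaining rows can contribute) is an assumption chosen for brevity; it is not the feasibility test of the published algorithms, and the name `shift_recursion` is hypothetical.

```python
from collections import Counter

def shift_recursion(X_N, X_I, t_N):
    """Build C(t_N, u) through t_(i+1) = t_(i) + y_(i+1) * x_(i+1)."""
    n, p_N = len(X_N), len(t_N)
    # lo[i][j], hi[i][j]: smallest and largest amount that rows i..n-1
    # can still add to nuisance statistic j.
    lo = [[0] * p_N for _ in range(n + 1)]
    hi = [[0] * p_N for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(p_N):
            lo[i][j] = lo[i + 1][j] + min(0, X_N[i][j])
            hi[i][j] = hi[i + 1][j] + max(0, X_N[i][j])

    start = (tuple([0] * p_N), tuple([0] * len(X_I[0])))
    dist = Counter({start: 1})  # distribution of t_(0)
    for i in range(n):
        nxt = Counter()
        for (s_N, s_I), c in dist.items():
            for y in (0, 1):
                new_N = tuple(a + y * b for a, b in zip(s_N, X_N[i]))
                # Drop partial sums from which t_N can no longer be reached.
                if all(lo[i + 1][j] <= t_N[j] - new_N[j] <= hi[i + 1][j]
                       for j in range(p_N)):
                    new_I = tuple(a + y * b for a, b in zip(s_I, X_I[i]))
                    nxt[(new_N, new_I)] += c
        dist = nxt
    # After the last row, every surviving entry has nuisance sum t_N.
    return {s_I: c for (s_N, s_I), c in dist.items()}

# Hypothetical toy data: intercept as nuisance, one covariate of interest.
X_N = [(1,), (1,), (1,), (1,)]
X_I = [(0,), (0,), (1,), (1,)]
print(shift_recursion(X_N, X_I, t_N=(2,)))
```

Because partial sums that share the same value are merged at every step, the work grows with the number of distinct partial sums rather than with $2^n$.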

The bulk of the computation time and memory for these algorithms is consumed by the creation of the networks and the exact joint distribution. After the joint distribution for a set of effects is created, the computational effort required to produce hypothesis tests and parameter estimates for any subset of the effects is (relatively) trivial. See the section Computational Resources for Exact Logistic Regression for more computational notes about exact analyses.

Note: An alternative to using these exact conditional methods is to perform Firth’s bias-reducing penalized likelihood method (see the FIRTH option in the MODEL statement); this method has the advantage of being much faster and less memory intensive than exact algorithms, but it might not converge to a solution.

Hypothesis Tests

Consider testing the null hypothesis $H_0 : \boldsymbol{\beta}_{\mathrm{I}} = \mathbf{0}$ against the alternative $H_A : \boldsymbol{\beta}_{\mathrm{I}} \ne \mathbf{0}$, conditional on $\mathbf{T}_{\mathrm{N}} = \mathbf{t}_{\mathrm{N}}$. Under the null hypothesis, the test statistic for the exact probability test is just $f_{\boldsymbol{\beta}_{\mathrm{I}} = \mathbf{0}}(\mathbf{t}_{\mathrm{I}} \mid \mathbf{t}_{\mathrm{N}})$, while the corresponding p-value is the probability of getting a less likely (more extreme) statistic,

$$p(\mathbf{t}_{\mathrm{I}} \mid \mathbf{t}_{\mathrm{N}}) = \sum_{\mathbf{u} \in \Omega_p} f_0(\mathbf{u} \mid \mathbf{t}_{\mathrm{N}})$$

where $\Omega_p = \{\mathbf{u} : \text{there exist } \mathbf{y} \text{ with } \mathbf{y}'\mathbf{X}_{\mathrm{I}} = \mathbf{u},\ \mathbf{y}'\mathbf{X}_{\mathrm{N}} = \mathbf{t}_{\mathrm{N}}, \text{ and } f_0(\mathbf{u} \mid \mathbf{t}_{\mathrm{N}}) \le f_0(\mathbf{t}_{\mathrm{I}} \mid \mathbf{t}_{\mathrm{N}})\}$.
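As a minimal sketch of the exact probability test, the following Python fragment assumes that the conditional counts $C(\mathbf{t}_{\mathrm{N}}, \mathbf{u})$ are already available as a dictionary. Under $H_0$ the exponential terms equal 1, so $f_0$ is just the normalized counts. The function name and the small tolerance used to treat nearly equal probabilities as ties are choices made for this illustration.

```python
import math

def exact_probability_pvalue(f0, t_obs, rel_tol=1e-9):
    """Sum f0(u) over all u whose null probability is <= f0(t_obs)."""
    cutoff = f0[t_obs]
    return sum(p for p in f0.values()
               if p <= cutoff or math.isclose(p, cutoff, rel_tol=rel_tol))

# Hypothetical conditional counts C(t_N, u) for a scalar statistic u.
counts = {0: 1, 1: 4, 2: 1}
total = sum(counts.values())
f0 = {u: c / total for u, c in counts.items()}  # null conditional PDF

p_value = exact_probability_pvalue(f0, t_obs=2)
print(p_value)
```

Here the observed value 2 and the equally likely value 0 both enter the sum, although they lie in opposite tails.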

The exact probability test is not necessarily a sum of tail areas and can be inflated if the distribution is skewed. The more robust exact conditional scores test is a sum of tail areas and is usually preferred to the exact probability test. To compute the exact conditional scores test, the conditional mean $\boldsymbol{\mu}_{\mathrm{I}}$ and variance matrix $\boldsymbol{\Sigma}_{\mathrm{I}}$ of the $\mathbf{T}_{\mathrm{I}}$ (conditional on $\mathbf{T}_{\mathrm{N}} = \mathbf{t}_{\mathrm{N}}$) are calculated, and the score statistic for the observed value,

$$s = (\mathbf{t}_{\mathrm{I}} - \boldsymbol{\mu}_{\mathrm{I}})' \boldsymbol{\Sigma}_{\mathrm{I}}^{-1} (\mathbf{t}_{\mathrm{I}} - \boldsymbol{\mu}_{\mathrm{I}})$$

is compared to the score for each member of the distribution

$$S(\mathbf{T}_{\mathrm{I}}) = (\mathbf{T}_{\mathrm{I}} - \boldsymbol{\mu}_{\mathrm{I}})' \boldsymbol{\Sigma}_{\mathrm{I}}^{-1} (\mathbf{T}_{\mathrm{I}} - \boldsymbol{\mu}_{\mathrm{I}})$$

The resulting p-value is

$$p(\mathbf{t}_{\mathrm{I}} \mid \mathbf{t}_{\mathrm{N}}) = \Pr(S \ge s) = \sum_{\mathbf{u} \in \Omega_s} f_0(\mathbf{u} \mid \mathbf{t}_{\mathrm{N}})$$

where $\Omega_s = \{\mathbf{u} : \text{there exist } \mathbf{y} \text{ with } \mathbf{y}'\mathbf{X}_{\mathrm{I}} = \mathbf{u},\ \mathbf{y}'\mathbf{X}_{\mathrm{N}} = \mathbf{t}_{\mathrm{N}}, \text{ and } S(\mathbf{u}) \ge s\}$.
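The scores test can be sketched in the same way. To keep the example free of matrix algebra it assumes a single parameter of interest, so that $\boldsymbol{\Sigma}_{\mathrm{I}}$ reduces to a scalar variance; the function name, the tolerance for ties, and the skewed example distribution are all hypothetical.

```python
def exact_score_pvalue(f0, t_obs, tol=1e-9):
    """Exact conditional scores p-value for a scalar statistic.

    f0 maps each support point u to its null conditional probability.
    Assumes the distribution is not degenerate (variance > 0).
    """
    mu = sum(u * p for u, p in f0.items())
    var = sum((u - mu) ** 2 * p for u, p in f0.items())

    def score(u):
        return (u - mu) ** 2 / var

    s_obs = score(t_obs)
    # Sum the probabilities of all outcomes at least as far from the mean.
    return sum(p for u, p in f0.items() if score(u) >= s_obs - tol)

# Hypothetical skewed null distribution: mean 2, variance 1.
f0 = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
print(exact_score_pvalue(f0, t_obs=3))
```

For this distribution the observed value 3 is the most probable outcome, so the exact probability test would sum the whole distribution, while the scores test counts only the outcomes whose distance from the conditional mean is at least as large.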

The mid-p statistic, defined as

$$p(\mathbf{t}_{\mathrm{I}} \mid \mathbf{t}_{\mathrm{N}}) - \tfrac{1}{2} f_0(\mathbf{t}_{\mathrm{I}} \mid \mathbf{t}_{\mathrm{N}})$$

was proposed by Lancaster (1961) to compensate for the discreteness of a distribution. See Agresti (1992) for more information. However, to allow for more flexibility in handling ties, you can write the mid-p statistic as (based on a suggestion by LaMotte (2002) and generalizing Vollset, Hirji, and Afifi (1991))

$$\sum_{\mathbf{u} \in \Omega_<} f_0(\mathbf{u} \mid \mathbf{t}_{\mathrm{N}}) + \delta_1 f_0(\mathbf{t}_{\mathrm{I}} \mid \mathbf{t}_{\mathrm{N}}) + \delta_2 \sum_{\mathbf{u} \in \Omega_=} f_0(\mathbf{u} \mid \mathbf{t}_{\mathrm{N}})$$

where, for $i \in \{p, s\}$, $\Omega_<$ is $\Omega_i$ using strict inequalities, and $\Omega_=$ is $\Omega_i$ using equalities with the added restriction that $\mathbf{u} \ne \mathbf{t}_{\mathrm{I}}$. Letting $(\delta_1, \delta_2) = (0.5, 1.0)$ yields Lancaster's mid-p.
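The generalized mid-p expression can be written as a short Python function. In this sketch the argument `stat` is a hypothetical callback that ranks outcomes so that larger values are more extreme: for the probability test it can be the negative of the null probability, and for the scores test it can be the score itself. The tolerance for detecting ties is an assumption of the example.

```python
def mid_p(f0, t_obs, stat, delta1=0.5, delta2=1.0, tol=1e-9):
    """Strictly more extreme mass + delta1 * f0(t_obs) + delta2 * tied mass."""
    s_obs = stat(t_obs)
    strict = sum(p for u, p in f0.items() if stat(u) > s_obs + tol)
    ties = sum(p for u, p in f0.items()
               if u != t_obs and abs(stat(u) - s_obs) <= tol)
    return strict + delta1 * f0[t_obs] + delta2 * ties

# Hypothetical null distribution with a tie between u = 0 and u = 2.
f0 = {0: 1 / 6, 1: 4 / 6, 2: 1 / 6}

def prob_stat(u):
    # Smaller null probability means more extreme.
    return -f0[u]

lancaster = mid_p(f0, 2, prob_stat)                       # (0.5, 1.0)
unadjusted = mid_p(f0, 2, prob_stat, delta1=1.0, delta2=1.0)
print(lancaster, unadjusted)
```

With $(\delta_1, \delta_2) = (1, 1)$ the function returns the ordinary p-value, and with $(0.5, 1.0)$ it returns that p-value minus half the probability of the observed outcome.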

Caution: When the exact distribution has ties and you specify METHOD=NETWORKMC or METHOD=MCMC, the sampling algorithm estimates $p(\mathbf{t} \mid \mathbf{t}_{\mathrm{N}})$ with error, and hence it cannot determine precisely which values contribute to the reported p-values. For example, if the exact distribution has densities $\{0.2, 0.2, 0.2, 0.4\}$ and if the observed statistic has probability 0.2, then the exact probability p-value is exactly 0.6. Under Monte Carlo sampling, if the densities after $N$ samples are $\{0.18, 0.21, 0.23, 0.38\}$ and the observed probability is 0.21, then the resulting p-value is 0.39. Therefore, the exact probability test p-value for this example fluctuates between 0.2, 0.4, and 0.6, and the reported p-values are actually lower bounds for the true p-values. If you need more precise values, you can specify the OUTDIST= option, determine appropriate cutoff values for the observed probability and score, and then construct p-value estimates from the OUTDIST= data set and display them in the SAS log by using the following statements:

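data _null_;
   /* Replace ProbCutOff and ScoreCutOff with the numeric cutoffs you chose */
   set outdist end=end;
   retain pvalueProb 0 pvalueScore 0;
   if prob < ProbCutOff then pvalueProb+prob;     /* exact probability test */
   if score > ScoreCutOff then pvalueScore+prob;  /* exact conditional scores test */
   if end then put pvalueProb= pvalueScore=;      /* write both estimates to the log */
run;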
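The arithmetic in the caution can be checked with a few lines of Python. The helper below is a hypothetical illustration of the exact probability p-value applied to a plain list of densities; it is not part of SAS.

```python
def probability_test_pvalue(densities, observed_index):
    """Sum every density that does not exceed the observed one."""
    cutoff = densities[observed_index]
    return sum(p for p in densities if p <= cutoff)

# True distribution: the three tied values 0.2 all contribute.
exact = probability_test_pvalue([0.2, 0.2, 0.2, 0.4], 0)

# Sampled estimate: the ties are broken by sampling error, so only the
# estimates 0.18 and 0.21 fall at or below the observed estimate 0.21.
sampled = probability_test_pvalue([0.18, 0.21, 0.23, 0.38], 1)

print(round(exact, 2), round(sampled, 2))
```

Depending on where the observed estimate falls among the three tied values, one, two, or all three of them are counted, which is the source of the fluctuation between roughly 0.2, 0.4, and 0.6.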

Because the METHOD=MCMC samples are correlated, the covariance that is computed for the exact conditional scores test is biased. Specifying the NTHIN= option might reduce this bias.

Inference for a Single Parameter

Exact parameter estimates are derived for a single parameter $\beta_i$ by regarding all the other parameters $\boldsymbol{\beta}_{\mathrm{N}} = (\beta_1, \ldots, \beta_{i-1}, \beta_{i+1}, \ldots, \beta_{p_{\mathrm{N}} + p_{\mathrm{I}}})'$ as nuisance parameters. The appropriate sufficient statistics are $\mathbf{T}_{\mathrm{I}} = T_i$ and $\mathbf{T}_{\mathrm{N}} = (T_1, \ldots, T_{i-1}, T_{i+1}, \ldots, T_{p_{\mathrm{N}} + p_{\mathrm{I}}})'$, with their observed values denoted by the lowercase $t$. Hence, the conditional PDF used to create the parameter estimate for $\beta_i$ is

$$f_{\beta_i}(t_i \mid \mathbf{t}_{\mathrm{N}}) = \frac{C(\mathbf{t}_{\mathrm{N}}, t_i) \exp(t_i \beta_i)}{\sum_{u \in \Omega} C(\mathbf{t}_{\mathrm{N}}, u) \exp(u \beta_i)}$$

for $\Omega = \{u : \text{there exist } \mathbf{y} \text{ with } T_i = u \text{ and } \mathbf{T}_{\mathrm{N}} = \mathbf{t}_{\mathrm{N}}\}$.

The maximum exact conditional likelihood estimate is the quantity $\hat{\beta}_i$, which maximizes the conditional PDF. A Newton-Raphson algorithm is used to perform this search. However, if the observed $t_i$ attains either its maximum or minimum value in the exact distribution (that is, either $t_i = \min\{u : u \in \Omega\}$ or $t_i = \max\{u : u \in \Omega\}$), then the conditional PDF is monotonic in $\beta_i$ (increasing when $t_i$ is the maximum, decreasing when $t_i$ is the minimum) and cannot be maximized. In this case, a median unbiased estimate (Hirji, Tsiatis, and Mehta 1989) $\hat{\beta}_i$ is produced that satisfies $f_{\hat{\beta}_i}(t_i \mid \mathbf{t}_{\mathrm{N}}) = 0.5$, and a Newton-Raphson algorithm is used to perform the search.
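Both estimates can be sketched in Python once the conditional counts $C(\mathbf{t}_{\mathrm{N}}, u)$ are known. The Newton step below uses the fact that the first derivative of the conditional log likelihood is $t_i$ minus the conditional mean and the second derivative is minus the conditional variance. For the median unbiased estimate this sketch substitutes a simple bisection for the Newton-Raphson search described above, purely to keep the example short; the function names, the bracket $[-50, 50]$, and the example counts are assumptions.

```python
import math

def cond_pdf(counts, beta):
    """Conditional PDF f_beta(u | t_N) from the counts C(t_N, u)."""
    logw = {u: math.log(c) + u * beta for u, c in counts.items()}
    top = max(logw.values())  # shift exponents for numerical stability
    w = {u: math.exp(v - top) for u, v in logw.items()}
    total = sum(w.values())
    return {u: v / total for u, v in w.items()}

def conditional_mle(counts, t_obs, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson search for the maximizer of f_beta(t_obs | t_N)."""
    if t_obs in (min(counts), max(counts)):
        raise ValueError("t_obs is on the boundary; no finite maximizer")
    for _ in range(max_iter):
        f = cond_pdf(counts, beta)
        mean = sum(u * p for u, p in f.items())
        var = sum((u - mean) ** 2 * p for u, p in f.items())
        step = (t_obs - mean) / var  # minus l'(beta) / l''(beta)
        beta += step
        if abs(step) < tol:
            break
    return beta

def median_unbiased(counts, t_obs, lo=-50.0, hi=50.0, iters=200):
    """Bisection for f_beta(t_obs | t_N) = 0.5 at a boundary value of t_obs."""
    increasing = t_obs == max(counts)  # PDF rises with beta at the maximum
    for _ in range(iters):
        mid = (lo + hi) / 2
        above = cond_pdf(counts, mid)[t_obs] > 0.5
        if above == increasing:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# Hypothetical counts: binomial coefficients for three observations.
counts = {0: 1, 1: 3, 2: 3, 3: 1}
beta_hat = conditional_mle(counts, t_obs=1)
beta_mue = median_unbiased(counts, t_obs=3)
print(beta_hat, beta_mue)
```

With these counts the conditional distribution is binomial with three trials and success probability $e^{\beta}/(1 + e^{\beta})$, so the results can be checked by hand: the maximizer for $t_i = 1$ is $\log(1/2)$, and the median unbiased estimate for $t_i = 3$ solves $p^3 = 0.5$.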

The standard error of the exact conditional likelihood estimate is the square root of the negative of the inverse of the second derivative of the exact conditional log likelihood (Agresti 2002).
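The second derivative has a simple closed form. Writing $\ell(\beta_i)$ for the log of the conditional PDF evaluated at the observed $t_i$ (a symbol introduced here for the derivation), differentiation gives:

```latex
\begin{aligned}
\ell(\beta_i) &= \log C(\mathbf{t}_{\mathrm{N}}, t_i) + t_i \beta_i
                 - \log \sum_{u \in \Omega} C(\mathbf{t}_{\mathrm{N}}, u) \exp(u \beta_i) \\
\ell'(\beta_i) &= t_i - \mathrm{E}_{\beta_i}(T_i \mid \mathbf{t}_{\mathrm{N}}) \\
\ell''(\beta_i) &= -\mathrm{Var}_{\beta_i}(T_i \mid \mathbf{t}_{\mathrm{N}})
\end{aligned}
```

The negative of the inverse of the second derivative, evaluated at $\hat{\beta}_i$, is therefore the reciprocal of the conditional variance of $T_i$ under $\hat{\beta}_i$.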

Likelihood ratio tests based on the conditional PDF are used to test the null $H_0 : \beta_i = 0$ against the alternative $H_A : \beta_i > 0$. The critical region for this UMP test consists of the upper tail of values for $T_i$ in the exact distribution. Thus, the p-value $p_+(t_i; 0)$ for a one-tailed test is

$$p_+(t_i; 0) = \sum_{u \ge t_i} f_0(u \mid \mathbf{t}_{\mathrm{N}})$$

Similarly, the p-value $p_-(t_i; 0)$ for the one-tailed test of $H_0$ against $H_A : \beta_i < 0$ is

$$p_-(t_i; 0) = \sum_{u \le t_i} f_0(u \mid \mathbf{t}_{\mathrm{N}})$$

The p-value $p(t_i; 0)$ for a two-tailed test of $H_0$ against $H_A : \beta_i \ne 0$ is

$$p(t_i; 0) = 2 \min\left[p_-(t_i; 0),\ p_+(t_i; 0)\right]$$
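The three p-values are simple tail sums, as the following Python sketch shows. The function name and the example distribution are hypothetical, and the two-tailed formula is applied literally; whether a value above 1 is truncated is not stated in the text and is not handled here.

```python
def tail_pvalues(f0, t_obs):
    """Return (p_minus, p_plus, two_tailed) for the null conditional PDF f0."""
    p_plus = sum(p for u, p in f0.items() if u >= t_obs)   # upper tail
    p_minus = sum(p for u, p in f0.items() if u <= t_obs)  # lower tail
    return p_minus, p_plus, 2 * min(p_minus, p_plus)

# Hypothetical null distribution (binomial with 3 trials, p = 0.5).
f0 = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
p_minus, p_plus, p_two = tail_pvalues(f0, t_obs=3)
print(p_minus, p_plus, p_two)
```

Both tails include the observed value, so the two one-tailed p-values always sum to more than 1 by exactly the probability of the observed outcome.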

By default, the p-value that is reported for a single parameter in the "Exact Parameter Estimates" table is for the two-tailed test. For median unbiased estimates, the p-value for a one-tailed test is always reported.

An upper $100(1 - 2\epsilon)\%$ exact confidence limit for $\hat{\beta}_i$ corresponding to the observed $t_i$ is the solution $\beta_U(t_i)$ of $\epsilon = p_-(t_i; \beta_U(t_i))$, while the lower exact confidence limit is the solution $\beta_L(t_i)$ of $\epsilon = p_+(t_i; \beta_L(t_i))$. Again, a Newton-Raphson procedure is used to search for the solutions. Note that one of the confidence limits for a median unbiased estimate is set to infinity and the other is computed at $2\epsilon$, which results in the display of a one-sided $100(1 - 2\epsilon)\%$ confidence interval.
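The confidence limits can be sketched by solving the two tail equations numerically. This Python example uses bisection instead of the Newton-Raphson search mentioned above, relying on the fact that $p_+$ increases and $p_-$ decreases as $\beta_i$ grows. The function names, the search bracket $[-50, 50]$, and the example counts are assumptions made for the illustration.

```python
import math

def cond_pdf(counts, beta):
    """Conditional PDF f_beta(u | t_N) from the counts C(t_N, u)."""
    logw = {u: math.log(c) + u * beta for u, c in counts.items()}
    top = max(logw.values())
    w = {u: math.exp(v - top) for u, v in logw.items()}
    total = sum(w.values())
    return {u: v / total for u, v in w.items()}

def tail(counts, t_obs, beta, upper):
    """p_plus (upper=True) or p_minus (upper=False) evaluated at beta."""
    f = cond_pdf(counts, beta)
    if upper:
        return sum(p for u, p in f.items() if u >= t_obs)
    return sum(p for u, p in f.items() if u <= t_obs)

def exact_limits(counts, t_obs, eps=0.025, lo=-50.0, hi=50.0, iters=200):
    """Return (beta_L, beta_U); one limit is infinite at a boundary t_obs."""
    at_min, at_max = t_obs == min(counts), t_obs == max(counts)
    # At a boundary the finite limit is computed at 2 * eps.
    level = 2 * eps if (at_min or at_max) else eps

    def solve(upper):
        a, b = lo, hi
        for _ in range(iters):
            mid = (a + b) / 2
            # p_plus rises with beta; p_minus falls with beta.
            if (tail(counts, t_obs, mid, upper) > level) == upper:
                b = mid
            else:
                a = mid
        return (a + b) / 2

    beta_L = -math.inf if at_min else solve(True)
    beta_U = math.inf if at_max else solve(False)
    return beta_L, beta_U

# Hypothetical counts: binomial coefficients for three observations.
counts = {0: 1, 1: 3, 2: 3, 3: 1}
beta_L, beta_U = exact_limits(counts, t_obs=1)
print(beta_L, beta_U)
```

Exponentiating the two limits would give the corresponding interval on the odds ratio scale.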

Specifying the ONESIDED option displays only one p-value and one confidence interval, because small values of $p_+(t_i; 0)$ and $p_-(t_i; 0)$ support different alternative hypotheses and only one of these p-values can be less than 0.50.

The mid-p confidence limits are the solutions to $\min\{p_-(t_i; \beta(t_i)),\ p_+(t_i; \beta(t_i))\} - (1 - \delta_1) f_{\beta(t_i)}(t_i \mid \mathbf{t}_{\mathrm{N}}) = \epsilon$ for $\epsilon = \alpha/2,\ 1 - \alpha/2$ (Vollset, Hirji, and Afifi 1991). The value $\delta_1 = 1$ produces the usual exact (or max-p) confidence interval, $\delta_1 = 0.5$ yields the mid-p interval, and $\delta_1 = 0$ gives the min-p interval. The mean of the endpoints of the max-p and min-p intervals provides the mean-p interval as defined by Hirji, Mehta, and Patel (1988).

Estimates and confidence intervals for the odds ratios are produced by exponentiating the estimates and interval endpoints for the parameters.

Last updated: December 09, 2022