The GENMOD Procedure

Multinomial Models

This type of model applies to cases where an observation can fall into one of $k$ categories. Binary data occur in the special case where $k = 2$. If there are $m_i$ observations in a subpopulation $i$, then the probability distribution of the number falling into the $k$ categories $\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{ik})$ can be modeled by the multinomial distribution, defined in the section Response Probability Distributions, with $\sum_j y_{ij} = m_i$. The multinomial model is an ordinal model if the categories have a natural order.

Residuals are not available in the OBSTATS table or the output data set for multinomial models.

By default, and consistently with binomial models, the GENMOD procedure orders the response categories for ordinal multinomial models from lowest to highest and models the probabilities of the lower response levels. You can change the way PROC GENMOD orders the response levels with the RORDER= option in the PROC GENMOD statement. The order that PROC GENMOD uses is shown in the "Response Profiles" output table described in the section Response Profile.

The GENMOD procedure supports only the ordinal multinomial model. If $(p_{i1}, p_{i2}, \ldots, p_{ik})$ are the category probabilities, the cumulative category probabilities are modeled with the same link functions used for binomial data. Let $P_{ir} = \sum_{j=1}^{r} p_{ij}$, $r = 1, 2, \ldots, k-1$, be the cumulative category probabilities (note that $P_{ik} = 1$). The ordinal model is

$$ g(P_{ir}) = \mu_r + \mathbf{x}_i' \boldsymbol{\beta} \quad \text{for } r = 1, 2, \ldots, k-1 $$

where $\mu_1, \mu_2, \ldots, \mu_{k-1}$ are intercept terms that depend only on the categories and $\mathbf{x}_i$ is a vector of covariates that does not include an intercept term. The logit, probit, and complementary log-log link functions $g$ are available. These are obtained by specifying the MODEL statement options DIST=MULTINOMIAL and LINK=CUMLOGIT (cumulative logit), LINK=CUMPROBIT (cumulative probit), or LINK=CUMCLL (cumulative complementary log-log). Alternatively,

$$ P_{ir} = \mathrm{F}(\mu_r + \mathbf{x}_i' \boldsymbol{\beta}) \quad \text{for } r = 1, 2, \ldots, k-1 $$

where $\mathrm{F} = g^{-1}$ is a cumulative distribution function for the logistic, normal, or extreme-value distribution.
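
As an illustration of the inverse-link form, the following Python sketch evaluates the cumulative probabilities for the cumulative logit case and recovers the individual category probabilities by differencing. It is not PROC GENMOD code. The function names and the numeric values of the intercepts, covariates, and coefficients are hypothetical and chosen only for demonstration.

```python
import math

def logistic_cdf(t):
    """Inverse of the logit link: F(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def category_probabilities(mu, x, beta, cdf=logistic_cdf):
    """Return (P, p): cumulative and per-category probabilities.

    mu   : the k-1 intercepts mu_1, ..., mu_{k-1} (increasing)
    x    : covariate vector for one subpopulation (no intercept term)
    beta : regression coefficients, same length as x
    """
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    # P_r = F(mu_r + x'beta) for r = 1..k-1, and P_k = 1 by definition.
    P = [cdf(m + eta) for m in mu] + [1.0]
    # p_1 = P_1 and p_r = P_r - P_{r-1} for r > 1.
    p = [P[0]] + [P[r] - P[r - 1] for r in range(1, len(P))]
    return P, p

# Hypothetical values (assumptions for illustration): k = 3, two covariates.
P, p = category_probabilities(mu=[-1.0, 0.5], x=[1.0, 2.0], beta=[0.3, -0.2])
print("cumulative:", P)
print("category:  ", p)
```

Because the same linear predictor is added to every intercept, increasing intercepts guarantee increasing cumulative probabilities, so each differenced category probability is positive.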

PROC GENMOD estimates the intercept parameters $\mu_1, \mu_2, \ldots, \mu_{k-1}$ and regression parameters $\boldsymbol{\beta}$ by maximum likelihood.
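
To make the quantity being maximized concrete, here is a minimal Python sketch of the multinomial log-likelihood for the cumulative logit model. It assumes the standard multinomial kernel (the sum of each count times the log of its category probability) and drops the multinomial coefficient, which does not depend on the parameters. The function name, data layout, and numbers are hypothetical. The sketch only evaluates the objective and does not reproduce the procedure's fitting algorithm.

```python
import math

def ordinal_log_likelihood(mu, beta, data):
    """Log-likelihood kernel for a cumulative logit model.

    mu   : the k-1 intercepts (increasing)
    beta : regression coefficients
    data : list of (x, y) pairs, one per subpopulation, where x is the
           covariate vector and y is the list of k category counts
    """
    total = 0.0
    for x, y in data:
        eta = sum(xj * bj for xj, bj in zip(x, beta))
        # Cumulative probabilities P_r = F(mu_r + x'beta), with P_k = 1.
        P = [1.0 / (1.0 + math.exp(-(m + eta))) for m in mu] + [1.0]
        # Category probabilities by differencing.
        p = [P[0]] + [P[r] - P[r - 1] for r in range(1, len(P))]
        # Categories with zero count contribute nothing to the sum.
        total += sum(yj * math.log(pj) for yj, pj in zip(y, p) if yj > 0)
    return total

# Hypothetical binary case (k = 2): 3 observations in category 1, 1 in category 2.
data = [([0.0], [3, 1])]
ll_at_zero = ordinal_log_likelihood(mu=[0.0], beta=[0.0], data=data)
ll_at_mle = ordinal_log_likelihood(mu=[math.log(3.0)], beta=[0.0], data=data)
print(ll_at_zero, ll_at_mle)
```

In this toy case the intercept log(3) gives a first-category probability of 0.75, which matches the observed proportion, so the log-likelihood there is higher than at an intercept of zero.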

The subpopulations i are defined by constant values of the AGGREGATE= variable. This has no effect on the parameter estimates, but it does affect the deviance and Pearson chi-square statistics; it also affects parameter estimate standard errors if you specify the SCALE=DEVIANCE or SCALE=PEARSON option.
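
The effect of the grouping on the deviance can be seen in a small Python sketch. It assumes the usual multinomial deviance, $D = 2 \sum_i \sum_j y_{ij} \log\bigl(y_{ij} / (m_i p_{ij})\bigr)$, which this section does not state explicitly. The function name and the fitted probabilities are hypothetical. Two observations with identical covariates, and therefore identical fitted probabilities, are treated first as two subpopulations and then as one.

```python
import math

def multinomial_deviance(groups):
    """Deviance for a list of (y, p) pairs, one per subpopulation.

    y : observed category counts for the subpopulation
    p : fitted category probabilities for the subpopulation
    Terms with y_ij = 0 are taken as 0.
    """
    d = 0.0
    for y, p in groups:
        m = sum(y)  # subpopulation size m_i
        d += 2.0 * sum(
            yj * math.log(yj / (m * pj)) for yj, pj in zip(y, p) if yj > 0
        )
    return d

# Hypothetical fitted probabilities shared by both observations.
p = [0.5, 0.3, 0.2]

# Each observation is its own subpopulation (m_i = 1).
ungrouped = multinomial_deviance([([1, 0, 0], p), ([0, 1, 0], p)])

# Both observations aggregated into one subpopulation (m_i = 2).
grouped = multinomial_deviance([([1, 1, 0], p)])

print(ungrouped, grouped)
```

The fitted probabilities are the same in both calls, yet the two deviances differ by a constant that depends only on how the counts are grouped, which is consistent with the grouping leaving the parameter estimates unchanged.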

Last updated: December 09, 2022