The GENMOD Procedure

Zero-Inflated Models

Count data that contain more zeros than the underlying probability distribution of counts predicts can be modeled with a zero-inflated distribution. In GENMOD, the underlying distribution can be either Poisson or negative binomial. See Lambert (1992), Long (1997), and Cameron and Trivedi (1998) for more information about zero-inflated models.

The population is considered to consist of two types of individuals. The first type gives Poisson or negative binomial distributed counts, which might contain zeros. The second type always gives a zero count. Let $\lambda$ be the underlying distribution mean and $\omega$ be the probability of an individual being of the second type. The parameter $\omega$ is called here the zero-inflation probability, and is the probability of zero counts in excess of the frequency predicted by the underlying distribution. You can request that the zero-inflation probability be displayed in an output data set with the PZERO keyword.

The probability distribution of a zero-inflated Poisson random variable $Y$ is given by

$$
\Pr(Y = y) =
\begin{cases}
\omega + (1 - \omega)\, e^{-\lambda} & \text{for } y = 0 \\[4pt]
(1 - \omega)\, \dfrac{\lambda^{y} e^{-\lambda}}{y!} & \text{for } y = 1, 2, \ldots
\end{cases}
$$

and the probability distribution of a zero-inflated negative binomial random variable Y is given by

$$
\Pr(Y = y) =
\begin{cases}
\omega + (1 - \omega)\, (1 + k\lambda)^{-\frac{1}{k}} & \text{for } y = 0 \\[4pt]
(1 - \omega)\, \dfrac{\Gamma(y + 1/k)}{\Gamma(y + 1)\, \Gamma(1/k)}\, \dfrac{(k\lambda)^{y}}{(1 + k\lambda)^{y + 1/k}} & \text{for } y = 1, 2, \ldots
\end{cases}
$$

where k is the negative binomial dispersion parameter.
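
The two probability functions above can be evaluated directly. The following Python sketch is not GENMOD code; the function names `zip_pmf` and `zinb_pmf` and the parameter values are illustrative assumptions. It computes each probability on the log scale and checks that the probabilities sum to approximately 1.

```python
import math


def zip_pmf(y: int, lam: float, omega: float) -> float:
    """Zero-inflated Poisson probability Pr(Y = y) for lam > 0."""
    # Poisson probability, computed on the log scale for numerical stability.
    poisson = math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))
    if y == 0:
        return omega + (1.0 - omega) * poisson
    return (1.0 - omega) * poisson


def zinb_pmf(y: int, lam: float, omega: float, k: float) -> float:
    """Zero-inflated negative binomial probability Pr(Y = y), dispersion k > 0."""
    log_nb = (
        math.lgamma(y + 1.0 / k)
        - math.lgamma(y + 1)
        - math.lgamma(1.0 / k)
        + y * math.log(k * lam)
        - (y + 1.0 / k) * math.log1p(k * lam)
    )
    nb = math.exp(log_nb)
    if y == 0:
        return omega + (1.0 - omega) * nb
    return (1.0 - omega) * nb


# Illustrative parameter values (assumptions, not taken from the documentation).
lam, omega, k = 2.5, 0.3, 0.5
zip_total = sum(zip_pmf(y, lam, omega) for y in range(200))
zinb_total = sum(zinb_pmf(y, lam, omega, k) for y in range(200))
print(zip_total, zinb_total)  # both totals should be very close to 1
```

With $\omega = 0$, both functions reduce to the ordinary Poisson and negative binomial probabilities, which is a convenient sanity check.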

You can model the parameters $\omega$ and $\lambda$ in GENMOD with the regression models:

$$
\begin{aligned}
h(\omega_i) &= \mathbf{z}_i' \boldsymbol{\gamma} \\
g(\lambda_i) &= \mathbf{x}_i' \boldsymbol{\beta}
\end{aligned}
$$

where $h$ is one of the binary link functions: logit, probit, or complementary log-log. By default, $h$ is the logit link; otherwise it is the link function specified in the ZEROMODEL statement. For both the Poisson and the negative binomial, $g$ is the log link by default; otherwise it is the link function specified in the MODEL statement. The covariates $\mathbf{z}_i$ for observation $i$ are determined by the model specified in the ZEROMODEL statement, and the covariates $\mathbf{x}_i$ are determined by the model specified in the MODEL statement. The regression parameters $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$ are estimated by maximum likelihood.
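
To make the roles of the two linear predictors concrete, the following Python sketch inverts the default links for a single observation. It is an illustration only: the coefficient vectors, covariate values, and helper names (`inv_logit`, `inv_cloglog`, `dot`) are hypothetical, and GENMOD estimates the coefficients by maximum likelihood rather than taking them as given.

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_cloglog(eta: float) -> float:
    """Inverse of the complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))


def dot(coefs, covariates):
    """Inner product of a coefficient vector and a covariate vector."""
    return sum(c * v for c, v in zip(coefs, covariates))


# Hypothetical values; the first covariate in each vector is the intercept term.
gamma = [-1.0, 0.8]   # zero-inflation model coefficients
beta = [0.5, 0.3]     # count model coefficients
z_i = [1.0, 0.5]      # covariates from the zero-inflation model
x_i = [1.0, 2.0]      # covariates from the count model

omega_i = inv_logit(dot(gamma, z_i))   # solves logit(omega_i) = z_i' gamma
lambda_i = math.exp(dot(beta, x_i))    # solves log(lambda_i) = x_i' beta

print(f"omega_i = {omega_i:.4f}, lambda_i = {lambda_i:.4f}")
```

Swapping `inv_logit` for `inv_cloglog` shows the effect of choosing a different binary link for the zero-inflation probability while leaving the count model unchanged.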

The mean and variance of Y for the zero-inflated Poisson are given by

$$
\begin{aligned}
\mathrm{E}(Y) &= \mu = (1 - \omega)\lambda \\
\mathrm{Var}(Y) &= \mu + \frac{\omega}{1 - \omega}\, \mu^{2}
\end{aligned}
$$

and for the zero-inflated negative binomial by

$$
\begin{aligned}
\mathrm{E}(Y) &= \mu = (1 - \omega)\lambda \\
\mathrm{Var}(Y) &= \mu + \left( \frac{\omega}{1 - \omega} + \frac{k}{1 - \omega} \right) \mu^{2}
\end{aligned}
$$

You can request that the mean of Y be displayed for each observation in an output data set with the PRED keyword.
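
The mean and variance formulas above can be checked numerically by summing over the probability distributions. The Python sketch below is one way to do that; the helper names and parameter values are assumptions made for illustration, and the infinite sums are truncated at 400 terms, which is more than enough for these values.

```python
import math


def poisson_probs(lam, n):
    """First n Poisson probabilities, built with p(y+1) = p(y) * lam / (y+1)."""
    probs = [math.exp(-lam)]
    for y in range(n - 1):
        probs.append(probs[-1] * lam / (y + 1))
    return probs


def negbin_probs(lam, k, n):
    """First n negative binomial probabilities with mean lam and dispersion k."""
    probs = [(1.0 + k * lam) ** (-1.0 / k)]
    ratio = k * lam / (1.0 + k * lam)
    for y in range(n - 1):
        probs.append(probs[-1] * (y + 1.0 / k) / (y + 1) * ratio)
    return probs


def zero_inflate(base, omega):
    """Mix a point mass at zero (weight omega) with a base count distribution."""
    mixed = [(1.0 - omega) * p for p in base]
    mixed[0] += omega
    return mixed


def moments(probs):
    """Mean and variance of a count distribution given Pr(Y=0), Pr(Y=1), ..."""
    mean = sum(y * p for y, p in enumerate(probs))
    second = sum(y * y * p for y, p in enumerate(probs))
    return mean, second - mean * mean


# Illustrative parameter values (assumptions, not taken from the documentation).
lam, omega, k = 2.5, 0.3, 0.5
mu = (1.0 - omega) * lam

zip_mean, zip_var = moments(zero_inflate(poisson_probs(lam, 400), omega))
zinb_mean, zinb_var = moments(zero_inflate(negbin_probs(lam, k, 400), omega))

# Closed-form variances from the formulas in this section.
zip_var_formula = mu + omega / (1.0 - omega) * mu ** 2
zinb_var_formula = mu + (omega / (1.0 - omega) + k / (1.0 - omega)) * mu ** 2

print(zip_mean, zip_var, zip_var_formula)
print(zinb_mean, zinb_var, zinb_var_formula)
```

In both cases the variance exceeds the mean whenever $\omega > 0$, which is why zero inflation shows up as overdispersion relative to an ordinary Poisson fit.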

Last updated: December 09, 2022