Statistical Models for Count Data with Excess Zeroes: A Review
KyungMann Kim
University of Wisconsin-Madison, USA
Count data are routinely analyzed using the Poisson (P) distribution, whose
variance equals its mean. Due to population heterogeneity, however, they often
exhibit overdispersion (variance exceeding the mean), known as extra-Poisson
variation. This extra-Poisson variation can be handled in one of two ways:
the maximum quasi-likelihood method, or a latent variable model with a gamma
mixing distribution for the Poisson mean, which leads to the negative binomial
(NB) distribution. Still, there are situations where these models perform
poorly because of excess zeroes in the counts. There are two similar but
conceptually different approaches to
handling excess zeroes. In what are commonly known as zero-inflated (ZI)
models, we may view the data as being generated from a mixture model with a
point mass at zero representing “excess” zeroes and a standard
non-degenerate count distribution that includes “true” zeroes. This mixture
model allows for a mixture of two different populations, one non-susceptible
to events (resulting in excess zeroes) and the other susceptible (including
true zeroes). In contrast, the so-called hurdle (H) models may be
conceptualized as having zeroes only from a non-susceptible population and
can be modeled using two processes, one generating zeroes (“choice”) and the
other generating only the positive counts (“intensity”) from a zero-truncated
count distribution. In this presentation, I will review count data
regression models with emphasis on zero-inflated count data and illustrate
these models with examples from the literature.
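For the quasi-likelihood approach to extra-Poisson variation, one common choice (assumed here, not taken from the talk) is the quasi-Poisson variance function Var(Y) = φμ, with the dispersion φ estimated by the Pearson chi-square statistic divided by the residual degrees of freedom n - p. The minimal sketch below covers only the intercept-only case, where the fitted mean is the sample mean; the function name `pearson_dispersion` and the toy counts are hypothetical illustrations.

```python
def pearson_dispersion(counts, n_params=1):
    """Pearson estimate of the quasi-Poisson dispersion phi for an
    intercept-only model, where the fitted mean is the sample mean."""
    n = len(counts)
    mu_hat = sum(counts) / n
    # Pearson chi-square: squared residuals scaled by the Poisson variance mu
    pearson_chi2 = sum((y - mu_hat) ** 2 / mu_hat for y in counts)
    # Divide by the residual degrees of freedom n - p
    return pearson_chi2 / (n - n_params)


# Toy counts with many zeroes (hypothetical data, for illustration only)
counts = [0, 0, 0, 0, 0, 1, 2, 3, 5, 9]
phi_hat = pearson_dispersion(counts)
print(f"estimated dispersion: {phi_hat:.3f}")
```

A value well above 1 signals extra-Poisson variation; in the quasi-Poisson fit the point estimates match the Poisson ones, and the standard errors are the Poisson standard errors multiplied by the square root of the estimated φ.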
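The gamma-Poisson route to the negative binomial distribution can be checked numerically. This sketch assumes the mean/shape parameterization of the NB distribution (mean μ, shape k, variance μ + μ²/k), one of several in common use. It compares the closed-form NB probabilities with a midpoint-rule integration of the Poisson probability against a gamma density for the Poisson mean with shape k and scale μ/k (so the gamma mean is μ). The function names, the values μ = 3 and k = 2, and the integration grid are illustrative assumptions.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability P(Y = y) for mean lam > 0, computed on the log scale."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def gamma_pdf(lam, shape, scale):
    """Gamma density at lam > 0 with the given shape and scale."""
    return math.exp((shape - 1) * math.log(lam) - lam / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def nb_pmf(y, mu, k):
    """Negative binomial P(Y = y) with mean mu and shape k (variance mu + mu**2 / k)."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)


def gamma_poisson_pmf(y, mu, k, upper=80.0, steps=20_000):
    """Integrate Poisson(y | lam) against a Gamma(shape=k, scale=mu/k) density
    for lam, using the midpoint rule on (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h  # midpoints never hit lam = 0
        total += poisson_pmf(y, lam) * gamma_pdf(lam, k, mu / k)
    return total * h


mu, k = 3.0, 2.0
for y in range(6):
    print(y, round(nb_pmf(y, mu, k), 5), round(gamma_poisson_pmf(y, mu, k), 5))

# Moments of the NB distribution: the variance exceeds the mean (overdispersion)
probs = [nb_pmf(y, mu, k) for y in range(300)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(f"mean={mean:.4f} variance={var:.4f} (theory: {mu}, {mu + mu**2 / k})")
```

The two columns agree to the accuracy of the integration grid, and the moments show the extra term μ²/k that the gamma mixing adds to the Poisson variance.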
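For the Poisson case, the zero-inflated mixture can be written down directly (ZIP): with weight π on the point mass at zero and Poisson mean λ for the susceptible population, P(Y = 0) = π + (1 - π)exp(-λ) and P(Y = y) = (1 - π)exp(-λ)λ^y / y! for y ≥ 1. The minimal sketch below uses hypothetical names (`zip_pmf`, `pi`, `lam`) and illustrative values. It splits P(Y = 0) into its "excess" and "true" parts and checks the standard ZIP moments E[Y] = (1 - π)λ and Var(Y) = (1 - π)λ(1 + πλ).

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability P(Y = y) for mean lam > 0."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def zip_pmf(y, pi, lam):
    """Zero-inflated Poisson P(Y = y): a point mass at zero with weight pi
    mixed with a Poisson(lam) distribution with weight 1 - pi."""
    base = (1 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base


pi, lam = 0.3, 2.5
excess_zero = pi                            # zeroes from the non-susceptible population
true_zero = (1 - pi) * poisson_pmf(0, lam)  # zeroes from the susceptible population
print(f"P(Y=0) = {zip_pmf(0, pi, lam):.4f} "
      f"(excess {excess_zero:.4f} + true {true_zero:.4f})")

# Moments computed from the pmf versus the closed-form ZIP expressions
probs = [zip_pmf(y, pi, lam) for y in range(100)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(f"mean={mean:.4f} vs {(1 - pi) * lam:.4f}; "
      f"variance={var:.4f} vs {(1 - pi) * lam * (1 + pi * lam):.4f}")
```

Because the variance factor (1 + πλ) exceeds 1 whenever π > 0, zero inflation alone also produces overdispersion relative to a Poisson model with the same mean.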
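The hurdle model's two processes can be sketched the same way, under assumed notation. A binary "choice" process gives P(Y = 0) = p0, and the positive counts come from a zero-truncated Poisson with parameter λ, so P(Y = y) = (1 - p0)exp(-λ)λ^y / (y!(1 - exp(-λ))) for y ≥ 1. The names `hurdle_pmf`, `p0`, and `lam` and the numeric values are hypothetical. The final check illustrates a known link between the two approaches: without covariates, a ZIP model with parameters (π, λ) gives the same distribution as a hurdle model with p0 = π + (1 - π)exp(-λ). With π restricted to [0, 1] as a mixing probability, however, the ZI model can only add zeroes, whereas the hurdle model also allows p0 below the Poisson zero probability exp(-λ).

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability P(Y = y) for mean lam > 0."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def hurdle_pmf(y, p0, lam):
    """Hurdle Poisson P(Y = y): zero with probability p0 ("choice"); otherwise a
    positive count from a zero-truncated Poisson(lam) ("intensity")."""
    if y == 0:
        return p0
    return (1 - p0) * poisson_pmf(y, lam) / (1 - math.exp(-lam))


def zip_pmf(y, pi, lam):
    """Zero-inflated Poisson P(Y = y), included for comparison."""
    base = (1 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base


lam = 2.5
# Zero deflation: fewer zeroes than Poisson(lam) alone would produce
p0 = 0.05
print(f"hurdle P(Y=0) = {hurdle_pmf(0, p0, lam):.4f}, "
      f"Poisson P(Y=0) = {poisson_pmf(0, lam):.4f}")
total = sum(hurdle_pmf(y, p0, lam) for y in range(100))
print(f"total probability = {total:.6f}")

# An intercept-only ZIP(pi, lam) matches a hurdle model with p0 = P_ZIP(Y = 0)
pi = 0.3
p0_equiv = zip_pmf(0, pi, lam)
max_diff = max(abs(zip_pmf(y, pi, lam) - hurdle_pmf(y, p0_equiv, lam))
               for y in range(30))
print(f"max |ZIP - hurdle| = {max_diff:.2e}")
```

With covariates the equivalence generally breaks down, because the hurdle model links the zero probability and the count intensity to covariates through separate regressions.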
