Statistical Models for Count Data with Excess Zeros: A Review
KyungMann Kim
University of Wisconsin-Madison, USA
Count data are routinely analyzed using Poisson (P) distributions.
Due to population heterogeneity, however, they often exhibit
overdispersion, known as extra-Poisson variation.
This extra-Poisson variation can be handled in one of two ways: by the
maximum quasi-likelihood method, or by a latent variable model leading to the
negative binomial (NB) distribution, with a gamma mixing distribution for the
Poisson mean. Still, there are
situations where these models perform poorly because of excess zeros in the
counts. There are two similar but
conceptually different approaches to handling excess zeros.
In what are commonly known as zero-inflated (ZI) models, we may view
the data as being generated from a mixture model with a point mass at zero
representing “excess” zeros and a standard non-degenerate count distribution
that includes “true” zeros. This mixture
model allows for a mixture of two different populations, one non-susceptible
to events (resulting in excess zeros) and the other susceptible (including
true zeros). In contrast, the
so-called hurdle (H) models may be conceptualized as having zeros only from
a non-susceptible population and can be modeled using two processes, one
generating zeros (“choice”) and the other generating only the positive
counts (“intensity”) from a truncated count distribution.
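The distinction between the two approaches can be made concrete with a minimal sketch. It assumes a Poisson count distribution purely for illustration; the abstract does not fix a particular distribution, and the function names and the parameters mu (Poisson mean), pi (probability of the non-susceptible class in the ZI model) and p0 (probability of a zero in the hurdle model) are hypothetical choices made here.

```python
import math


def poisson_pmf(k, mu):
    """Ordinary Poisson probability P(Y = k) with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def zip_pmf(k, mu, pi):
    """Zero-inflated Poisson (illustrative assumption).

    With probability pi the subject is non-susceptible and yields an
    "excess" zero; otherwise the count is Poisson(mu), which can itself
    produce a "true" zero.
    """
    base = (1.0 - pi) * poisson_pmf(k, mu)
    return pi + base if k == 0 else base


def hurdle_pmf(k, mu, p0):
    """Hurdle Poisson (illustrative assumption).

    All zeros come from the zero-generating ("choice") process with
    probability p0; positive counts come from a zero-truncated
    Poisson(mu) ("intensity") process.
    """
    if k == 0:
        return p0
    truncated = poisson_pmf(k, mu) / (1.0 - math.exp(-mu))
    return (1.0 - p0) * truncated


if __name__ == "__main__":
    mu, pi, p0 = 2.0, 0.3, 0.4  # arbitrary values chosen for the sketch
    print("Poisson P(Y=0):", poisson_pmf(0, mu))
    print("ZIP     P(Y=0):", zip_pmf(0, mu, pi))
    print("Hurdle  P(Y=0):", hurdle_pmf(0, mu, p0))
```

In the ZI sketch the zero probability is pi plus a Poisson contribution, so zeros arise from both populations; in the hurdle sketch the zero probability is p0 alone, and the count distribution contributes only to the positive values.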
In this presentation, I will show examples of count data with excess
zeros from the literature in various disciplines and applications and
review, by way of illustration, recently developed marginal mean models for
count data with excess zeros.
