Statistical Models for Count Data with Excess Zeros: A Review
KyungMann Kim
University of Wisconsin-Madison, USA
Count data are routinely analyzed using Poisson (P) distributions.
Due to population heterogeneity, however, they often exhibit
over-dispersion, also known as extra-Poisson variation.
This extra-Poisson variation can be handled in one of two ways: by the
maximum quasi-likelihood method or by a latent variable model leading to the
negative binomial (NB) distribution, with a gamma mixing distribution for the
Poisson mean. Still, there are
situations where these models perform poorly because of excess zeros in the
counts. There are two similar but
conceptually different approaches to handling excess zeros.
In what are commonly known as zero-inflated (ZI) models, we may view
the data as being generated from a mixture model with a point mass at zero
representing “excess” zeros and a standard non-degenerate distribution
including “true” zeros. This mixture
model allows for a mixture of two different populations, one not susceptible
to events (resulting in excess zeros) and the other susceptible (including
true zeros). In contrast, the
so-called hurdle (H) models may be conceptualized as having zeros only from
a non-susceptible population and can be modeled using two processes, one
generating zeros (“choice”) and the other generating only the positive
counts (“intensity”) from a truncated count distribution.
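As a minimal sketch of the distinction just described, the two probability mass functions can be written out assuming a Poisson base distribution. That choice, along with the function names and the parameters pi (excess-zero probability), p0 (probability of a zero in the hurdle model), and lam (Poisson mean), is an assumption made for illustration only and is not taken from the models reviewed in the talk.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability P(Y = k) for mean lam > 0."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson (illustrative).

    With probability pi the observation is an "excess" zero; otherwise it is
    drawn from Poisson(lam), which can itself produce "true" zeros.
    """
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_pmf(k, p0, lam):
    """Hurdle Poisson (illustrative).

    All zeros come from the first ("choice") process with probability p0;
    positive counts come from a zero-truncated Poisson(lam) ("intensity").
    """
    if k == 0:
        return p0
    truncated = poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
    return (1.0 - p0) * truncated
```

Under these assumptions, the probability of a zero is pi + (1 - pi) * exp(-lam) in the ZI model but exactly p0 in the hurdle model, which is one way to see that the two approaches are similar in form yet conceptually different.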
In this presentation, I will show examples of count data with excess
zeros from the literature in various disciplines and applications and,
for illustration, review recently developed marginal mean models for
such data.