Every Missing not at Random Model for Incomplete Data Has Got a Missing
at Random Counterpart with Equal Fit
Geert Molenberghs (1,2), Michael G. Kenward (3),
Geert Verbeke (2,1), Caroline Beunckens (1), and Cristina Sotto (1)
(1) Interuniversity Institute for Biostatistics and statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium
(2) Interuniversity Institute for Biostatistics and statistical Bioinformatics, Katholieke Universiteit Leuven, Belgium
(3) Medical Statistics Unit, London School of Hygiene and Tropical Medicine, UK
Over the last decade, a variety of models to analyze incomplete multivariate
and longitudinal data have been proposed, many of which allow for the
missingness to be not at random (MNAR), in the sense that the unobserved
measurements influence the process governing missingness, in addition to
influences coming from observed measurements and/or covariates. The
fundamental problems implied by such models, which we refer to as
sensitivity to unverifiable modeling assumptions, have, in turn, sparked off
various strands of research in what is now termed sensitivity analysis. The
nature of sensitivity originates from the fact that an MNAR model is not
fully verifiable from the data, rendering the empirical distinction between
MNAR and missingness at random (MAR), where only covariates and observed
outcomes influence missingness, hard or even impossible, unless one is
prepared to accept the posited MNAR model in an unquestioning way. We show
that the empirical distinction between MAR and MNAR is not possible, in the
sense that each MNAR model fit to a set of observed data can be reproduced
exactly by an MAR counterpart. Of course, such a pair of models will produce
different predictions of the unobserved outcomes, given the observed ones.
This is true for any model, whether formulated in a selection model (SeM),
pattern-mixture model (PMM), or shared-parameter model (SPM) format.
Specific attention will also be given to the SPM case, since we are able to
provide a formal definition of MAR in this case.
Theoretical considerations are supplemented with illustrations based on a
clinical trial in onychomycosis and on the Slovenian Public Opinion survey.
The implications for sensitivity analysis are discussed.
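The equal-fit claim above can be illustrated with a small numerical sketch. It assumes the simplest possible setting: two binary outcomes, y1 always observed and y2 possibly missing, with R = 1 when y2 is observed. The probabilities are made up for illustration and do not come from the onychomycosis trial or the Slovenian Public Opinion survey, and all function names (mnar_joint, observed_law, mar_counterpart, prob_observed, predict_missing) are hypothetical. The counterpart is built by keeping everything the data can identify and letting the dropouts borrow the completers' conditional distribution of y2 given y1, which is one way to obtain an MAR model in this two-outcome case.

```python
import numpy as np


def mnar_joint():
    """Toy full-data law p(y1, y2, r); all numbers are illustrative assumptions."""
    # Joint distribution of the two binary outcomes, indexed [y1, y2].
    p_y = np.array([[0.4, 0.1],
                    [0.2, 0.3]])
    # P(R = 1 | y1, y2): depends on y2, which may be unobserved, hence MNAR.
    p_obs = np.array([[0.9, 0.5],
                      [0.8, 0.4]])
    joint = np.empty((2, 2, 2))
    joint[:, :, 1] = p_y * p_obs          # y2 observed
    joint[:, :, 0] = p_y * (1.0 - p_obs)  # y2 missing
    return joint


def observed_law(joint):
    """Everything the observed data can identify."""
    completers = joint[:, :, 1]            # p(y1, y2, R = 1)
    dropouts = joint[:, :, 0].sum(axis=1)  # p(y1, R = 0), y2 summed out
    return completers, dropouts


def mar_counterpart(joint):
    """Keep the observed-data law; replace p(y2 | y1, R = 0) by p(y2 | y1, R = 1)."""
    completers, dropouts = observed_law(joint)
    cond = completers / completers.sum(axis=1, keepdims=True)
    out = np.empty_like(joint)
    out[:, :, 1] = completers
    out[:, :, 0] = dropouts[:, None] * cond
    return out


def prob_observed(joint):
    """P(R = 1 | y1, y2) implied by a full-data law."""
    return joint[:, :, 1] / joint.sum(axis=2)


def predict_missing(joint):
    """p(y2 | y1, R = 0): prediction of the unobserved outcome."""
    missing = joint[:, :, 0]
    return missing / missing.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    mnar = mnar_joint()
    mar = mar_counterpart(mnar)
    print("P(R=1 | y1, y2), MNAR model:\n", prob_observed(mnar))
    print("P(R=1 | y1, y2), MAR counterpart:\n", prob_observed(mar))
    print("p(y2 | y1, R=0), MNAR model:\n", predict_missing(mnar))
    print("p(y2 | y1, R=0), MAR counterpart:\n", predict_missing(mar))
```

In this sketch both full-data laws assign identical probabilities to every observable event, so no likelihood-based comparison on the observed data can separate them. The missingness probability of the counterpart no longer varies with y2, while its predictions of y2 for the dropouts differ from those of the original model.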
Missing data can be seen as latent variables. Such a view allows extension
of our results to other forms of coarsening, such as grouping and censoring.
In addition, the technology applies to random-effects models, where a
parametric form for the random effects can be replaced by certain other
parametric (and non-parametric) forms without distorting the model’s fit,
as well as to latent classes, latent variables, etc.
References
Creemers, A., Hens, N., Aerts, M., Molenberghs, G., Verbeke, G., and
Kenward, M.G. (2008). Shared-parameter models and missingness at random.
Submitted for publication.
Molenberghs, G., Beunckens, C., Sotto, C., and Kenward, M.G. (2008). Every
missing not at random model has got a missing at random counterpart with
equal fit. Journal of the Royal Statistical Society, Series B, 70, 371-388.
Verbeke, G. and Molenberghs, G. (2009). Arbitrariness of models for
augmented and coarse data, with emphasis on incomplete-data and
random-effects models. Statistical Modelling, 9, 000-000.