Statistics of compositional data
Gerald van den Boogaart
Technical University Freiberg, Germany
The talk gives a short introduction to the recent approach to the
statistical analysis of compositional data. Data giving the amounts of
different components that form a total are called compositional if the total
amount is irrelevant to the statistical question under consideration. These
might be the amounts of different elements in minerals, the amounts of
different cell types in a blood sample, the relative amounts of different
beetle species in ecological systems, or the money spent on different types
of expenses (workforce, tax, operation costs, raw products) in companies.
The relevant components are almost never all reported.
Seen from the classical viewpoint of multivariate statistics, this type of
data has many strange properties: it cannot be normally distributed because
the domain is bounded to a triangle-like region; variance matrices are
always singular; scaled vectors correspond to the same composition; a single
measurement error or missing value changes all other values; relative errors
are more relevant than absolute differences; different units of measurement
(e.g. mass %, vol %) or different unobserved components can lead to
different order relations and directions of dependence among the unaffected
components; and the data are practically always heavily skewed.
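Two of the properties listed above, scale invariance and the singular
variance matrix, can be checked numerically. The following is only a minimal
sketch and not material from the talk: the helper name closure and the
simulated lognormal amounts are assumptions made for illustration.

```python
import numpy as np

def closure(x):
    """Rescale each row so that its parts sum to 1 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

# Simulated positive amounts of 3 components (illustrative data only).
rng = np.random.default_rng(0)
raw = rng.lognormal(size=(50, 3))
comp = closure(raw)

# Scaled vectors correspond to the same composition.
scale_invariant = np.allclose(closure(10.0 * raw), comp)

# Because every row sums to 1, each row of the covariance matrix sums to 0,
# so the matrix is singular (rank 2 instead of 3 here).
cov = np.cov(comp, rowvar=False)
row_sums = cov.sum(axis=1)
rank = np.linalg.matrix_rank(cov, tol=1e-10)
```

The constant-sum constraint alone forces the singularity; it does not depend
on how the raw amounts were generated.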
The talk will introduce a solution to these problems: the principle
of working in coordinates. This principle makes it possible to translate
compositional problems into classical multivariate statistical tasks, to
carry out a consistent analysis with well-known methods, and to translate the
results back into a compositional context. The talk will show this principle
at work for some classical methods such as distributions, linear models,
tests, principal component analysis and outlier detection. It will also show
how new, specialized methodology can be built on it. The aim is to show how
simple it can be to analyze compositional data consistently, avoiding all
the paradoxes and artefacts mentioned above, when we just follow some basic
rules.
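The three steps of the principle (translate, analyze, translate back) can be
sketched as follows. This is a hedged illustration, not the implementation
used in the talk: it assumes the centred log-ratio (clr) transform as one
common choice of log-ratio representation, and the helper names closure, clr
and clr_inv as well as the simulated data are hypothetical.

```python
import numpy as np

def closure(x):
    """Rescale each row so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centred log-ratio transform: log parts minus their row mean."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

def clr_inv(z):
    """Map clr values back to a composition."""
    return closure(np.exp(z))

# Simulated 4-part compositions (illustrative data only).
rng = np.random.default_rng(1)
comp = closure(rng.lognormal(size=(100, 4)))

z = clr(comp)             # 1. translate into real-valued log-ratio space
z_mean = z.mean(axis=0)   # 2. apply a classical method (here: the mean)
center = clr_inv(z_mean)  # 3. translate the result back to a composition
```

With this choice the back-transformed mean equals the closed geometric mean
of the parts, and the result does not change if the input amounts are
rescaled.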