Symbolic Data Analysis: Are Distributions the Numbers of the Future? An
Illustrative Answer
Lynne Billard
Department of Statistics, University of Georgia, Athens, GA 30602
Massively large data sets are routine and ubiquitous
given modern computer capabilities. What is not so routine is how to analyse
these data. One approach is to aggregate the data sets according to some
scientific criteria. The resultant data are perforce symbolic data, i.e.,
lists, intervals, histograms, and so on. Applications abound, especially in
the medical and social sciences. Other data sets (small or large in size)
are naturally symbolic valued, such as species data, data with measurement
uncertainties, confidential data, and the like. Unlike classical data, which
are points in p-dimensional space, symbolic data are hypercubes or
Cartesian products of distributions in p-dimensional space. We describe such
data and how they arise. We look briefly at some of the differences between
classical and symbolic data and their respective methodologies, through
illustrations.
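
As a rough illustration of the aggregation step described above, the following minimal Python sketch collapses a handful of classical records (one point per individual) into one symbolic observation per group, with an interval, a list, and a histogram-like set of relative frequencies. The records, the grouping by clinic, and the names `records`, `aggregate`, `age_interval`, `blood_types`, and `blood_hist` are all hypothetical and chosen only for this example; they are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical classical records: (group, age, blood_type).
# Each record is a single point; none of these values come from the paper.
records = [
    ("clinic_A", 34, "O"),
    ("clinic_A", 51, "A"),
    ("clinic_A", 47, "O"),
    ("clinic_B", 29, "B"),
    ("clinic_B", 62, "A"),
]


def aggregate(rows):
    """Aggregate classical rows into one symbolic observation per group."""
    groups = defaultdict(list)
    for group, age, blood in rows:
        groups[group].append((age, blood))

    symbolic = {}
    for group, members in groups.items():
        ages = [age for age, _ in members]
        bloods = [blood for _, blood in members]
        categories = sorted(set(bloods))
        symbolic[group] = {
            # Interval-valued variable: [min, max] of the observed ages.
            "age_interval": (min(ages), max(ages)),
            # List-valued (multi-valued) variable: categories that occur.
            "blood_types": categories,
            # Relative frequencies over categories, a simple histogram-like summary.
            "blood_hist": {c: bloods.count(c) / len(bloods) for c in categories},
        }
    return symbolic


symbolic = aggregate(records)
for group, description in symbolic.items():
    print(group, description)
```

Each group is now described by an interval, a list, and a distribution rather than by a single point, which is the sense in which aggregated data become symbolic.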