Approximate Bayesian Computation for inference in models for large-scale network data
Antonietta Mira
InterDisciplinary Institute of Data Science, Università della Svizzera italiana, Lugano, Switzerland
and
Università degli Studi dell’Insubria, Como, Italy
COAUTHORS:
Jukka-Pekka Onnela (Department of Biostatistics, Harvard University, US) and
Ritabrata Dutta (Università della Svizzera italiana, Lugano, Switzerland)
Many systems of scientific interest can be investigated as networks, where network nodes correspond to the elements of the system and network edges to interactions between the elements. Increasing availability of large-scale data and steady improvements in computational capacity continue to fuel the growth of this field. Network models are now commonly used to investigate social, economic and biological complexity at the systemic level. There is a general divide between the two prominent paradigms for modeling networks: mechanistic network models and statistical network models. Mechanistic network models are driven by domain knowledge and assume that the microscopic mechanisms governing network formation and evolution at the level of individual nodes are known; questions often focus on understanding macroscopic features that emerge from repeated application of these known mechanisms. The statistical approach, in contrast, often starts from observed network structures and attempts to infer some aspects of the underlying data-generating process. Mechanistic network models provide insight into how the network is formed and how it evolves at the level of individual nodes, but because mechanistic rules typically lead to complex network structures, it is difficult to assign a probability to any given network realization that a mechanistic model may generate. Because of this difficulty, there is typically no closed-form expression for the likelihood of these models and, consequently, neither likelihood-based nor posterior-based inference for learning from data is possible. We have developed a principled statistical framework, based on Approximate Bayesian Computation, to bring some of the mechanistic network models into the realm of statistical inference, covering parameter estimation, construction of confidence/credible intervals, hypothesis testing and model selection. 
This approach is feasible because, given a set of parameter values, it is easy to sample network configurations from most mechanistic models. I will introduce the general Approximate Bayesian Computation framework and demonstrate its application to large-scale mechanistic networks, where it can be used to infer model parameters, and their associated uncertainties, from empirical data. Examples will focus on applications to social and biological networks.
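
The idea of replacing likelihood evaluation with forward simulation can be illustrated with a minimal rejection-ABC sketch in Python. Everything in it is an assumption made for illustration and is not taken from the talk: the toy growth model (each new node attaches to one existing node, preferentially by degree with probability p_pref and uniformly otherwise), the Uniform(0, 1) prior, the single summary statistic (fraction of degree-1 nodes), the tolerance eps, and all function names.

```python
import random
import statistics


def simulate_network(n_nodes, p_pref, rng):
    """Toy mechanistic model (illustrative assumption): grow a tree in which
    each new node links to one existing node, chosen proportionally to degree
    with probability p_pref and uniformly at random otherwise.
    Returns the list of node degrees."""
    degrees = [1, 1]      # start from a single edge between nodes 0 and 1
    endpoints = [0, 1]    # every node appears once per incident edge
    for new in range(2, n_nodes):
        if rng.random() < p_pref:
            target = rng.choice(endpoints)   # degree-proportional choice
        else:
            target = rng.randrange(new)      # uniform choice among existing nodes
        degrees[target] += 1
        degrees.append(1)
        endpoints.extend([target, new])
    return degrees


def leaf_fraction(degrees):
    """Summary statistic (illustrative assumption): share of degree-1 nodes."""
    return sum(1 for d in degrees if d == 1) / len(degrees)


def abc_rejection(observed_stat, n_nodes, n_draws, eps, rng):
    """Rejection ABC: draw p from the prior, simulate a network, and keep p
    when the simulated summary lies within eps of the observed summary."""
    accepted = []
    for _ in range(n_draws):
        p = rng.random()                     # prior: Uniform(0, 1)
        sim_stat = leaf_fraction(simulate_network(n_nodes, p, rng))
        if abs(sim_stat - observed_stat) <= eps:
            accepted.append(p)
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for empirical data: one network simulated at a known parameter.
    observed = leaf_fraction(simulate_network(500, 0.7, rng))
    posterior = abc_rejection(observed, n_nodes=500, n_draws=2000, eps=0.01, rng=rng)
    print("accepted draws:", len(posterior))
    if posterior:
        print("approximate posterior mean of p_pref:", statistics.mean(posterior))
```

The accepted values form a sample from an approximate posterior, from which point estimates and credible intervals can be read off. Shrinking eps tightens the approximation at the cost of fewer accepted draws, and a single summary statistic is a deliberate simplification; practical applications typically combine several network summaries and more efficient sampling schemes.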
