Probabilistic Generative Models for Inference on Complex Systems



Document type: PhD thesis
Date: 2024-02-29
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät
Department: Computer Science
Advisor: De Bacco, Caterina (Dr.)
Day of Oral Examination: 2024-01-30
DDC Classification: 004 - Data processing and computer science
Keywords: Statistics
Other Keywords:
probabilistic modeling
statistical inference
network science
community detection
conditional independence


Network models are powerful and flexible tools for representing the complex interactions between individual elements in diverse domains. They offer scientists and practitioners who want to exploit the growing abundance of networked datasets meaningful insights into the fundamental patterns underlying such interactions. A popular approach to identifying these hidden structures is that of generative models, in particular latent variable models: probabilistic models that introduce latent variables to incorporate domain knowledge, capture complex interactions, and uncover statistically meaningful network structures. However, existing methods are frequently insufficient to capture the complexity of real-world data, and they often do not provide a general framework to fully leverage the additional information carried within the data, such as edge and node metadata or higher-order interactions. In this thesis, we present principled and efficient approaches that aim to broaden the range of techniques available for modelling complex networks. Specifically, we work in three principal directions: i) developing flexible methods to perform inference on attributed multilayer networks, ii) exploring innovative theoretical perspectives for incorporating reciprocity and loosening the assumption of conditional independence in network models, and iii) designing foundational models to characterize the structural organization of higher-order data.
We first extend standard generative models for the analysis of multilayer networks to integrate node metadata into the inference process alongside the network topology. In addition to applying these methods to previously studied real-world data, such as social and biological networks, we introduce this methodology to a new field for the first time, namely patent citation networks. We show that incorporating additional information not only boosts performance but also leads to more interpretable structures.
Next, we propose approaches to handle the pairwise dependencies between the two directed edges connecting a node pair, which arise once the assumption that edges are independent of each other is relaxed. We demonstrate the flexibility and relevance of our mathematical frameworks in various contexts, such as the analysis of dynamic networks, the identification of anomalies, and the estimation of unobserved network structures from multiple reports. By explicitly accounting for reciprocity, these frameworks improve edge prediction and network reconstruction, while also shedding light on the underlying mechanisms driving edge formation.
Finally, we present principled methods to define and identify the mesoscale organization of higher-order data. We evaluate their effectiveness on a variety of small- and large-scale real-world systems. Notably, these models perform well at retrieving both robust and flexible community structures, while reliably predicting higher-order interactions of arbitrary size. As an additional contribution, we present a newly developed Python library specifically designed for analyzing data with higher-order interactions.
This work thus introduces techniques that go beyond what has been previously established in the field of network inference and contribute to the current literature. The approaches developed here account for the additional complexity present in real-world systems, enabling a deeper understanding of data across a range of disciplines.
