Biomedical Machine Learning Beyond the Training Distribution

dc.contributor.advisor Schölkopf, Bernhard (Prof. Dr.)
dc.contributor.author Visonà, Giovanni
dc.date.accessioned 2025-12-03T15:30:26Z
dc.date.available 2025-12-03T15:30:26Z
dc.date.issued 2025-12-03
dc.identifier.uri http://hdl.handle.net/10900/172807
dc.identifier.uri http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1728078 de_DE
dc.identifier.uri http://dx.doi.org/10.15496/publikation-114132
dc.description.abstract Machine learning (ML) holds the potential to impact many aspects of our lives, particularly in high-stakes areas such as law, autonomous systems, and healthcare. The prospect of leveraging large quantities of data to mine patterns, improve decision-making, and navigate the complexity of biological systems is especially appealing and could have far-reaching consequences; however, ensuring the robustness and reliability of machine learning models has proven remarkably difficult, prompting considerable effort from the research community. In particular, understanding how ML models generalize to new observations is a necessary condition for translating these advances into clinical practice or using them to expand biological domain knowledge. When the training and test settings correspond and individual observations do not affect each other (the so-called independent and identically distributed, or IID, setting), machine learning and deep learning have displayed remarkable capabilities. But when the data-generating distribution shifts, or when we want to solve related but slightly different tasks, the quality of a model's predictions can deteriorate rapidly. In this thesis, I will examine the challenges that arise when generalizing beyond the training distribution in biomedical machine learning, and the approaches developed to tackle them. The first part of the thesis will provide a broad overview of generalization in machine learning, starting from a conceptual formulation of the generalization problem and the progress made in laying its theoretical foundations. Delving deeper, I will examine the most common paradigms developed to improve predictive performance outside the training distribution, and discuss the role of causal reasoning within this picture.
Afterwards, I will review the state of biomedical applications of machine learning, highlighting some of the most well-studied research areas as well as fields where ML has yet to deliver on its promise. Of particular interest is the topic of biases in biomedical data: given the staggering complexity of biological phenomena and the considerable experimental constraints on gathering relevant data, it is crucial to understand how to separate noise and natural variability from meaningful signal. Relatedly, I will also discuss the ever-present challenge of validating the results of biomedical ML models. Following these broad overviews of generalization and biomedical machine learning, I will present two works applying deep learning to biological and clinical data. In each of them, the generalization challenges and paradigms presented in the earlier chapters play a crucial role, enabling novel prediction tasks or revealing insights into the properties of the models. The first work, which focuses on imputing epigenomic signals, shows how transfer learning enables the out-of-distribution imputation of individual-specific epigenomic patterns, a case study in personalized epigenomics that is, to the best of my knowledge, the first of its kind. I will then present a research project that tackles the prediction of antimicrobial resistance from clinical proteomics data; analyzing the proposed models on zero-shot prediction tasks offers a window into their robustness, which can guide future developments and inform the data collection efforts required for further progress. en
dc.language.iso en de_DE
dc.publisher Universität Tübingen de_DE
dc.rights ubt-podno de_DE
dc.rights.uri http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de de_DE
dc.rights.uri http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en en
dc.subject.classification Machine Learning, Biomedicine, Generalization, Epigenetics en
dc.subject.ddc 004 de_DE
dc.subject.other Antimicrobial Resistance en
dc.subject.other Epigenetics en
dc.subject.other Biomedicine en
dc.subject.other Machine Learning en
dc.subject.other Generalization en
dc.title Biomedical Machine Learning Beyond the Training Distribution en
dc.type PhDThesis de_DE
dcterms.dateAccepted 2025-10-29
utue.publikation.fachbereich Computer Science en
utue.publikation.fakultaet 7 Faculty of Science en
utue.publikation.noppn yes de_DE