Towards Robust Machine Learning: Benchmarking and Adaptation in Challenging Settings


dc.contributor.advisor Bethge, Matthias (Prof. Dr.)
dc.contributor.author Press, Ori
dc.date.accessioned 2025-09-15T09:12:29Z
dc.date.available 2025-09-15T09:12:29Z
dc.date.issued 2025-09-15
dc.identifier.uri http://hdl.handle.net/10900/170245
dc.identifier.uri http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1702459 de_DE
dc.identifier.uri http://dx.doi.org/10.15496/publikation-111572
dc.description.abstract Neural networks often excel when their inputs closely match the data on which they were trained, yet they frequently fail when inputs differ even slightly from their training data. This issue, known as distribution shift, remains a significant challenge when deploying machine learning models in practical applications such as medical imaging and autonomous driving. Traditional methods to address distribution shift typically involve additional training or data collection, which may not always be feasible for models already deployed. This thesis explores alternative strategies aimed at enhancing the robustness of already-trained models to distribution shifts. The first part of this work introduces a benchmark specifically designed to evaluate test-time adaptation (TTA) methods under prolonged and varied distribution shifts. Using this benchmark, we demonstrate that while existing TTA techniques initially improve performance, they often degrade with extended adaptation. We also propose a simple baseline method that consistently outperforms the other tested methods and maintains high performance throughout prolonged adaptation. Building on these insights, the second part analyzes the underlying mechanisms of entropy-based loss functions commonly employed in TTA. We show that entropy minimization initially clusters embeddings of similar images together, thus increasing accuracy. However, continued entropy minimization eventually drives input image embeddings further away from training embeddings, thereby reducing accuracy. Leveraging this insight, we propose Weighted Flips (WF), a novel method capable of predicting model accuracy on arbitrary image sets without the need for labeled data. The final part of this work extends the principles of TTA to language models (LMs), focusing on the task of literature recommendation. We propose a benchmark that evaluates the ability of LMs to identify academic papers when given a short description that references them. Our benchmark demonstrates that LMs cannot perform this task effectively. Therefore, we propose a simple agent that allows LMs to search for and read relevant papers, significantly improving their performance. en
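Note: the entropy-minimization mechanism described in the abstract can be illustrated with a minimal sketch, assuming a PyTorch classifier whose normalization-layer parameters are exposed to an optimizer. This is an illustrative reconstruction in the spirit of entropy-based TTA methods, not the thesis's exact procedure.

import torch

def entropy_minimization_step(model, x, optimizer):
    # One test-time adaptation step: minimize the mean prediction entropy
    # on an unlabeled test batch x. No labels are used.
    logits = model(x)
    probs = logits.softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()  # typically only norm-layer affine parameters are updated
    return entropy.item()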
dc.language.iso en de_DE
dc.publisher Universität Tübingen de_DE
dc.rights ubt-podno de_DE
dc.rights.uri http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de de_DE
dc.rights.uri http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en en
dc.subject.other benchmarking en
dc.subject.other test-time adaptation en
dc.subject.other language models en
dc.subject.other computer vision en
dc.title Towards Robust Machine Learning: Benchmarking and Adaptation in Challenging Settings en
dc.type PhDThesis de_DE
dcterms.dateAccepted 2025-07-25
utue.publikation.fachbereich Informatik de_DE
utue.publikation.fakultaet 7 Mathematisch-Naturwissenschaftliche Fakultät de_DE
utue.publikation.noppn yes de_DE
