Towards Robust Machine Learning: Benchmarking and Adaptation in Challenging Settings


dc.contributor.advisor Bethge, Matthias (Prof. Dr.)
dc.contributor.author Press, Ori
dc.date.accessioned 2025-09-15T09:12:29Z
dc.date.available 2025-09-15T09:12:29Z
dc.date.issued 2025-09-15
dc.identifier.uri http://hdl.handle.net/10900/170245
dc.identifier.uri http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1702459 de_DE
dc.identifier.uri http://dx.doi.org/10.15496/publikation-111572
dc.description.abstract Neural networks often excel when their inputs closely match the data on which they were trained, yet they frequently fail when inputs differ even slightly from their training data. This issue, known as distribution shift, remains a significant challenge when deploying machine learning models in practical applications such as medical imaging and autonomous driving. Traditional methods to address distribution shift typically involve additional training or data collection, which may not always be feasible for models already deployed. This thesis explores alternative strategies aimed at enhancing the robustness of already-trained models to distribution shifts. The first part of this work introduces a benchmark specifically designed to evaluate test-time adaptation (TTA) methods under prolonged and varied distribution shifts. Using this benchmark, we demonstrate that while existing TTA techniques initially improve performance, they often degrade with extended adaptation. We also propose a simple baseline method that consistently outperforms the other tested methods and maintains high performance throughout prolonged adaptation. Building on these insights, the second part analyzes the underlying mechanisms of entropy-based loss functions commonly employed in TTA. We show that entropy minimization initially clusters embeddings of similar images together, thus increasing accuracy. However, continued entropy minimization eventually drives input image embeddings further away from training embeddings, thereby reducing accuracy. Leveraging this insight, we propose Weighted Flips (WF), a novel method capable of predicting model accuracy on arbitrary image sets without the need for labeled data. The final part of this work extends the principles of TTA to language models (LMs), focusing on the task of literature recommendation. We propose a benchmark that evaluates the ability of LMs to identify academic papers when given a short description that references them. Our benchmark demonstrates that LMs cannot perform this task effectively. Therefore, we propose a simple agent that allows LMs to search for and read relevant papers, significantly improving their performance. en
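Note: the entropy-minimization mechanism described in the abstract can be illustrated with a minimal sketch, assuming a PyTorch classifier whose normalization-layer parameters are exposed to an optimizer. This is an illustrative reconstruction in the spirit of entropy-based TTA methods, not the thesis's exact procedure.

import torch

def entropy_minimization_step(model, x, optimizer):
    # One test-time adaptation step: minimize the mean prediction entropy
    # on an unlabeled test batch x. No labels are used.
    logits = model(x)
    probs = logits.softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()  # typically only norm-layer affine parameters are updated
    return entropy.item()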
dc.language.iso en de_DE
dc.publisher Universität Tübingen de_DE
dc.rights ubt-podno de_DE
dc.rights.uri http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de de_DE
dc.rights.uri http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en en
dc.subject.other benchmarking en
dc.subject.other test-time adaptation en
dc.subject.other language models en
dc.subject.other computer vision en
dc.title Towards Robust Machine Learning: Benchmarking and Adaptation in Challenging Settings en
dc.type PhDThesis de_DE
dcterms.dateAccepted 2025-07-25
utue.publikation.fachbereich Informatik de_DE
utue.publikation.fakultaet 7 Mathematisch-Naturwissenschaftliche Fakultät de_DE
utue.publikation.noppn yes de_DE
