Optimizing and Improving Deep Learning Methods for Stereo Matching

Rahim, Rafia

dc.contributor.advisor	Zell, Andreas (Prof. Dr.)
dc.contributor.author	Rahim, Rafia
dc.date.accessioned	2025-07-09T13:47:50Z
dc.date.available	2025-07-09T13:47:50Z
dc.date.issued	2025-07-09
dc.identifier.uri	http://hdl.handle.net/10900/167750
dc.identifier.uri	http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1677507	de_DE
dc.identifier.uri	http://dx.doi.org/10.15496/publikation-109077
dc.description.abstract	In this thesis, our primary objective is to reduce the computational footprint of state-of-the-art stereo deep neural network (DNN) methods while maintaining their performance. Classical stereo methods used in computer vision applications, such as robotics and autonomous driving, often require complex tuning and struggle to perform well in real-world scenarios. On the other hand, recent end-to-end DNN methods have shown superior performance but come with high computational requirements, making them unsuitable for real-time applications. To achieve our objective, we pursue two complementary paths. Firstly, we optimize the individual components of state-of-the-art deep neural networks through a detailed empirical evaluation. This evaluation helps us identify the bottlenecks present in state-of-the-art stereo methods. Our findings reveal that the computational load primarily stems from the use of three-dimensional (3D) convolutions in performance-oriented end-to-end stereo methods. Taking inspiration from the success of MobileNet blocks used for two-dimensional (2D) convolutions, we propose a set of separable convolutions in the 3D space. We thoroughly investigate the impact of making convolutions separable in different dimensions and demonstrate significant reductions in computational load without sacrificing performance. In fact, we observe performance improvements. Building on these conclusions, we design a family of networks based on 2D and 3D separable convolutions. Furthermore, we explore the design of a leaner backbone for real-time stereo networks. We introduce a two-branch-based architecture that explicitly captures pixel-level and semantic-level information from the input images. This design choice results in a lean backbone that reduces computational load, albeit with a slight performance loss. 
To recover the lost performance, we propose using learned attention weights based on the cost volume, combined with a LogL1 loss for stereo matching. In addition to optimizing individual components and modules, we investigate the application of knowledge distillation for designing leaner and faster stereo networks. Leveraging insights from stereo methods and general knowledge distillation techniques, we introduce a novel knowledge distillation pipeline. Through a systematic study of various design choices, we develop a leaner and faster stereo network with competitive performance. We emphasize the importance of carefully selecting distillation points and loss functions when distilling stereo networks, as they have a significant impact on performance. The trained student networks not only rival performance-oriented methods but also give results comparable to speed-oriented stereo methods. Overall, our thesis contributes to the development of computationally efficient and high-performing stereo vision systems. By addressing the computational challenges of state-of-the-art stereo methods and leveraging knowledge distillation techniques, we facilitate the adoption of these methods for real-world systems and applications. We firmly believe that the findings and methodologies presented in this thesis advance the field of stereo vision and pave the way for more practical and effective depth estimation solutions.	en
dc.language.iso	en	de_DE
dc.publisher	Universität Tübingen	de_DE
dc.rights	ubt-podno	de_DE
dc.rights.uri	http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de	de_DE
dc.rights.uri	http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en	en
dc.subject.classification	Computer Vision, Deep Learning, Machine Learning	de_DE
dc.subject.ddc	004	de_DE
dc.subject.ddc	500	de_DE
dc.subject.other	Knowledge Distillation	en
dc.subject.other	Stereo Matching	en
dc.subject.other	Depth Estimation	en
dc.subject.other	Optimization	en
dc.subject.other	Deep Neural Network	en
dc.subject.other	Stereo Networks	en
dc.title	Optimizing and Improving Deep Learning Methods for Stereo Matching	en
dc.type	PhDThesis	de_DE
dcterms.dateAccepted	2025-05-07
utue.publikation.fachbereich	Computer Science	de_DE
utue.publikation.fakultaet	7 Mathematisch-Naturwissenschaftliche Fakultät	de_DE
utue.publikation.noppn	yes	de_DE

Files:	Optimzing_DNN_Stereo_Matching_Methods.pdf 96.7 MB PDF
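
The abstract attributes most of the computational load to 3D convolutions and proposes separable 3D convolutions inspired by MobileNet blocks. The sketch below only illustrates why such a factorization shrinks a layer. It assumes a plain depthwise-plus-pointwise split, which is the standard MobileNet-style factorization extended to three dimensions; the abstract does not state which dimensions the thesis actually separates. The function names and channel sizes are hypothetical.

```python
def conv3d_params(c_in, c_out, k):
    """Weight count of a standard 3D convolution with a k x k x k kernel (bias ignored)."""
    return c_in * c_out * k ** 3


def separable_conv3d_params(c_in, c_out, k):
    """Weight count of an assumed depthwise + pointwise 3D factorization (bias ignored)."""
    depthwise = c_in * k ** 3   # one k x k x k filter per input channel
    pointwise = c_in * c_out    # 1 x 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 32 input channels, 32 output channels, 3 x 3 x 3 kernel.
standard = conv3d_params(32, 32, 3)
separable = separable_conv3d_params(32, 32, 3)
ratio = standard / separable

print(f"standard: {standard}, separable: {separable}, reduction: {ratio:.1f}x")
```

For this hypothetical layer the standard convolution has 27,648 weights and the factorized version has 1,888, roughly a 14.6x reduction. Multiply-accumulate counts shrink by the same factor, because both variants are applied at every position of the cost volume.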
