Optimizing and Improving Deep Learning Methods for Stereo Matching

dc.contributor.advisor Zell, Andreas (Prof. Dr.)
dc.contributor.author Rahim, Rafia
dc.date.accessioned 2025-07-09T13:47:50Z
dc.date.available 2025-07-09T13:47:50Z
dc.date.issued 2025-07-09
dc.identifier.uri http://hdl.handle.net/10900/167750
dc.identifier.uri http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1677507 de_DE
dc.identifier.uri http://dx.doi.org/10.15496/publikation-109077
dc.description.abstract In this thesis, our primary objective is to reduce the computational footprint of state-of-the-art stereo deep neural network (DNN) methods while maintaining their performance. Classical stereo methods used in computer vision applications such as robotics and autonomous driving often require complex tuning and struggle to perform well in real-world scenarios. Recent end-to-end DNN methods, on the other hand, achieve superior performance but have high computational requirements, which makes them unsuitable for real-time applications. To achieve our objective, we pursue two complementary paths. First, we optimize the individual components of state-of-the-art deep neural networks through a detailed empirical evaluation, which helps us identify the bottlenecks in state-of-the-art stereo methods. Our findings reveal that the computational load stems primarily from the three-dimensional (3D) convolutions used in performance-oriented end-to-end stereo methods. Inspired by the success of MobileNet blocks for two-dimensional (2D) convolutions, we propose a set of separable convolutions in 3D space. We thoroughly investigate the impact of making convolutions separable along different dimensions and demonstrate significant reductions in computational load without sacrificing performance; in fact, we even observe performance improvements. Building on these findings, we design a family of networks based on 2D and 3D separable convolutions. Furthermore, we explore the design of a leaner backbone for real-time stereo networks. We introduce a two-branch architecture that explicitly captures pixel-level and semantic-level information from the input images. This design yields a lean backbone that reduces the computational load, albeit with a slight loss in performance.
To recover the lost performance, we propose using learned attention weights derived from the cost volume, combined with a LogL1 loss for stereo matching. In addition to optimizing individual components and modules, we investigate knowledge distillation as a way to design leaner and faster stereo networks. Drawing on insights from stereo methods and general knowledge distillation techniques, we introduce a novel knowledge distillation pipeline. Through a systematic study of various design choices, we develop a leaner and faster stereo network with competitive performance. We emphasize that distillation points and loss functions must be selected carefully when distilling stereo networks, as they have a significant impact on performance. The trained student networks not only rival performance-oriented methods but also give results comparable to speed-oriented stereo methods. Overall, our thesis contributes to the development of computationally efficient, high-performing stereo vision systems. By addressing the computational challenges of state-of-the-art stereo methods and leveraging knowledge distillation techniques, we facilitate the adoption of these methods in real-world systems and applications. We firmly believe that the findings and methodologies presented in this thesis advance the field of stereo vision and pave the way for more practical and effective depth estimation solutions. en
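Two ideas from the abstract can be made concrete with a little arithmetic: why MobileNet-style separable 3D convolutions cut the cost of cost-volume processing, and what a LogL1 disparity loss looks like. The sketch below is illustrative only and rests on stated assumptions. It uses a depthwise k x k x k filter followed by a 1 x 1 x 1 pointwise filter (one possible factorization; the thesis studies separability along different dimensions), ignores biases, and assumes the LogL1 loss has the form mean(ln(1 + |d_pred - d_gt|)). The function names and the 32-channel, 3 x 3 x 3 example are hypothetical and not taken from the thesis.

```python
import math


def conv3d_params(c_in, c_out, k):
    """Weights of a standard 3D convolution with a k x k x k kernel (bias ignored)."""
    return c_in * c_out * k ** 3


def separable_conv3d_params(c_in, c_out, k):
    """Weights of an assumed MobileNet-style 3D depthwise-separable convolution:
    one k x k x k depthwise filter per input channel, then a 1 x 1 x 1 pointwise
    convolution that mixes channels (bias ignored)."""
    depthwise = c_in * k ** 3
    pointwise = c_in * c_out
    return depthwise + pointwise


def conv3d_macs(params, d, h, w):
    """Multiply-accumulates over a D x H x W cost volume, assuming stride 1 and
    'same' padding, so every weight is applied once per output voxel."""
    return params * d * h * w


def log_l1_loss(pred, target):
    """Assumed LogL1 form: mean of ln(1 + |pred - target|) over valid pixels.
    Its gradient 1 / (1 + |e|) shrinks for large errors, damping outliers."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum(math.log1p(abs(p - t)) for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    std = conv3d_params(32, 32, 3)
    sep = separable_conv3d_params(32, 32, 3)
    print(std, sep, round(std / sep, 1))  # 27648 1888 14.6
    # The MAC ratio equals the parameter ratio because both scale with D * H * W.
    print(conv3d_macs(std, 48, 64, 128) // conv3d_macs(sep, 48, 64, 128))  # 14
    print(round(log_l1_loss([1.0, 2.0], [1.0, 4.0]), 4))  # 0.5493
```

In this hypothetical setting the separable layer needs roughly 14.6 times fewer weights and multiply-accumulates than the standard 3D convolution. For the same two-pixel example, a plain L1 loss would be 1.0, whereas the assumed LogL1 form gives about 0.549, showing how the logarithm compresses large disparity errors.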
dc.language.iso en de_DE
dc.publisher Universität Tübingen de_DE
dc.rights ubt-podno de_DE
dc.rights.uri http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de de_DE
dc.rights.uri http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en en
dc.subject.classification Maschinelles Sehen (Computer Vision), Deep Learning, Maschinelles Lernen (Machine Learning) de_DE
dc.subject.ddc 004 de_DE
dc.subject.ddc 500 de_DE
dc.subject.other Knowledge Distillation en
dc.subject.other Stereo Matching en
dc.subject.other Depth Estimation en
dc.subject.other Optimization en
dc.subject.other Deep Neural Network en
dc.subject.other Stereo Networks en
dc.title Optimizing and Improving Deep Learning Methods for Stereo Matching en
dc.type PhDThesis de_DE
dcterms.dateAccepted 2025-05-07
utue.publikation.fachbereich Informatik (Computer Science) de_DE
utue.publikation.fakultaet 7 Mathematisch-Naturwissenschaftliche Fakultät (Faculty of Mathematics and Natural Sciences) de_DE
utue.publikation.noppn yes de_DE