Abstract:
This research analyzes multi-view scene interpretation, using deep learning models to enhance the quality of input images. The tasks we study span low-level view interpolation, high-level 3D reconstruction, and burst image denoising, and the proposed methods target limitations of existing classical and learning-based approaches. We first introduce a view interpolation technique that generates accurate intermediate frames without requiring additional geometric input. This method lays the foundation for our subsequent work on multi-view 3D reconstruction. To address the lack of ground-truth depth information in 3D reconstruction, we propose a meta-learning-based, unsupervised approach to the classic multi-view stereo problem. We also address low-resolution depth maps by introducing a depth-enhancing transformer-CNN hybrid module. Finally, we explore burst image denoising, proposing a model that aligns multiple images and merges their feature volumes to achieve state-of-the-art performance. Together, these contributions advance multi-view scene interpretation in computer vision and have potential applications in various domains.