Enhancement and Evaluation of Deep Generative Networks with Applications in Super-Resolution and Image Generation

Citable link (URI): http://hdl.handle.net/10900/155869
http://nbn-resolving.de/urn:nbn:de:bsz:21-dspace-1558697
http://dx.doi.org/10.15496/publikation-97202
Document type: Dissertation
Date of publication: 2024-07-30
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät
Department: Computer Science
Referee: Lensch, Hendrik P. A. (Prof. Dr.)
Date of oral examination: 2024-07-04
DDC classification: 004 - Computer Science
License: http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en

Abstract:

Since the advent of computers, perceiving the world visually has been a major focus of research. Today, raster graphics are the most popular format for storing arbitrary visual data. This choice of representation carries major advantages, primarily high flexibility and suitability for hardware acceleration. However, enlarging raster images leads to well-known blurring or pixelization artifacts. The field of super-resolution (SR) investigates methods to improve the quality of enlarged visual data. This work is structured into two parts: making strides towards more realistic and efficient SR, and improving the architecture and evaluation of the deep generative models on which today's best SR methods are built.

While prior art has primarily focused on improving the architectures of deep neural networks, we propose a shift in focus to the loss functions used to train them. We present EnhanceNet, a novel method that achieves state-of-the-art quantitative and qualitative image quality in SR through a novel set of training objectives. A combination of perceptual, style, and adversarial losses applied to the task of SR leads to previously unattainable visual fidelity at large scaling factors.

Extending image SR methods to video data is regularly achieved by feeding a number of neighbouring frames into a neural network that is applied in a sliding window across time. The major shortcomings of this common approach are its low computational efficiency, since each frame is processed independently several times, and the temporal instabilities in its outputs. We propose Frame-Recurrent Video Super-Resolution, a method that recurrently feeds the super-resolved output for the previous frame into the upsampling of the next one. The method achieves state-of-the-art video SR quality while substantially reducing computational cost and improving temporal consistency.

GANs are as powerful as they are difficult to train, regularly suffering from training failures due to poor gradients. We propose Tempered Adversarial Networks, a novel way to automatically stabilize GAN training by introducing a lens module that modifies real data samples to look more similar to generated ones throughout training. A range of experiments shows the promise of this technique in improving the gradients received by the generator and thereby the success rate of training.

Measuring the success of these methods is known to be a challenging task, as it entails matching distributions rather than pairs of samples. We define Precision and Recall for Distributions, which disentangles a measure of sample quality (precision) from a measure of coverage of the original data distribution (recall).

We close with Regularized Autoencoders, a study of the differences and similarities between autoencoders, variational autoencoders (VAEs), and several forms in between. The major finding is that stochastic VAEs are not always required for the tasks they set out to solve, and that simpler RAEs often outperform their stochastic counterparts.
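To make the combined training objective described for EnhanceNet concrete, the following is a minimal PyTorch sketch of a perceptual + style + adversarial loss. The VGG-19 layer choice, Gram-matrix style term, and loss weights are illustrative assumptions, not the dissertation's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

class CombinedSRLoss(nn.Module):
    """Perceptual + style + adversarial objective for an SR generator."""

    def __init__(self, w_percep=1.0, w_style=10.0, w_adv=1e-3):
        super().__init__()
        # Frozen VGG-19 features up to relu3_4 (illustrative layer choice).
        self.vgg = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:18].eval()
        for p in self.vgg.parameters():
            p.requires_grad = False
        self.w_percep, self.w_style, self.w_adv = w_percep, w_style, w_adv

    @staticmethod
    def gram(feat):
        # Gram matrix of feature maps; matching it transfers texture statistics.
        b, c, h, w = feat.shape
        f = feat.reshape(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    def forward(self, sr, hr, disc_logits_sr):
        f_sr, f_hr = self.vgg(sr), self.vgg(hr)
        percep = nn.functional.mse_loss(f_sr, f_hr)  # feature-space distance
        style = nn.functional.mse_loss(self.gram(f_sr), self.gram(f_hr))
        # Non-saturating adversarial term: generator wants D to output "real".
        adv = nn.functional.binary_cross_entropy_with_logits(
            disc_logits_sr, torch.ones_like(disc_logits_sr))
        return self.w_percep * percep + self.w_style * style + self.w_adv * adv
```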
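The frame-recurrent idea can be sketched as follows. This toy version omits the flow estimation and warping of the full method and simply packs the previous high-resolution estimate back to low resolution; `SRNet` is a hypothetical stand-in architecture, not the published one.

```python
import torch
import torch.nn as nn

class SRNet(nn.Module):
    """Toy upsampler taking the current LR frame plus the space-to-depth-packed
    previous HR estimate; a stand-in for the real architecture."""

    def __init__(self, scale=4, ch=3):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(ch + ch * scale * scale, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, ch * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))

    def forward(self, lr, prev_hr):
        # Pack the previous HR output down to LR resolution (space-to-depth).
        packed = nn.functional.pixel_unshuffle(prev_hr, self.scale)
        return self.body(torch.cat([lr, packed], dim=1))

def super_resolve_video(frames_lr, net, scale=4):
    b, c, h, w = frames_lr[0].shape
    prev_hr = torch.zeros(b, c, h * scale, w * scale)  # black initial estimate
    outputs = []
    for lr in frames_lr:  # one pass over the video, no sliding window
        prev_hr = net(lr, prev_hr)
        outputs.append(prev_hr)
    return outputs
```

Because each frame is touched exactly once and inherits detail from the previous output, the loop illustrates both claimed benefits: lower cost than sliding-window processing and a built-in incentive for temporally consistent results.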
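A hedged sketch of one training step with a lens module is given below. The exact loss terms and the schedule for the identity-pulling weight `lam` are illustrative paraphrases of the idea, not the published objective.

```python
import torch
import torch.nn as nn

def tan_step(x_real, z, G, D, L, opt_g, opt_d, opt_l, lam):
    """One training step. G = generator, D = discriminator, L = lens.
    `lam` is annealed upwards over training so that L converges to the
    identity and D gradually gets to see the unmodified real distribution."""
    bce = nn.functional.binary_cross_entropy_with_logits

    # Discriminator: lensed reals vs. generated samples.
    d_real = D(L(x_real).detach())
    d_fake = D(G(z).detach())
    d_loss = (bce(d_real, torch.ones_like(d_real))
              + bce(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool the discriminator, as in a standard GAN.
    g_logits = D(G(z))
    g_loss = bce(g_logits, torch.ones_like(g_logits))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    # Lens: push lensed reals towards the generated distribution, while the
    # reconstruction term (weighted by lam) pulls L(x) back towards x.
    x_lens = L(x_real)
    l_logits = D(x_lens)
    l_loss = (bce(l_logits, torch.zeros_like(l_logits))
              + lam * nn.functional.mse_loss(x_lens, x_real))
    opt_l.zero_grad(); l_loss.backward(); opt_l.step()
    return d_loss.item(), g_loss.item(), l_loss.item()
```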
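The precision/recall curve can be computed from cluster histograms of real and generated samples. The sketch below follows the published formulation, alpha(lambda) = sum_i min(lambda * p_i, q_i) for precision and beta(lambda) = alpha(lambda) / lambda for recall; the k-means discretization of feature embeddings is a common but assumed choice here.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def prd_curve(real_feats, gen_feats, num_clusters=20, num_angles=1001):
    """Precision/recall curve between a real and a generated sample set."""
    # Joint clustering gives a shared discretization of both distributions.
    data = np.concatenate([real_feats, gen_feats], axis=0)
    labels = MiniBatchKMeans(n_clusters=num_clusters, n_init=10).fit_predict(data)
    p = np.bincount(labels[:len(real_feats)], minlength=num_clusters).astype(float)
    q = np.bincount(labels[len(real_feats):], minlength=num_clusters).astype(float)
    p /= p.sum()  # reference (real) histogram
    q /= q.sum()  # evaluated (generated) histogram

    # Sweep slopes lambda = tan(angle): alpha measures quality (precision),
    # beta = alpha / lambda measures coverage (recall).
    eps = 1e-10
    slopes = np.tan(np.linspace(eps, np.pi / 2 - eps, num_angles))
    precision = np.minimum(slopes[:, None] * p[None, :], q[None, :]).sum(axis=1)
    recall = precision / slopes
    return precision, recall
```

A generator that produces only a few high-quality modes scores high precision but low recall, while one that covers the data with poor samples shows the opposite, which is exactly the disentanglement the abstract describes.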
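Finally, the RAE objective can be sketched as a deterministic autoencoder loss with explicit regularizers in place of the VAE's stochastic sampling and KL term. The coefficients and the choice of plain weight decay as the decoder regularizer are illustrative assumptions.

```python
import torch
import torch.nn as nn

def rae_loss(x, encoder, decoder, beta=1e-4, lam=1e-7):
    """Deterministic autoencoder objective with explicit regularization."""
    z = encoder(x)                      # deterministic code, no sampling
    x_hat = decoder(z)
    rec = nn.functional.mse_loss(x_hat, x)
    z_reg = z.pow(2).sum(dim=1).mean()  # latent penalty in place of the KL term
    dec_reg = sum(p.pow(2).sum() for p in decoder.parameters())  # decoder smoothness
    return rec + beta * z_reg + lam * dec_reg
```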
