Backpropagation Beyond the Gradient


dc.contributor.advisor Hennig, Philipp (Prof. Dr.)
dc.contributor.author Dangel, Felix Julius
dc.date.accessioned 2023-02-06T12:01:13Z
dc.date.available 2023-02-06T12:01:13Z
dc.date.issued 2023-02-06
dc.identifier.uri http://hdl.handle.net/10900/136114
dc.identifier.uri http://nbn-resolving.de/urn:nbn:de:bsz:21-dspace-1361147 de_DE
dc.identifier.uri http://dx.doi.org/10.15496/publikation-77465
dc.description.abstract Automatic differentiation is a key enabler of deep learning: previously, practitioners were limited to models for which they could manually compute derivatives. Now, they can create sophisticated models with almost no restrictions and train them using first-order, i.e., gradient, information. Popular libraries like PyTorch and TensorFlow compute this gradient efficiently, automatically, and conveniently with a single line of code. Under the hood, reverse-mode automatic differentiation, or gradient backpropagation, powers the gradient computation in these libraries, and their entire design centers around it. The frameworks are thus specialized for one specific task: computing the average gradient over a mini-batch. This specialization often complicates the extraction of other information, like higher-order statistical moments of the gradient or higher-order derivatives like the Hessian. It limits practitioners and researchers to methods that rely on the gradient. Arguably, this hampers the field from exploring the potential of higher-order information, and there is evidence that focusing solely on the gradient has not led to significant recent advances in deep learning optimization. To advance algorithmic research and inspire novel ideas, information beyond the batch-averaged gradient must be made available at the same level of computational efficiency, automation, and convenience.

This thesis presents approaches to simplify experimentation with rich information beyond the gradient by making it more readily accessible. We present an implementation of these ideas as an extension to the backpropagation procedure in PyTorch. Using this newly accessible information, we demonstrate possible use cases by (i) showing how it can inform our understanding of neural network training through a diagnostic tool, and (ii) enabling novel methods to efficiently compute and approximate curvature information.

First, we extend gradient backpropagation for sequential feedforward models to Hessian backpropagation, which enables computing approximate per-layer curvature. This perspective unifies recently proposed block-diagonal curvature approximations. Like gradient backpropagation, the computation of these second-order derivatives is modular, and therefore simple to automate and extend to new operations.

Based on the insight that rich information beyond the gradient can be computed efficiently and at the same time as the gradient, we extend the backpropagation in PyTorch with the BackPACK library. It provides efficient and convenient access to statistical moments of the gradient and to approximate curvature information, often at a small overhead compared to computing just the gradient.
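A minimal sketch of this interface, assuming a current BackPACK release: the toy model, data, and sizes below are invented for illustration, while extend, the backpack context manager, and the BatchGrad and Variance extensions (which expose grad_batch and variance attributes on the parameters) follow BackPACK's documented usage.

    import torch
    from backpack import backpack, extend
    from backpack.extensions import BatchGrad, Variance

    # Toy model and loss; the architecture and sizes are placeholders.
    model = extend(torch.nn.Sequential(
        torch.nn.Linear(10, 5), torch.nn.ReLU(), torch.nn.Linear(5, 3),
    ))
    lossfunc = extend(torch.nn.CrossEntropyLoss())

    X, y = torch.randn(8, 10), torch.randint(0, 3, (8,))
    loss = lossfunc(model(X), y)

    # A single backward pass yields the usual mini-batch gradient and,
    # via BackPACK's extensions, per-sample gradients and their variance.
    with backpack(BatchGrad(), Variance()):
        loss.backward()

    for name, p in model.named_parameters():
        print(name, p.grad.shape, p.grad_batch.shape, p.variance.shape)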
Next, we showcase the utility of such information for better understanding neural network training. We build the Cockpit library, which visualizes what is happening inside the model during training through various instruments that rely on BackPACK's statistics. We show how Cockpit provides a meaningful statistical summary report that helps the deep learning engineer identify bugs in their machine learning pipeline, guide hyperparameter tuning, and study deep learning phenomena.

Finally, we use BackPACK's extended automatic differentiation functionality to develop ViViT, an approach to efficiently compute curvature information, in particular curvature noise. It uses the low-rank structure of the generalized Gauss-Newton approximation to the Hessian and addresses shortcomings of existing curvature approximations. By monitoring curvature noise, we demonstrate how ViViT's information helps in understanding the challenges of making second-order optimization methods work in practice.

This work develops new tools to experiment more easily with higher-order information in complex deep learning models. These tools have impacted works on Bayesian applications with Laplace approximations, out-of-distribution generalization, differential privacy, and the design of automatic differentiation systems. They constitute an important step towards developing and establishing more efficient deep learning algorithms. en
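As an illustration of the low-rank structure the abstract refers to, the generalized Gauss-Newton (GGN) matrix has the standard form sketched below; the notation (N data points, C model outputs, P parameters) follows common convention and is not quoted from the thesis.

    % Empirical risk: L(theta) = (1/N) sum_n ell(f(theta, x_n), y_n)
    % J_n: the C x P Jacobian of the model outputs w.r.t. the parameters
    % Lambda_n: the C x C Hessian of the loss w.r.t. the model outputs
    \[
      \mathbf{G}(\theta) = \frac{1}{N} \sum_{n=1}^{N}
        \mathbf{J}_n^\top \, \boldsymbol{\Lambda}_n \, \mathbf{J}_n
    \]
    % With a symmetric factorization Lambda_n = S_n S_n^T (available for
    % common convex losses), G = V V^T where V stacks the N*C columns
    % (1/sqrt(N)) J_n^T S_n, hence rank(G) <= N*C. In typical mini-batches
    % N*C is far smaller than P, which is the low-rank structure ViViT
    % exploits to access eigenvalues, eigenvectors, and per-sample (noise)
    % contributions of the GGN.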
dc.language.iso en de_DE
dc.publisher Universität Tübingen de_DE
dc.rights ubt-podok de_DE
dc.rights.uri http://tobias-lib.uni-tuebingen.de/doku/lic_mit_pod.php?la=de de_DE
dc.rights.uri http://tobias-lib.uni-tuebingen.de/doku/lic_mit_pod.php?la=en en
dc.subject.classification Automatic differentiation, Neural network, Optimization, Curvature, Deep learning, Machine learning, Gradient, Newton's method de_DE
dc.subject.ddc 004 de_DE
dc.subject.other Backpropagation en
dc.subject.other curvature matrices en
dc.subject.other higher-order information en
dc.subject.other second-order optimization en
dc.title Backpropagation Beyond the Gradient en
dc.type PhDThesis de_DE
dcterms.dateAccepted 2023-01-31
utue.publikation.fachbereich Informatik de_DE
utue.publikation.fakultaet 7 Mathematisch-Naturwissenschaftliche Fakultät de_DE
utue.publikation.noppn yes de_DE
