From Algorithmic to Neural Beamforming

Citable link (URI): http://hdl.handle.net/10900/125877
http://nbn-resolving.de/urn:nbn:de:bsz:21-dspace-1258772
http://dx.doi.org/10.15496/publikation-67240
Document type: Dissertation
Date of publication: 2022-04-01
Language: German
English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät (Faculty of Science)
Department: Computer Science
Reviewer: Schilling, Andreas (Prof. Dr.)
Date of oral examination: 2022-01-28
DDC classification: 004 - Computer science
Keywords:
Neural Networks
Beamforming
LSTM
Microphone Arrays
Deep Neural Networks
Cross Correlation
Neural Beamforming
Digital Signal Processing
License: http://tobias-lib.uni-tuebingen.de/doku/lic_mit_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_mit_pod.php?la=en

Abstract:

Human interaction increasingly relies on telecommunication as an addition to or replacement for immediate contact. Direct interaction with smart devices, beyond classical input devices such as the keyboard, has become common practice. Remote participation in conferences, sporting events, or concerts is more common than ever, and with current global restrictions on in-person contact it has become an inevitable part of many people's reality. The work presented here aims to improve these encounters by enhancing the auditory experience. Better fidelity and intelligibility can increase the perceived quality and enjoyment of such interactions and potentially raise acceptance of modern forms of remote experience. Two approaches to automatic source localization and multichannel signal enhancement are investigated for applications ranging from small conferences to large arenas.

In the first approach, three first-order microphones of fixed relative position and orientation are used to create a compact, reactive tracking and beamforming algorithm capable of producing pristine audio signals in small and mid-sized acoustic environments. With inaudible beam steering and a highly linear frequency response, this system aims to provide an alternative to manually operated shotgun microphones or sets of individual spot microphones, applicable in broadcast, live events, teleconferencing, and human-computer interaction. The array design and choice of capsules are discussed, as well as the challenges of preventing coloration for moving signals. The developed algorithm, which builds on Energy-Based Source Localization, is described and its performance is analyzed. Objective results on synthesized audio and on real recordings are presented, together with the results of multiple listening tests, and real-time considerations are highlighted.

In the second approach, multiple microphones with unknown spatial distribution are combined to create a large-aperture array using an end-to-end Deep-Learning approach. This method combines state-of-the-art single-channel signal separation networks with adaptive, domain-specific channel alignment. The Neural Beamformer is capable of learning to extract detailed spatial relations between channels with respect to a learned signal type, such as speech, and of applying appropriate corrections to align the signals. This creates an adaptive beamformer for microphones spaced up to the order of 100 m apart. The developed modules are analyzed in detail and multiple configurations are considered for different use cases. Signal processing inside the Neural Network is interpreted and objective results are presented on simulated and semi-simulated datasets.
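
To make the idea of channel alignment concrete, the following is a minimal sketch of a classical, non-neural baseline: each channel's delay relative to a reference channel is estimated from the peak of the cross-correlation, the channels are shifted accordingly, and the aligned channels are averaged (delay-and-sum). This is not the Neural Beamformer described in the abstract, which learns its alignment end to end. All function names, the white-noise test source, and the delay and noise values are assumptions chosen for illustration only.

```python
import numpy as np


def estimate_delay(reference, channel):
    """Return the lag in samples by which `channel` trails `reference`."""
    # In "full" mode, index 0 corresponds to a lag of -(len(reference) - 1).
    corr = np.correlate(channel, reference, mode="full")
    return int(np.argmax(corr)) - (len(reference) - 1)


def shift(signal, lag):
    """Advance `signal` by `lag` samples (delay it if negative), zero-filling."""
    out = np.zeros_like(signal)
    if lag > 0:
        out[:-lag] = signal[lag:]
    elif lag < 0:
        out[-lag:] = signal[:lag]
    else:
        out[:] = signal
    return out


def align_and_sum(channels):
    """Align every channel to the first one and average the result."""
    reference = channels[0]
    lags = [estimate_delay(reference, ch) for ch in channels]
    aligned = [shift(ch, lag) for ch, lag in zip(channels, lags)]
    return np.mean(aligned, axis=0), lags


def make_demo(delays=(0, 7, 23), noise_level=0.5, length=2000, seed=0):
    """Build hypothetical test data: one source, delayed and noisy per channel."""
    rng = np.random.default_rng(seed)
    source = rng.standard_normal(length)
    channels = [
        shift(source, -d) + noise_level * rng.standard_normal(length)
        for d in delays
    ]
    return source, channels


if __name__ == "__main__":
    source, channels = make_demo()
    enhanced, lags = align_and_sum(channels)
    print("estimated lags:", lags)
```

Averaging aligned channels reinforces the common source while uncorrelated noise partially cancels. A single broadband cross-correlation peak is a fragile estimator in reverberant rooms or with several simultaneous sources, which is one reason a learned, signal-specific alignment may be attractive for widely spaced microphones.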
