Cloud Computing in Bioinformatics: Benchmarking, Virtual Cluster, Security




URI: http://hdl.handle.net/10900/157322
http://nbn-resolving.de/urn:nbn:de:bsz:21-dspace-1573222
http://dx.doi.org/10.15496/publikation-98654
Document type: PhD thesis
Date: 2024-09-10
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät
Department: Informatik
Advisor: Kohlbacher, Oliver (Prof. Dr.)
Day of Oral Examination: 2024-08-14
DDC Classification: 004 - Data processing and computer science
500 - Natural sciences and mathematics
Other Keywords: Bioinformatik
Virtuelle Cluster
Benchmarking
Bioinformatics
Cloud
Virtual Cluster
License: http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en

Abstract:

Questions in the life sciences can be answered using bioinformatics methods. Most of these approaches require advanced computing resources because of the complexity or the volume of the data to be analyzed. Access to powerful computing resources is now easier than ever thanks to the widespread use of cloud computing, and interest in this technology is growing particularly in the life sciences. For example, modern sequencing equipment generates data in the terabyte and petabyte range that can no longer be analyzed on standard desktop computers. Purchasing high-performance compute hardware is expensive in several respects: in addition to the acquisition costs, there are costs for maintenance and operation, which can include personnel costs. A cloud solution can be a cheaper alternative. From a cloud user's perspective, cloud resources may appear endless, but this is not the case. Since not all cloud users are yet familiar with the cloud computing paradigm, difficulties can arise in efficient resource management and data security. Sensitive data in particular, such as data in personalized medicine, require special protection, which is difficult to achieve without specific knowledge. Besides data security, resource utilization is an important issue. Because the efficiency of an application affects its resource usage, it is advisable to evaluate an application with regard to its resource sweet spot to avoid wasting resources. This evaluation can be done with benchmarking tools. However, most tools in this area are not tailored to bioinformatics applications or are not very user-friendly. The same applies to the use of cloud resources as virtual compute clusters: the available solutions vary depending on the cloud platform and usually require expert knowledge.
This thesis presents new tools and concepts that enable cloud users to use cloud resources efficiently and without special prior knowledge. It covers the benchmark suite BOOTABLE, which is specifically tailored to bioinformatics, and the tool VALET, which creates and scales virtual clusters in an OpenStack cloud environment in an automated way. Both tools were used to evaluate the scalability of bioinformatics applications as well as the potential savings in virtual cluster resources when using a load-based scaling approach. In addition, a general security concept for the secure analysis of sensitive data is presented. Finally, experiences from a certification process in the field of IT security are reported, with a focus on selecting the appropriate standard and implementing it in an academic environment with a small number of employees.
