Abstract:
The field of metabolomics is concerned with analyzing data from high-throughput experiments.
Its objective is the identification, quantification, and elucidation of the function and interaction
of small molecules in a biological system. The prevalent methods used in metabolomics are
nuclear magnetic resonance spectroscopy and mass spectrometry. A typical mass spectrometry
metabolomics analysis workflow is composed of several steps. First, biological samples are
measured using liquid chromatography and mass spectrometry. Second, computational mass
spectrometry is used to analyze the acquired data. The results are then stored in a human-
readable format, statistically post-processed, and visualized. The driving force of the field
is the development of new methods on the analytical and computational side to reach the
above-mentioned aims. Nonetheless, there are still some major unsolved issues at different
stages of the analysis workflow.
Controlling the false-discovery rate (FDR) is well established in other fields (e.g., proteomics),
but so far, methods are lacking in the field of metabolomics. This seriously limits the confidence
in reported identifications and quantifications, and manual assessment is still common practice.
In recent years, different methods have been established for untargeted approaches. For
targeted strategies, however, the lack of robust FDR estimators has prevented the field from
obtaining highly confident quantifications. Automating the manual process is therefore
essential to advance targeted metabolomics research and enable proper high-throughput
analysis. We established an automated, FDR-controlled targeted analysis workflow that enables
robust FDR estimation for the first time, thus improving the comparability of results in the
metabolomics field.
Another critical aspect of scientific research is representing and sharing analysis results in
accordance with the FAIR principles: findable, accessible, interoperable, and reusable.
In 2014, the human-readable file format mzTab was introduced in the proteomics
and metabolomics fields to enable the distribution of analysis results in a standardized open
format. However, in recent years, the limitations of this format regarding metabolomics
data have become apparent. As part of the Proteomics Standards Initiative, we designed the
improved standard mzTab-M, which focuses on interoperability and reusability, and integrated it
into our OpenMS software framework.
Metabolomics has a wide range of applications and can be used to answer a variety of
scientific questions. The field addresses individual data- and objective-related issues
by developing new problem-specific post-processing methods, as we show with an example
from food chemistry. In recent years, the production of primary cacao products, such as
cacao butter, has moved from Europe to the cacao-producing countries. This creates the challenge
of shifting the quality assessment from raw to primary products to uphold the quality standards
and control in the European market. To this end, we provided the basis for such a method
by using biomarker identification and machine learning. Using a regression method, we were
able to estimate the shell content in a mixture of bean and shell and, with it, the quality of the
product.