Flexible Analysis of Plant Genomes in a Database Management System
Sebastian Dorok, Sebastian Breß, Jens Teubner, and Gunter Saake
Proceedings of the 18th Int'l Conference on Extending Database Technology (EDBT 2015), Brussels, Belgium, March 2015.
Analysis of genomes has a wide range of applications, from disease susceptibility studies to plant breeding research. For example, different types of barley differ in their drought or salt tolerance. Thus, a typical use case is to compare two plant genomes and try to deduce which genes are responsible for a certain resistance. For this, we need to find differences in large volumes of aligned genome data, which are already available in large genome databases.
The challenge is to efficiently retrieve the genotypes of a certain range of the genome, and then to determine variants and their impact on the plant organism. State-of-the-art tools are fixed pipelines with a fixed parametrization. However, in practice, users want to interactively analyse genome data and need to customize the parametrization.
In this demonstration, we show how we can support flexible ad-hoc analyses of arbitrary plant genomes using SQL with a small set of user-defined aggregation functions and dynamic parametrization. Furthermore, we demonstrate how genome analysis workflows for variant calling can be mapped to our system, and we provide insights into the performance of our system.
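To make the idea of SQL with user-defined aggregation functions and dynamic parametrization concrete, the following is a minimal sketch and not the system demonstrated here. It assumes Python's built-in sqlite3 module as a stand-in database, a hypothetical base-centric schema (tables reference and sample_reads), and a hypothetical aggregate genotype that takes a majority vote over the bases passing a quality threshold. The threshold and the genome range are ordinary query parameters, so they can be changed per query without touching a pipeline.

```python
import sqlite3
from collections import Counter


class Genotype:
    """Hypothetical aggregate: majority base among reads passing a quality threshold."""

    def __init__(self):
        self.counts = Counter()

    def step(self, base, base_quality, min_quality):
        # Called once per aligned base; low-quality bases are ignored.
        if base is not None and base_quality >= min_quality:
            self.counts[base] += 1

    def finalize(self):
        if not self.counts:
            return None  # no base passed the threshold at this position
        # Highest count wins; ties are broken alphabetically for determinism.
        return min(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]


def build_demo_db():
    """Create an in-memory database with a tiny, made-up data set (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("genotype", 3, Genotype)
    conn.execute("CREATE TABLE reference (position INTEGER PRIMARY KEY, base TEXT)")
    conn.execute(
        "CREATE TABLE sample_reads (position INTEGER, base TEXT, base_quality INTEGER)"
    )
    conn.executemany(
        "INSERT INTO reference VALUES (?, ?)",
        [(1, "A"), (2, "C"), (3, "G"), (4, "T")],
    )
    conn.executemany(
        "INSERT INTO sample_reads VALUES (?, ?, ?)",
        [
            (1, "A", 30), (1, "A", 35),
            (2, "T", 30), (2, "T", 32), (2, "C", 10),
            (3, "G", 40), (3, "A", 5), (3, "A", 8),
            (4, "T", 30),
        ],
    )
    return conn


def call_variants(conn, start, end, min_quality):
    """Return (position, reference_base, called_base) where the sample differs."""
    query = """
        SELECT position, reference_base, called_base
        FROM (
            SELECT r.position AS position,
                   r.base AS reference_base,
                   genotype(s.base, s.base_quality, ?) AS called_base
            FROM reference AS r
            JOIN sample_reads AS s ON s.position = r.position
            WHERE r.position BETWEEN ? AND ?
            GROUP BY r.position, r.base
        )
        WHERE called_base IS NOT NULL AND called_base <> reference_base
        ORDER BY position
    """
    return conn.execute(query, (min_quality, start, end)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(call_variants(conn, 1, 4, 20))  # strict quality threshold
    print(call_variants(conn, 1, 4, 0))   # no quality filtering
```

On this toy data, the strict threshold reports only position 2 as a variant, while disabling the filter additionally reports position 3, where two low-quality reads outvote a single high-quality one. This is the kind of interactive re-parametrization that a fixed pipeline would not allow.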