Fakultät für Informatik

Shared Load(ing): Efficient Bulk Loading into Optimized Storage

Authors

Stefan Noll, Jens Teubner, Norman May, and Alexander Böhm

Published

Proc. of the 10th Annual Conference on Innovative Data Systems Research (CIDR), Amsterdam, January 2020.

Abstract

Bulk loading into the optimized storage of a database system is a performance-critical task for data analysis, replication, and system integration. Depending on the storage layout, it may entail complex data transformations, which also makes it an expensive task that can disturb other workloads running in parallel.

In this work, we demonstrate that for a commercial, in-memory columnar system with compression-optimized storage, data transformation dominates the cost of bulk loading. The transformations may cause resource contention on a stressed system, resulting in poor and unpredictable performance for both bulk loading and query processing. To mitigate this problem, we propose Shared Loading, a distributed bulk loading mechanism that enables dynamically offloading deserialization and data transformation to the machine where the input data resides. In our evaluation we demonstrate that, for different network bandwidths and data sets, Shared Loading accelerates bulk loading into compression-optimized storage and improves the performance and predictability of queries running concurrently.
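
To make the offloading idea concrete, the following is a minimal sketch and not the mechanism from the paper. It assumes CSV input, uses dictionary encoding as a stand-in for the transformation into compression-optimized storage, and models the dynamic offloading decision with a hypothetical server_is_busy flag. All function names and the payload layout are illustrative assumptions.

```python
import csv
import io


def dictionary_encode(values):
    """Return (sorted dictionary, value IDs) for one column."""
    dictionary = sorted(set(values))
    position = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [position[value] for value in values]


def deserialize_columns(csv_text):
    """Parse CSV text (with a header row) into a dict of column lists."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        for name, field in zip(header, row):
            columns[name].append(field)
    return columns


def prepare_load(csv_text, server_is_busy):
    """Build the payload the client would ship to the database server.

    Assumption: server_is_busy is a hypothetical signal. When it is set,
    deserialization and encoding run here, on the machine holding the
    input data; otherwise the raw text is shipped and the server does
    the work itself.
    """
    if not server_is_busy:
        return {"kind": "raw", "data": csv_text}
    columns = deserialize_columns(csv_text)
    encoded = {name: dictionary_encode(vals) for name, vals in columns.items()}
    return {"kind": "encoded", "columns": encoded}


sample = "city,year\nBerlin,2019\nAmsterdam,2020\nBerlin,2020\n"
payload = prepare_load(sample, server_is_busy=True)
```

In this sketch the server would only need to merge the pre-encoded columns into its storage when the client has done the work, which is the kind of CPU relief the abstract describes for a stressed system.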

Project

Real-Time Analysis and Storage for High-Volume Data in Particle Physics (SFB 876, C5)

MxKernel: A Bare-Metal Runtime System for Database Operations on Heterogeneous Many-Core Hardware (DFG TE111/2-1)

Publication Log

December 2019

camera-ready for CIDR 2020

August 2019

submission to CIDR 2020 (accepted)