Efficient Stream Processing of Scientific Data
Authors
Thomas Lindemann, Jonas Kauke, and Jens Teubner
Published
Joint Workshop of HardBD and Active, collocated with ICDE 2018
Download
via DOI (10.1109/ICDEW.2018.00029)
Abstract
Modern particle physics produces volumes of experimental data that challenge any data processing system. To illustrate, the trigger system of the LHCb experiment at CERN must sustain a data rate of 4 TB/s, yet maintain real-time characteristics. In this work, we report on ELPACO, a distributed event processing platform for scientific data. Its key characteristics are excellent scalability and high resource efficiency. ELPACO inherits its favorable scalability from Apache Storm, which we used as a basis for our platform. For resource efficiency, we tailored ELPACO to Eriador, a parallel, ARM-based hardware substrate with excellent energy/performance characteristics. With experiments on realistic data, we confirm a linear scalability (throughput vs. core count) and a 2.5x improvement in energy efficiency compared to existing solutions.
Project
Real-Time Analysis and Storage for High-Volume Data in Particle Physics (SFB 876, C5)