Department of Computer Science
Dissertation

Resource-Efficient Processing of Large Data Volumes

Title

Resource-Efficient Processing of Large Data Volumes

Author

Stefan Noll

Access

via Eldorado

Abstract

Nowadays, nearly every enterprise, organization, or institution processes large volumes of digital information to create value from data. To process data efficiently, they employ database management systems that promise to deliver high throughput, low latency, predictable performance, and scalability. Meeting all four requirements at the same time, however, poses major challenges for systems. To make matters worse, systems are expected to meet all four requirements while utilizing resources efficiently. Yet the complex system environment of data processing applications, i.e., the combination of complex software architectures, workloads, and hardware setups, makes it very challenging to achieve high resource efficiency. In this thesis, we address resource-efficient data processing by focusing on three scenarios that are relevant, but not limited, to database management systems. Our goal is to develop solutions that improve resource efficiency at multiple system levels.

First, we address the challenge of understanding complex systems. In particular, we focus on memory tracing, which allows us to analyze memory access characteristics and thus enables us to identify inefficient memory usage. The problem, however, is that available tools for memory tracing suffer from a large runtime overhead, which makes memory tracing very expensive in practice. To address this problem, we develop an efficient implementation of memory tracing based on hardware-based sampling. We demonstrate that our approach enables us to analyze the runtime characteristics of complex systems with low runtime overhead. It reveals, for example, access patterns, access statistics, and data and query skew for individual data structures, down to the byte level. Consequently, our approach opens up new possibilities for optimizing resource usage, especially memory and cache usage.

Second, we demonstrate how we can leverage information about memory access characteristics to optimize the cache usage of algorithms at the hardware level. In particular, we address the problem of cache pollution within a multicore processor. Cache pollution can hurt performance, especially in concurrent workloads. To address cache pollution, we apply hardware-based cache partitioning. We derive a cache partitioning scheme that we deliberately keep simple: We restrict memory-intensive operators that do not reuse data, such as column scans, to a small portion of the last-level cache. Furthermore, we demonstrate how to integrate cache partitioning into the execution engine of an existing database system with low engineering costs. In our evaluation, we show that our approach effectively avoids cache pollution: It may improve but never degrades system performance for arbitrary workloads containing scan-intensive, cache-polluting operators.
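
To make the sampling-based tracing of the first contribution concrete, the following is a minimal sketch that samples memory loads on Linux via perf_event_open with precise (PEBS-style) sampling. The raw event code (0x1cd, MEM_TRANS_RETIRED.LOAD_LATENCY), the latency threshold, and the sampling period are machine-specific assumptions for illustration; the thesis' actual implementation lives in the Linux kernel and may differ.

```cpp
// Minimal sketch: sampling memory loads with Linux perf events (PEBS).
// Assumptions: x86-64 Intel CPU, raw event 0x1cd =
// MEM_TRANS_RETIRED.LOAD_LATENCY (machine-specific!), and a
// perf_event_paranoid setting that permits user-space sampling.
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>

static long perf_event_open(perf_event_attr* attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags) {
  return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main() {
  perf_event_attr attr{};
  attr.size = sizeof(attr);
  attr.type = PERF_TYPE_RAW;
  attr.config = 0x1cd;        // MEM_TRANS_RETIRED.LOAD_LATENCY (assumed)
  attr.config1 = 3;           // minimum load latency, in cycles
  attr.sample_period = 10000; // keep roughly every 10000th qualifying load
  attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR;
  attr.precise_ip = 2;        // request precise (PEBS) samples
  attr.disabled = 1;
  attr.exclude_kernel = 1;

  int fd = (int)perf_event_open(&attr, 0 /*self*/, -1 /*any cpu*/, -1, 0);
  if (fd < 0) { perror("perf_event_open"); return 1; }

  // Ring buffer: one metadata page plus 2^n data pages.
  const size_t page = 4096, data_pages = 8;
  void* buf = mmap(nullptr, (1 + data_pages) * page,
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (buf == MAP_FAILED) { perror("mmap"); return 1; }
  auto* meta = static_cast<perf_event_mmap_page*>(buf);

  ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
  // Workload under test: stride through 64 MiB to trigger slow loads.
  static uint8_t table[1 << 26];
  volatile uint64_t sum = 0;
  for (size_t i = 0; i < sizeof(table); i += 64) sum += table[i];
  ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

  uint64_t head = meta->data_head;
  __sync_synchronize();  // pairs with the kernel's write barrier
  auto* base = static_cast<uint8_t*>(buf) + page;
  // For brevity we assume the few samples produced did not wrap the buffer.
  for (uint64_t off = 0; off < head;) {
    auto* h = reinterpret_cast<perf_event_header*>(base + off);
    if (h->type == PERF_RECORD_SAMPLE) {
      auto* w = reinterpret_cast<uint64_t*>(h + 1);  // layout: ip, addr
      printf("ip=%#llx addr=%#llx\n",
             (unsigned long long)w[0], (unsigned long long)w[1]);
    }
    off += h->size;
  }
  munmap(buf, (1 + data_pages) * page);
  close(fd);
  return 0;
}
```

Each sample pairs an instruction pointer with the referenced data address, which is exactly the raw material needed to attribute access patterns, access statistics, and skew to individual data structures at low overhead.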
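
The cache partitioning of the second contribution maps naturally onto Intel's Cache Allocation Technology, which Linux exposes through the resctrl filesystem. Below is a minimal sketch, assuming resctrl is mounted at /sys/fs/resctrl, a single L3 cache domain, and root privileges; the group name scan_ops and the two-way mask are illustrative, not the scheme from the thesis.

```cpp
// Minimal sketch: confining a scan thread to a small slice of the
// last-level cache via Linux resctrl (Intel Cache Allocation Technology).
// Assumptions: resctrl is mounted at /sys/fs/resctrl, there is a single
// L3 cache domain (id 0), and the process runs as root. The group name
// "scan_ops" and the two-way mask are illustrative only.
#include <fstream>
#include <string>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

static void write_file(const std::string& path, const std::string& text) {
  std::ofstream f(path);
  f << text;  // production code would check f.good() and report errors
}

// Call from a scan-operator thread before it starts streaming data:
// its cache fills are then restricted to the ways in the mask, so the
// scan can no longer evict other operators' working sets.
void restrict_scan_thread_to_llc_slice() {
  const std::string grp = "/sys/fs/resctrl/scan_ops";
  mkdir(grp.c_str(), 0755);                  // create the control group
  write_file(grp + "/schemata", "L3:0=3\n"); // mask 0b11: two cache ways
  write_file(grp + "/tasks",                 // move this thread into it
             std::to_string(syscall(SYS_gettid)) + "\n");
}
```

Because masks of different resctrl groups may overlap, fully protecting the other operators additionally requires shrinking the default group's mask so that it excludes the ways reserved for scans.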

Third, after optimizing resource usage within a multicore processor, we optimize resource usage across multiple computer systems. In particular, we address the problem of resource contention for a typical application: bulk loading, i.e., ingesting large volumes of data into the system. When bulk loading runs in parallel with query processing, both operations compete for processor cores, network bandwidth, and I/O bandwidth, which causes poor and unpredictable performance. Our analysis shows that resource contention occurs due to expensive data transformations during bulk loading. To address the problem of resource contention, we exploit the given hardware setup: a distributed environment consisting of the database server and one or more client machines holding the input data. We develop a distributed bulk loading mechanism, Shared Loading, which enables dynamically offloading parts of the bulk loading pipeline, i.e., deserialization and data transformation, to the client. In our evaluation, we demonstrate that Shared Loading utilizes the available network bandwidth and the combined compute power of client and server more efficiently. It increases bulk loading throughput, improves a query workload's tail latency, and also works well with additional compression methods.
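
To illustrate the kind of work Shared Loading moves to the client, the sketch below deserializes a chunk of CSV text into a binary int32 column that the server can append without further transformation. The function name, the single-column schema, and the same-endianness assumption are ours, for brevity; the dynamic part of Shared Loading, deciding per chunk whether the client or the server runs this step, is omitted.

```cpp
// Illustrative sketch of the work Shared Loading offloads to the client:
// deserializing CSV text into the server's binary column format. The
// function name, the single int32 column, and the assumption that client
// and server share the same byte order are ours, for brevity.
#include <charconv>
#include <cstdint>
#include <string_view>
#include <vector>

// Transform one chunk of newline-separated integer text into a binary
// column buffer. Run on the client when the server is overloaded; the
// server then merely appends the received bytes to the column store.
std::vector<int32_t> transform_chunk(std::string_view csv) {
  std::vector<int32_t> column;
  size_t pos = 0;
  while (pos < csv.size()) {
    size_t nl = csv.find('\n', pos);
    if (nl == std::string_view::npos) nl = csv.size();
    int32_t value = 0;
    std::from_chars(csv.data() + pos, csv.data() + nl, value);
    column.push_back(value);
    pos = nl + 1;
  }
  return column;  // the client ships column.data() instead of the text
}
```

Shipping the resulting binary buffer instead of the raw text trades client CPU time for server CPU time, which is exactly what pays off when the server is saturated by concurrent query processing.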

Ultimately, we claim that the contributions of this thesis have an impact on real systems: We implement memory tracing for the Linux kernel, we integrate our cache partitioning mechanism into (a prototype version of) a commercial database system, and we design Shared Loading based on the architecture of a commercial system. In our evaluations, we demonstrate that our approaches improve a database system’s resource efficiency.