Parallelizing Data Processing on FPGAs with Shifter Lists
Louis Woods, Jens Teubner, and Gustavo Alonso
ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 8 no. 2, March 2015.
Parallelism is currently seen as a mechanism to minimize the impact of the power and heat dissipation problems encountered in modern hardware. Data parallelism—based on partitioning the data—and pipeline parallelism—based on partitioning the computation—are the two main approaches to leverage parallelism on a wide range of hardware platforms.
Unfortunately, not all data processing problems are susceptible to either of those strategies. An example is the skyline operator, which computes the set of Pareto-optimal points within a multi-dimensional data set. Existing approaches to parallelize the skyline operator are based on data parallelism. As a result, they suffer from a high overhead when merging intermediate results because of the lack of a global view of the problem inherent to partitioning the input data.
In this paper, we show how to combine pipeline with data parallelism on an FPGA for a more efficient utilization of the available hardware parallelism. As we show in our experiments, skyline computation using our proposed technique scales linearly with the number of processing elements and the performance we achieve on a rather small FPGA is comparable to the one of a 64-core high-end server running a state-of-the-art data parallel implementation of skyline.
The proposed approach to parallelize the skyline operator can be generalized to a wider range of data processing problems. We demonstrate this through a novel, highly parallel data structure, a shifter list, that can be efficiently implemented on an FPGA. The resulting template is easy to parameterize to implement a variety of computationally intensive operators such as frequent items, n-closest pairs, or K-means.