Reviews for the paper *FPGA Acceleration for the Frequent Item Problem*, submitted to ICDE 2010.

**Overall rating:** accept (as full paper)

Accept

Low

High

High

Easily readable

The authors make a convincing case that FPGAs have something to offer databases (as they did in their SIGMOD tutorial).

The paper is beautifully written and would serve as a nice introduction to both FPGAs and the frequent item problem.

This paper will inspire others in databases/data mining to consider FPGAs.

I find the gray hatching in the figure ugly and distracting.

This is out of my area, so I have given a low confidence rating. Still, it is the best of my 10 papers this year.

Accept

High

Medium

Medium

Easily readable

This paper provides FPGA implementations of the well-known frequent item algorithm Space-Saving and compares their performance, focusing on throughput.
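For readers unfamiliar with it, Space-Saving maintains at most k (item, count) bins: a new item increments its bin if present, claims a free bin if one exists, and otherwise evicts the minimum-count bin and inherits its count plus one. A minimal software sketch (variable names are my own, not the paper's):

```python
def space_saving(stream, k):
    """Space-Saving: track at most k candidate frequent items."""
    counts = {}  # item -> estimated count
    for item in stream:
        if item in counts:
            counts[item] += 1          # known item: increment its bin
        elif len(counts) < k:
            counts[item] = 1           # free bin available
        else:
            # evict the minimum-count item; newcomer inherits count + 1
            victim = min(counts, key=counts.get)
            counts[item] = counts.pop(victim) + 1
    return counts

print(space_saving("aababcabdaae", 3))  # → {'a': 6, 'b': 3, 'e': 3}
```

Any item occurring more than n/k times in a stream of length n is guaranteed to occupy a bin at the end, which is what makes the structure useful for the frequent item problem.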

The first implementation makes use of content-addressable memory (CAM) to implement a min-heap data structure in B(lock)RAM. The CAM employs one-hot encoding to implement the key-value store. This approach is not scalable, since the number of bits required to represent the (item, bin) pair grows linearly with the alphabet size and the number of bins. Moreover, implementing a min-heap on an FPGA requires complex arithmetic operations such as division (in the computation of child/parent addresses). These factors make the final implementation worse than the sequential algorithm in terms of throughput once the number of bins grows large (>128). This implementation's performance is also affected by the input data distribution, a behavior inherited from the heap structure.

Another implementation makes use of the FPGA's parallel capability to replace the min-heap. The idea is to implement minimum finding in a tree-like architecture and parallelize the operation. This implementation achieves higher throughput when the number of bins (the degree of parallel computation) is less than 40. Its performance, however, still depends on the input data distribution.
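The tree-like minimum finder described above can be modeled in software as a log-depth pairwise reduction; on the FPGA, each loop level would correspond to one rank of comparators operating in parallel. This sketch is my own illustration, not the authors' circuit:

```python
def tree_min(values):
    """Pairwise reduction: each pass halves the candidate set,
    mirroring one rank of hardware comparators in the tree."""
    level = list(values)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(min(level[i], level[i + 1]))  # one comparator
        if len(level) % 2:
            nxt.append(level[-1])   # odd element passes through
        level = nxt
    return level[0]
```

With n bins, the minimum is found in ceil(log2 n) comparator levels instead of n-1 sequential comparisons, which is where the throughput gain for small bin counts comes from.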

The final implementation makes use of an algorithm array to pipeline the Space-Saving algorithm. The pipelined implementation keeps intermediate data passing local, which greatly decreases the complexity of the hardware circuit. Furthermore, its throughput is not affected by the input data's frequency distribution.
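The locality the pipeline exploits can be sketched as a chain of stages, each owning one bin and talking only to its neighbor. This is a rough software analogy of my own, not the authors' circuit; in particular, the real design also keeps bins ordered by count so that the global minimum sits in the final stage, which the sketch below only approximates by evicting at the end of the chain:

```python
class Stage:
    """One pipeline stage owning a single (item, count) bin.
    An incoming item either matches this bin, claims it if empty,
    or is forwarded to the next stage; the last stage evicts and
    inherits count + 1, per the Space-Saving rule."""
    def __init__(self, nxt=None):
        self.item, self.count, self.nxt = None, 0, nxt

    def push(self, item):
        if self.item == item:
            self.count += 1                      # hit in this bin
        elif self.item is None:
            self.item, self.count = item, 1      # claim empty bin
        elif self.nxt is not None:
            self.nxt.push(item)                  # forward downstream
        else:
            self.item, self.count = item, self.count + 1  # evict

s3 = Stage()
s2 = Stage(s3)
s1 = Stage(s2)           # 3-bin chain: s1 -> s2 -> s3
for ch in "aabacbd":
    s1.push(ch)
```

Because each stage only ever exchanges data with its immediate neighbor, the wiring stays short and the clock rate does not degrade as bins are added, which matches the scalability the review attributes to this design.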

1. The paper's structure is well designed. The three implementations explain FPGA design concerns step by step, so readers without an FPGA background can easily understand the design considerations of an FPGA algorithm.

2. The analysis of the experimental results is convincing and consistent throughout the paper. For example, each implementation's experiments are analyzed in terms of performance characteristics, chip resource requirements, and scalability (third implementation only). Readers can easily compare the three implementations by following this analysis.

3. The introduction to FPGA design is very relevant and easy to understand. It demonstrates the authors' in-depth understanding of FPGA structure, design concerns, and performance bottlenecks.

1. The core ideas of the various FPGA implementations are not innovative enough. All of the implementation details (e.g., the use of content-addressable memory) and the specific data structures (such as the algorithm array) have been proposed in related work.

2. The first implementation is naive. A simple modification would use only log k bits for the same alphabet size and number of bins, though I cannot assert that such a modification would definitely improve performance.

3. The performance diagrams (Figures 2, 5, 7, and 11) are difficult to read due to inappropriate use of shading. Since these four diagrams are essential to showing the various implementations' characteristics, their illegibility is a significant deficit.

The contribution of this paper is to provide a step-by-step analysis of, and solutions for, an FPGA counterpart of a widely used sequential frequent item algorithm. The effects of the different parallel approaches are well presented. Readers can learn from this particular example of an FPGA solution and extend it to other cases by following the paper's analysis and suggestions. This is a practical guide for more complex algorithm design on FPGAs.

Accept

High

Medium

Medium

Easily readable

This is an interesting paper that discusses how to effectively implement the frequent item problem on FPGAs.

1. Well written.

2. Interesting methods

3. Evaluation on synthetic data is solid

1. Missing experimental results on real data sources

2. Missing related work discussion

This paper presents a novel FPGA implementation for accelerating the Frequent Item problem, which asks for the items that occur more than a user-defined threshold number of times (the support) in a given stream. The authors build on the recently proposed, efficient Space-Saving algorithm to present three different designs in which the algorithm can be implemented on an FPGA. The fundamental benefits are realized by exploiting FPGA-specific features such as dual-ported access to BRAM blocks, parallel lookups, and pipelining.

The paper does a good job of explaining its methods -- it is accessible even to readers who are not familiar with FPGAs.

The min-heap implementation in Section V resembles the "up-sweep" operation in the well-known parallel prefix sum algorithm by G. E. Blelloch (http://www.cs.cmu.edu/afs/cs.cmu.edu/project/scandal/public/papers/CMU-CS-90-190.html). However, the implementation of this technique is interesting, especially because it makes the technique independent of z, the data distribution.
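For context, Blelloch's up-sweep combines elements over a balanced binary tree in O(log n) parallel steps; substituting `min` for addition gives exactly the reduction shape the comparison refers to. A sketch of my own (not from either paper), assuming a power-of-two input length:

```python
def up_sweep_min(a):
    """In-place up-sweep with `min` as the operator: after the sweep,
    a[-1] holds the minimum of the whole array. Each iteration of the
    outer loop could execute as one parallel hardware step."""
    a = list(a)
    n = len(a)
    assert n & (n - 1) == 0, "power-of-two length assumed"
    d = 1
    while d < n:
        # combine each left subtree's result into its right sibling
        for i in range(2 * d - 1, n, 2 * d):
            a[i] = min(a[i], a[i - d])
        d *= 2
    return a

print(up_sweep_min([4, 7, 2, 9, 6, 1, 8, 3])[-1])  # prints 1
```

The inner loop's iterations are independent, which is what lets hardware evaluate a whole tree level of comparisons in a single cycle.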

I would suggest to the authors that, as space permits, a more thorough literature survey is warranted. Work on pipelined parallelism has been a very active area in many sub-fields of parallel, high-performance, and distributed computing; see, for example, some of the work by Jaspal Subhlok et al. (JPDC 2000). Also closely related to the current work is that of Choudhary and colleagues (several articles on using FPGAs to accelerate various data mining workloads) and of Buehrer et al. (data mining on the Cell architecture).

If space permits, one could also compare and contrast with other mainstream database and data mining algorithms on modern and emerging architectures (e.g., Kersten and colleagues, Ailamaki and colleagues, Parthasarathy and colleagues, Ross and colleagues, etc.).

It would also be nice to see a more detailed evaluation of the proposed techniques -- e.g., comparative analysis on more realistic datasets.

Even though the proposed techniques are simple, I think the paper is interesting and it provides a case for the use of FPGAs for data analysis tasks -- I think this will be interesting for the ICDE audience.

- submission (PDF)
- final paper (PDF) — published at ICDE 2010