Data Processing on FPGAs — VLDB 2009 Author Feedback

Feedback on the reviews of the paper submission "Data Processing on FPGAs" to VLDB 2009.


We would like to thank the reviewers for their comments. We understand their concerns, which are very similar and mostly related to the same point: the simple operator explored and the lack of generality.


Reviewer 3 and Reviewer 1 ask about the 8-way median operator. It is correct that a larger operator will require more real estate. When real estate becomes an issue, the solution is to use the on-chip memory and the embedded CPU cores, a straightforward extension to the ideas presented in the paper. However, it does require explaining additional technical aspects of the FPGA regarding the architecture, interconnects, and configuration. It was not easy to balance the technical details that are necessary to provide a complete, repeatable, and correct description of the problem against the understandable preference for a "more abstract perspective where the nasty little details of how the little things work are hidden" (quote from review 2). The operator discussed is complex enough to allow us to address all the key aspects of the design space (parallelism, asynchronous design, mapping operators to gates, etc.), which is the objective of the paper. The operator is a running example that makes the paper easy to follow for somebody not familiar with FPGAs. We do not see the operator as part of the contribution.
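To make "mapping operators to gates" concrete for readers without an FPGA background, the following is a minimal software sketch of an 8-way median built on a sorting network. It is an illustration under stated assumptions, not the circuit from the paper: the choice of Batcher's odd-even merge network, the convention of averaging the two middle values of an even-sized window, and the names `NETWORK_8`, `sort8`, `median8`, and `sliding_median8` are all assumptions introduced here.

```python
from collections import deque

# Batcher's odd-even merge sorting network for 8 inputs (19 comparators).
# Each inner list is one stage; the comparators within a stage touch
# disjoint wires, so in hardware they could all operate in parallel.
# (Assumed network; the paper's actual layout may differ.)
NETWORK_8 = [
    [(0, 1), (2, 3), (4, 5), (6, 7)],
    [(0, 2), (1, 3), (4, 6), (5, 7)],
    [(1, 2), (5, 6)],
    [(0, 4), (1, 5), (2, 6), (3, 7)],
    [(2, 4), (3, 5)],
    [(1, 2), (3, 4), (5, 6)],
]


def sort8(values):
    """Sort exactly 8 values with a fixed sequence of compare-and-swap steps."""
    v = list(values)
    if len(v) != 8:
        raise ValueError("sort8 expects exactly 8 values")
    for stage in NETWORK_8:
        for i, j in stage:
            # One comparator: put the smaller value on wire i.
            if v[i] > v[j]:
                v[i], v[j] = v[j], v[i]
    return v


def median8(values):
    """Median of 8 values, taken as the mean of the two middle elements
    (an assumed convention for an even-sized window)."""
    s = sort8(values)
    return (s[3] + s[4]) / 2


def sliding_median8(stream):
    """Yield the median of each full count-based window of 8 items."""
    window = deque(maxlen=8)
    for x in stream:
        window.append(x)
        if len(window) == 8:
            yield median8(window)
```

The comparison sequence is fixed and independent of the data, which is what makes such a network a natural fit for a circuit. It also shows why a larger operator needs more real estate: the comparator count grows with the window size.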


Reviewer 1 (D2) and Reviewer 3 (W2) = The reviewers are correct in saying the paper does not discuss the generalization of the ideas. We will add this to the paper to make it clear how to implement other operators. We omitted such explanations because we considered the generalization an easy step for operators such as selection, projection, arithmetic or boolean expressions, and many forms of aggregation.

Detailed comments:

D1 = The comparison is fair to the extent that the FPGA does not need to run anything else in a real setting. A CPU will need to run what we run during the experiment plus a data processing engine. It would not make much sense to compare with the raw power of a CPU programmed in assembly. For a fair comparison with the CPU, we implemented 8 different sorting algorithms. We limit the presentation (Fig. 13) to the best-performing and the most common algorithms. If the reviewer feels it is important, we can include a detailed discussion of this aspect.

D2 (Reviewer 3 D1) = Commercially (see the related work), FPGAs are used in semi-static configurations to off-load expensive operations. We aim at configuring them dynamically. Synthesizing a complete execution plan on the fly is likely to be expensive, but having a library of already synthesized operators that are deployed dynamically is perfectly doable using the dynamic partial reconfiguration mechanism of modern FPGAs. This is part of future work.

D3 = The design is cost-effective but, as the paper shows, not for all uses. To deal with heterogeneity, there are tools on the market that translate, e.g., C into VHDL and then into FPGA circuits. These tools hide the heterogeneity in the designs in the same way regular compilers hide the differences among CPUs.

D5 = An unused FPGA located in the socket of a core or on a PCIe card can easily be powered down. There is also a widely known technique for FPGAs, "clock gating", which eliminates the switching power consumption of unused circuit elements by turning off their clock signals. With disabled clocks, the power consumed by the FPGA is solely due to leakage current. Except for the basic bus interface needed to communicate with the host system, the FPGA can be decoupled from the system when not in use. We will add this information to the paper.
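The clock-gating argument can be summarized with the standard first-order CMOS power model. This is a textbook approximation rather than a measurement from the paper, and the symbols (activity factor $\alpha$, switched capacitance $C$, supply voltage $V_{dd}$, clock frequency $f_{clk}$, leakage current $I_{leak}$) are introduced here for illustration.

```latex
\begin{align}
  P_{\text{total}} &= P_{\text{dyn}} + P_{\text{leak}}, \\
  P_{\text{dyn}}   &\approx \alpha \, C \, V_{dd}^{2} \, f_{clk}, \\
  P_{\text{leak}}  &\approx V_{dd} \, I_{leak}.
\end{align}
% With the clock of an unused region gated, its nodes no longer toggle,
% so \alpha \to 0 for that region and
\begin{equation}
  P_{\text{total}} \approx P_{\text{leak}} .
\end{equation}
```

Under this model, gating the clock removes the switching term for the gated region only. The leakage term remains until the device, or the relevant part of it, is powered down.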



Prof. Dr. Jens Teubner
Tel.: 0231 755-6481