Data Processing on FPGAs — SIGMOD 2009 Author Feedback

Feedback to reviews for paper submission “Hardware-Accelerated Stream Processing Using FPGAs” (earlier version of Data Processing on FPGAs) to SIGMOD 2009.


We would like to thank the reviewers for their comments and careful evaluation of the paper.

Relevance to SIGMOD: We are glad that reviewers 1 and 2 see the paper as interesting to the database community. We also understand the concerns of reviewer 3. We have put a lot of effort into writing the paper carefully for a database audience. The paper is about the design considerations involved when using FPGAs to process data streams, with the idea that such FPGAs will soon be available as extra cores in heterogeneous many-core architectures. The technical details are important to allow others to reproduce the results and understand the potential of the technology.

Performance gains and experimental setup: Reviewers 1 and 3 indicate that the experiments do not demonstrate any spectacular performance gain. We are not aiming to present a new algorithm but to explore the design space. The operator described in the paper was a deliberate choice: it balances a problem complex enough to study the use of FPGAs against explanations that remain understandable to a database audience (see above and the critique of reviewer 3). We could have improved performance arbitrarily by choosing, e.g., an operator where hundreds of logical operations can be done in a single clock cycle using parallel hardware. Similarly, reviewer 3 rightly points out that there are other FPGA models available. As with the algorithm, we wanted to emphasize the usefulness of the general idea and explore its potential under reasonable constraints. Using better FPGAs only improves our results and emphasizes the points we make in the paper (and embeddable FPGAs are not likely to be as powerful as stand-alone models).

Answer to the explicit questions by Reviewer 1:

D1. We agree with your observation and, indeed, the aspect might deserve some explicit treatment in the paper. However, please bear in mind that even though your suggestion requires fewer operations (on average, less than a full pass over the array is required to locate the expired value and insert the new one), it adds control logic to the implementation. We have experimented with your suggestion but found it to be roughly 50% slower than the numbers reported in Section 6.3 (on the given eight-bit window). Our sorting network is particularly easy to schedule on modern out-of-order CPUs, which more than compensates for the redundant sorting work.
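
To make the trade-off concrete, the two strategies can be sketched as follows. This is only an illustration under our own assumptions, not the code measured in Section 6.3: the comparator layout is Batcher's odd-even merge network for eight inputs, which may differ from the network in the paper, and the names sort_network and slide_incremental are hypothetical. Python timings say nothing about the CPU results; the point is the fixed, data-independent sequence of compare-exchange steps versus the data-dependent search and insert.

```python
import bisect

# Batcher's odd-even merge sorting network for 8 inputs (19 comparators).
# Assumption: the paper's network may use a different comparator layout.
NETWORK_8 = [
    (0, 1), (2, 3), (4, 5), (6, 7),
    (0, 2), (1, 3), (4, 6), (5, 7),
    (1, 2), (5, 6),
    (0, 4), (1, 5), (2, 6), (3, 7),
    (2, 4), (3, 5),
    (1, 2), (3, 4), (5, 6),
]


def sort_network(window):
    """Sort 8 values with a fixed comparator sequence (full re-sort)."""
    v = list(window)
    for i, j in NETWORK_8:
        # Compare-exchange: the same steps run regardless of the data.
        v[i], v[j] = min(v[i], v[j]), max(v[i], v[j])
    return v


def slide_incremental(sorted_window, expired, new):
    """Hypothetical incremental update: drop the expired value, insert the new one."""
    v = list(sorted_window)
    v.remove(expired)       # linear scan to locate the expired value
    bisect.insort(v, new)   # data-dependent search and shift to insert
    return v
```

The incremental variant touches fewer elements, but where it stops scanning and how far it shifts depend on the data, which is the extra control logic mentioned above.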

D2. No, it is not essential to our results. Our circuit can be trivially modified to choose a real tuple as median: by removing both the adder and bit shifter, the lower and upper median can be directly returned using lines v4' or v5' (Figure 3).
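
In software terms, the modification amounts to the following sketch. It assumes integer values and that v4' and v5' denote the fourth and fifth outputs of the sorted eight-value window, which is our reading of Figure 3; the function name medians is hypothetical.

```python
def medians(sorted_window):
    """Return (averaged, lower, upper) median of an even-sized sorted window."""
    mid = len(sorted_window) // 2
    lower = sorted_window[mid - 1]   # fourth output of an eight-value window
    upper = sorted_window[mid]       # fifth output of an eight-value window
    # Adder followed by a one-bit right shift, as in the averaging circuit.
    averaged = (lower + upper) >> 1
    return averaged, lower, upper
```

Dropping the last step and returning lower or upper directly yields a value that actually occurs in the window.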


Prof. Dr. Jens Teubner
Tel.: 0231 755-6481