Data Processing on FPGAs — VLDB 2009 Reviews

Reviews for paper Data Processing on FPGAs, submitted to VLDB 2009.

Overall rating: accept

Reviewer 1

Overall Rating

Accept

Reject due to technical incorrectness

No

Novelty

Medium

Technical Depth

Medium

Presentation

Very good

Summary of the paper's main contributions and impact (up to one paragraph)

The paper shows how to offload computation of a sliding-window median operator to FPGA hardware. The implementation uses a sorting network on the FPGA, resulting in a good ratio of performance vs. resource usage. As a by-product, the paper gives a good overview of the design choices and architecture of FPGA-based systems, which may be of more interest to the VLDB audience than the actual median implementation.
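For readers unfamiliar with the operator, the following is a minimal software sketch of a sliding-window median. It is not the authors' FPGA design. The function name `sliding_median`, the default window of 8, and the choice to average the two middle values for an even-sized window are all assumptions made for illustration.

```python
from collections import deque


def sliding_median(stream, window=8):
    """Yield the median of the most recent `window` values.

    Nothing is emitted until the window has filled once.
    """
    buf = deque(maxlen=window)  # oldest value drops out automatically
    for value in stream:
        buf.append(value)
        if len(buf) < window:
            continue
        ordered = sorted(buf)  # software stand-in for the sorting step
        mid = window // 2
        if window % 2:
            yield ordered[mid]
        else:
            # Assumption: an even window averages its two middle values.
            yield (ordered[mid - 1] + ordered[mid]) / 2


if __name__ == "__main__":
    print(list(sliding_median([5, 1, 3, 2], window=3)))
```

Each new input triggers a full sort of the window, which is the step the paper moves into hardware.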

Three strong points of the paper (please number them S1,S2,S3)

S1 well-written, excellent presentation (figures etc)
S2 detailed description of involved issues and decisions
S3 experimental evaluation covers many aspects

Three weak points of the paper (please number them W1,W2,W3)

W1 the scope is rather limited (median operator), and it is not discussed how well the results generalize

Detailed Comments (please number each point)

D1 An enjoyable read.
D2 The paper does not hint at any potential for extending the approach to other operators, or even to other data types or window sizes. What kinds of data processing other than sorting networks require little control flow and have simple data flow patterns?

Reviewer 2

Overall Rating

Accept

Reject due to technical incorrectness

No

Novelty

Medium

Technical Depth

Medium

Presentation

Very good

Summary of the paper's main contributions and impact (up to one paragraph)

The paper discusses the use of FPGAs in data processing. The authors posit that FPGAs are going to be available in future multicore architectures, so we should figure out how to use them to speed up data operations. FPGAs have advantages over ordinary CPUs in their low power consumption, high potential internal parallelism, and easy customization. To explore the use of FPGAs in the DB world, the authors implemented a sliding window-based median operator (using sorting) on FPGAs. The paper provides experimental evaluations of the median operation with streams of different sizes, and also compares performance with standard CPUs.

Three strong points of the paper (please number them S1,S2,S3)

S1. Provides lots of details about FPGAs
S2. Detailed discussion of the authors' FPGA implementation.

Three weak points of the paper (please number them W1,W2,W3)

W1. The comparison presented in Figure 13 does not seem fair. (See D1.)
W2. The use case is not very clearly presented in the paper.
W3. The authors need to discuss the cost effectiveness of designing such a system versus using an off-the-shelf CPU.

Detailed Comments (please number each point)

D1. The comparison presented in Figure 13 does not seem fair. The CPUs are likely to be busy with other OS chores (or were they offloaded to another CPU, as ordinarily happens on a many-processor node?), whereas the FPGA implementation did not run anything other than the hardware-based median operation. So, it is misleading to compare the performance of the FPGA with a CPU, unless the CPU is dedicating its entire time to the operation. Also, if I understand this experiment correctly, it involves repeatedly sorting eight 32-bit tuples. My understanding is that for sorting such a small number of things, one should never use QuickSort or those other fancy algorithms, because the constant-time overheads predominate. Presumably that is why "even-odd", which I had never heard of before I read this paper, predominates. Given all the care put into designing the code running on the FPGA, why not have an equally carefully designed custom sort for the 8 integers in the window, rather than using algorithms designed to sort large sets of ints? Maybe the result would be even-odd, which does seem to beat the FPGA in general, which is what one would expect with a generic fast CPU given a good algorithm to sort 8 ints. Also, what was the OS being used in the CPUs, and their memory and bus configurations?
D2. The high-level picture of how the authors would recommend using FPGAs in the DB world is not clear to me. Does this mean each database system should be able to reconfigure its FPGAs at run time into hundreds of different possible functions, one for each kind of expensive query operation? While doing things in hardware has always been faster than doing them in software, designing FPGAs for this really complicates the overall system architecture. Or should we just add some static FPGAs for the really expensive operations?
D3. Would using the scheme be cost effective overall, given the design/configuration cost of an FPGA vs that of using an ordinary CPU? I hope that the answer is yes, but it's not completely obvious. I can imagine a DBMS vendor having trouble trying to figure out what to do with FPGAs, given that every customer could have different platforms with different FPGA capabilities. Also, I am not sure that the answer to this question would be "yes" in the case of asynchronous designs, even if we can get it to be "yes" for synchronous designs. It would be good for the authors to comment on this.
D4. Much of this paper is spent discussing the FPGA design choices (the complexity of which underscores point D3). I think that this is space well spent, so that the DB audience understands the complexity of what goes on. However, the DB audience would surely prefer a more abstract perspective, where the nasty little details of how the little things work are hidden. I am not recommending any changes in this area, just pointing out the likely preferences of the audience.
D5. For the power consumption comparison, I think it would be fairer to consider the power drawn by the FPGA when it is not being used. Presumably the main CPU is going to be busy almost all the time. For some applications, the FPGA would be busy all the time too, but in others, it might just be sitting around waiting for the right query to come along. How much power would it draw when idle? The question here is whether it would be better to leave the slot entirely empty or to have an FPGA in it that is utilized X% of the time.
D6. Small complaint: for this application, it might have made more sense to have a sliding window of size 9 and take the middle value as the median.
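As a sketch of the kind of custom fixed-size sort that D1 asks about, the code below builds an 8-input network using Batcher's odd-even merge sort construction and applies it with plain compare-and-swap steps. This is an assumption about what "even-odd" refers to, not a reproduction of the code evaluated in the paper. The names `NETWORK_8`, `sort8`, and `verify` are hypothetical.

```python
from itertools import product


def oddeven_merge(lo, hi, r):
    """Yield comparators that merge two sorted halves of x[lo..hi] (inclusive)."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)
        yield from oddeven_merge(lo + r, hi, step)
        yield from ((i, i + r) for i in range(lo + r, hi - r, step))
    else:
        yield (lo, lo + r)


def oddeven_merge_sort_range(lo, hi):
    """Yield comparators that sort x[lo..hi]; the length must be a power of two."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort_range(lo, mid)
        yield from oddeven_merge_sort_range(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)


# Fixed comparator sequence for exactly eight inputs (19 comparators).
NETWORK_8 = list(oddeven_merge_sort_range(0, 7))


def sort8(values):
    """Sort eight values with a data-independent compare-and-swap sequence."""
    x = list(values)
    if len(x) != 8:
        raise ValueError("sort8 expects exactly eight values")
    for a, b in NETWORK_8:
        if x[a] > x[b]:
            x[a], x[b] = x[b], x[a]
    return x


def verify():
    """Check the network on all 0/1 inputs (the zero-one principle)."""
    return all(sort8(bits) == sorted(bits) for bits in product((0, 1), repeat=8))


if __name__ == "__main__":
    ordered = sort8([7, 3, 9, 1, 8, 2, 6, 4])
    print(ordered[3], ordered[4])  # the two middle values of an 8-element window
```

With eight values the sorted window has two middle elements, `ordered[3]` and `ordered[4]`, so a median needs a convention for combining them. A window of nine, as suggested in D6, has a single middle element.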
==================
Regarding "The comparison is fair to the extent that the FPGA does not need to run anything else in a real setting": it is unfair because your competition is not a CPU running a query engine, but rather a separate CPU sitting in its own slot on the machine and handling only whatever work is handed to it by the main CPU. I am not so sure we would not be better served by putting a general-purpose CPU in that slot, rather than an FPGA card. The authors' response rather alarmed me, as it sounds like they haven't thought of this alternative!!!

Reviewer 3

Overall Rating

Accept

Reject due to technical incorrectness

No

Novelty

Medium

Technical Depth

High

Presentation

Adequate

Summary of the paper's main contributions and impact (up to one paragraph)

The paper shows how a potential database operator (here a median of 8) can be implemented efficiently on an FPGA. Offloading processing to a special-purpose FPGA (which might be cheaper and lower-power) might be attractive in future architectures.

Three strong points of the paper (please number them S1,S2,S3)

S1. The architectural trade-offs and FPGA implementation challenges are well-explained and interesting.
S2. I agree with the authors that it may be time to re-examine the question of offloading some DB work into special purpose hardware (but see W1 below).
S3. The performance evaluation is impressive.

Three weak points of the paper (please number them W1,W2,W3)

W1. There is a (somewhat old) literature arguing that special purpose hardware will never beat commodity CPUs. The present paper should identify this literature, and argue that times have changed sufficiently for a reexamination of this question. For example, FPGAs are commodity devices too.
W2. The 8-way median operator is one relatively uninteresting operator from a database point of view. (a) While the authors say they choose 8 for ease of illustration, it appears that with a number greater than 8, more FPGA real estate is needed, and that the implementation might not scale. (b) How much can we generalize the lessons to other kinds of database operations?
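To give the scaling concern in W2(a) a rough shape, the sketch below counts comparators in a Batcher odd-even merge sort network as the window size doubles. It assumes that construction and power-of-two window sizes. Comparator count is only a coarse proxy for FPGA resource usage, and the paper's actual figures may differ. The function names are hypothetical.

```python
def merge_comparators(n):
    """Comparators needed to merge two sorted halves into n outputs."""
    if n == 2:
        return 1
    # Two half-size merges (even and odd subsequences) plus a final fix-up row.
    return 2 * merge_comparators(n // 2) + n // 2 - 1


def sort_comparators(n):
    """Total comparators in an n-input odd-even merge sort network."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if n == 1:
        return 0
    # Sort each half recursively, then merge the two sorted halves.
    return 2 * sort_comparators(n // 2) + merge_comparators(n)


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        print(n, sort_comparators(n))
```

Under these assumptions the count goes from 19 comparators at 8 inputs to 63 at 16 and 191 at 32, so each doubling of the window roughly triples the comparator count.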

Detailed Comments (please number each point)

D1. Do you expect that these FPGAs could be reprogrammed on the fly in a running system to implement different operators as needed?

List specific clarifications you seek from the Authors (if you have answered "Yes" to Q. 6) Use this space to respond to author feedback too.

Please address W2.

Contact

Prof. Dr. Jens Teubner
Tel.: 0231 755-6481