Jump label

Service navigation

Main navigation

You are here:

Main content

Complex Event Detection at Wire Speed with FPGAs — VLDB 2010 Reviews

Reviews for paper Complex Event Detection at Wire Speed with FPGAs, submitted to VLDB 2010.

Overall rating: accept

Reviewer 1

Overall Recommendation

Accept

Reject due to technical incorrectness

No

Novelty

Medium

Technical Depth

Medium

Presentation

Adequate

Summary of the paper's main contributions and impact (up to one paragraph)

The authors pursue their exploration of FPGA based Hardware accelerators for efficient data management. After a compiler for stream processing (VLDB'09) and data mining (ICDE'10), they turn their focus on complex event detection. They expand existing solutions based on non deterministic finite automata for regular expression matching on FPGA to the problem of complex-event detection (overlapping predicates over tuple stream vs. constrained flow of characters). Throughput results are impressive, even with small network packets, with an actual throughput close to the theoretical maximum.

Three strong points of the paper (please number them S1,S2,S3)

(S1) The authors demonstrate that their approach is effective, over a reasonable input range.
(S2) The treatment of complex event is based on the upcoming SQL standard
(S3) While the contribution is incremental (extending existing work on NFA for regular expressions to complex events), the extension is non trivial, well motivated and clearly explained.

Three weak points of the paper (please number them W1,W2,W3)

(W1) This is incremental work both with respect to the authors previous work on FPGA based systems, and with respect to the work on regular expressions matching on FPGA.

Detailed Comments (please number each point)

(1) In 4.3 there is nothing intuitive about mapping a NFA state to a flip flop register for those of us who are not familiar with VHDL. I did not get much from appendix C in terms of building up that intuition. Basically, the whole design discussion in 4 was a bit lost on me. You could have used the appendix to make the paper really self contained and explain in a page how the nature of FPGA programming differs from CPU programming and how it opens up options and introduces constraints in terms of design.

Reviewer 2

Overall Recommendation

Accept

Reject due to technical incorrectness

No

Novelty

Medium

Technical Depth

Medium

Presentation

Very good

Summary of the paper's main contributions and impact (up to one paragraph)

This paper presents an FPGA design and implementation for recognizing regular expressions among records passing over a network. With existing technology they are able to process one sample regular expression for up to 800 separate partitions (substreams).

Three strong points of the paper (please number them S1,S2,S3)

S1. the paper represents some interesting engineering, with hardware based solutions that are very different from what one would do in software.
S2. The implementation is carried through to an actual FPGA connected to a network, and very good bandwidth is achieved.

Three weak points of the paper (please number them W1,W2,W3)

W1. The time-out solution seems ad-hoc, and likely to limit the applicability of the approach. The motivating example about marathons is clearly not in the one-second timeout range.
W2. Even with the high throughput, there will be a high latency due to the long pipeline. The paper does not measure this latency or discuss whether such latency can be tolerated by applications.
W3. I had trouble understanding the algorithm of figure 7. First, what is S? Second, despite what is said in the text, it seems that a free node will travel towards the end of the pipeline alongside an identifier even if a matching element appears further down the pipeline. That matching element will be swapped past the free element and never directly compared to x. What am I missing?

Detailed Comments (please number each point)

Other uses of FPGAs in DBs, such as by Netezza might be worth citing in the related work.

List specific clarifications you seek from the Authors (if you have answered "Yes" to Q. 6) Use this space to respond to author feedback too.

Please address W1, W2, W3.

Reviewer 3

Overall Recommendation

Reject

Reject due to technical incorrectness

No

Novelty

Low

Technical Depth

Low

Presentation

Very good

Summary of the paper's main contributions and impact (up to one paragraph)

The authors propose techniques for pushing complex event detection onto FPGAs, which would enable much faster event detection than possible using just software-based solutions. The authors discuss the challenges in this task, trade-offs between using NFAs vs DFAs, and present a comprehensive end-to-end solution. Their experimental results show that their techinque is far superior to the prior work.

Three strong points of the paper (please number them S1,S2,S3)

S1: I like the idea of using FPGAs to handle data stream tasks a lot. In many cases, clearly this may be the only way to handle the data rates.
S2: The authors make very nice observations about the tradeoffs between using NFAs and DFAs, and about the state explosion resulting from use of predicates.
S3: Several techniques they propose (e.g. pipelining to handle stream partitions) appear to be novel and interesting.

Three weak points of the paper (please number them W1,W2,W3)

W1: Overall the technical novelty and contributions are somewhat questionable, since the idea of using FPGAs has been proposed before, and it is hard to identify the specific new contributions made by the authors. In part, this is because the discussion of related work is very sparse. In particular, a quick read of Reference 16 (referred often) suggests that that paper has a lot of overlap with the submission, but there is very little discussion of the differences between the two. The authors need to do a better job of highlighting the key differences.
W2: Although overall the paper is quite well written, in some places the description is hard to follow. E.g. Sections 4.3 and 4.6 are somewhat terse and hard to follow.
W3: The solution for dealing with stream partitioning is ad hoc in places. E.g. the technique can only handle 800 partitions I think (I am somewhat hazy on the details there), which may be small in many target domains. Further the authors suddenly introduce "timeouts", by suggesting that the patterns should only be matched over a time window. That seems like a serious limitation of the approach, and deserves more discussion.

Detailed Comments (please number each point)

Overall the paper is very well-written and organized, and I have no additional comments to the authors.

List specific clarifications you seek from the Authors (if you have answered "Yes" to Q. 6) Use this space to respond to author feedback too.

Please address W1 above.

Related Information



Sub content

Contact

Prof. Dr. Jens Teubner
Tel.: 0231 755-6481