How Soccer Players Would Do Stream Joins — VLDB 2010 Reviews

Reviews for paper How Soccer Players Would Do Stream Joins, submitted to VLDB 2010.

Overall rating: reject

Reviewer 1

Overall Recommendation

Reject

Reject due to technical incorrectness

No

Novelty

Low

Technical Depth

Medium

Presentation

Adequate

Summary of the paper's main contributions and impact (up to one paragraph)

The paper presents a mechanism for carrying out parallel joins, via a technique that the authors call "handshake join". The paper uses the analogy of soccer players walking in opposite directions to shake hands. In a nutshell, this technique streams tuples from the two tables R and S being joined in opposite directions, and tuples that are in the same segment, currently held by one core, are joined together. The paper evaluates the handshake join on a multicore machine and on FPGAs.
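To make the data flow summarized above concrete, here is a minimal single-threaded sketch. It is not the authors' implementation: the class name `HandshakeChain`, its parameters, and the fixed per-core segment size are assumptions made for illustration, and a real multicore version would need the inter-core synchronization that the reviews discuss.

```python
from collections import deque


class HandshakeChain:
    """Sequential simulation of a chain of cores (hypothetical names).

    R tuples enter at core 0 and are pushed towards the last core;
    S tuples enter at the last core and are pushed towards core 0.
    Each core holds at most `seg_size` tuples per stream.
    """

    def __init__(self, n_cores, seg_size, predicate):
        self.n = n_cores
        self.seg = seg_size
        self.pred = predicate
        self.r = [deque() for _ in range(n_cores)]  # R segment per core
        self.s = [deque() for _ in range(n_cores)]  # S segment per core
        self.out = []                               # emitted (r, s) pairs

    def push_r(self, t):
        core = 0
        while core < self.n:
            # "Handshake": compare the arriving R tuple with resident S tuples.
            for s in self.s[core]:
                if self.pred(t, s):
                    self.out.append((t, s))
            self.r[core].append(t)
            if len(self.r[core]) <= self.seg:
                return
            # Segment is full: forward its oldest R tuple to the next core.
            t = self.r[core].popleft()
            core += 1
        # A tuple that leaves the last core has expired from the window.

    def push_s(self, t):
        core = self.n - 1
        while core >= 0:
            # Compare the arriving S tuple with resident R tuples.
            for r in self.r[core]:
                if self.pred(r, t):
                    self.out.append((r, t))
            self.s[core].append(t)
            if len(self.s[core]) <= self.seg:
                return
            # Segment is full: forward its oldest S tuple to the previous core.
            t = self.s[core].popleft()
            core -= 1


if __name__ == "__main__":
    chain = HandshakeChain(n_cores=2, seg_size=2, predicate=lambda r, s: r == s)
    for i in range(1, 7):
        chain.push_r(i)
        chain.push_s(i)
    print(chain.out)
```

Because this sketch moves one tuple at a time, an R tuple and an S tuple are compared exactly once, at the moment one of them enters the core where the other resides. It also shows that a pair is only compared after later arrivals have pushed both tuples into the same core.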

Three strong points of the paper (please number them S1,S2,S3)

S1: A simple and straightforward design to do parallel joins in a multi-core environment.
S2: Evaluation on an actual multi-core machine (32 cores) and FPGA.

Three weak points of the paper (please number them W1,W2,W3)

W1: Idea is a straightforward extension to [14].
W2: Evaluation section leaves much to be desired (see detailed comments).
W3: Since the handshake join is designed to run in a multicore, there is little description of implementation issues specific to a multicore environment.

Detailed Comments (please number each point)

C1: I strongly suggest you change your title. The analogy to soccer is weak (only the handshake), and there ought to be more meaningful names you can find.
C2: The paper indicates that, for ease of presentation, the focus is on count-based windows. Given the extra space in the appendix, it would be useful to discuss time-based windows as well, especially given that time-based windows are evaluated experimentally.
C3: The term "eager scan strategy" implies there is a lazy version, but in effect, your alternative is sort-merge. It would be good to use a more meaningful distinction, e.g. pipelined vs non-pipelined or simply nested loops vs sort-merge. With regard to this, why nested loops and not symmetric hash joins? Is it due to the ring buffer that you use? Some discussion of design/implementation issues specific to multicore architectures would be informative.
C4: There are several issues in the evaluation section. To list a few: a. Why is hyperthreading turned off? Details appear to be omitted. b. No real dataset is used (only synthetic data), and there is no comparison with a traditional DBMS in a single-core environment. From the description, it is not even clear whether a database engine is used as a basis, or only a custom join implementation is evaluated. c. What is the impact of segment size within a core? In the experiments, it is set to a fixed size without justification.
C5: It is unclear what is meant when FPGAs are used as a simulation platform for a large number of cores. Is there really a large number of cores, and is the FPGA simulation reflective of a real multicore architecture?

Reviewer 2

Overall Recommendation

Accept

Reject due to technical incorrectness

No

Novelty

Medium

Technical Depth

Medium

Presentation

Adequate

Summary of the paper's main contributions and impact (up to one paragraph)

The paper proposes and evaluates a prototype implementation of a parallel stream join algorithm where sub-window partitions are joined pairwise by propagating them in opposite directions through a linear chain of connected processors. The approach parallelizes join processing without any central control. The approach is simple and straightforward and seems quite useful with modern multi-core systems.

Three strong points of the paper (please number them S1,S2,S3)

S1: Handshake join is shown to be both a simple and an efficient parallel join algorithm requiring no central coordination.
S2: Results from real implementations, both on a multi-core machine and on FPGAs, are reported.
S3: The paper is well written and easy to understand.

Three weak points of the paper (please number them W1,W2,W3)

W1: Some synchronization issues are not covered, e.g. how to deal with two tuples arriving at the same time and how to ensure that the result stream has the correct order. It is not discussed whether such issues could potentially slow down throughput.
W2: The size of timed windows is potentially unbounded. It is not discussed how to handle very large timed windows that could cause memory overflow and cache misses in the processors.
W3: I am missing a performance comparison with some other parallel stream join algorithm.

Detailed Comments (please number each point)

1. p 2, left col, 2nd para: It seems this work is mainly for count-based windows. The text does not say much about the potential problems and solutions when using this for time-based windows. The paper would be stronger if you had more discussion on how to use it for the very common time-based windows.
2. p 2, left col, line -16: a multi-core system
3. p 3, left col, 3rd para: I think you should outline here how to deal with this race condition and whether it will slow down processing.
4. p 4, left col, sec 3.6, 1st para: Unclear how very large time-based windows are handled.

Reviewer 3

Overall Recommendation

Reject

Reject due to technical incorrectness

No

Novelty

Medium

Technical Depth

Low

Presentation

Adequate

Summary of the paper's main contributions and impact (up to one paragraph)

The notion of a handshake join is proposed in the paper. The handshake join leverages hardware parallelism to improve the performance of joins on streaming data. To demonstrate the effectiveness and efficiency of the proposed join algorithm, experiments were performed on a machine with four eight-way CPUs (i.e., 32 CPU cores). In addition, field-programmable gate arrays (FPGAs) were used for evaluating the scalability of the proposed algorithm beyond 32 cores.
Due to the recent hardware advances, this work addresses a very practical problem of performing joins on streaming data using multi-cores.

Three strong points of the paper (please number them S1,S2,S3)

S1. The paper presents a novel approach for performing joins on streaming data in parallel. The key idea is that when tuples from two streams need to be joined, they "shake hands" by evaluating the join condition over the tuples. Based on this design, the authors showed that this allows the proposed join algorithm to be easily parallelized with local synchronization.
S2. Communication mechanisms between the multi-cores are also presented. These include the use of lock step forwarding, asynchronous message queues, two-phase forwarding and FIFO queues.
S3. Realistic experiments were performed on an Intel Nehalem EX machine, with 32 cores. In addition, FPGAs which can simulate >32 cores were used in the scalability study. The results from these studies provide good insights on the performance of stream joins on multi-cores.

Three weak points of the paper (please number them W1,W2,W3)

W1. It is not entirely clear how the handshake join ensures correctness and completeness. While there are some high-level discussions in both Section 3 and 4, it will be useful to either consolidate these discussions or provide a detailed analysis of the join algorithm.
W2. The paper lacks analytical arguments (with respect to the correctness and completeness mentioned above).
W3. Lack of a comprehensive set of experiments. For example, experiments on varying window sizes are missing (the paper presented the results for 5- and 10-minute windows).

Detailed Comments (please number each point)

1. In Section 4, the notion of two-phase forwarding is introduced. From the discussion, it is not clear how the system can reach a stable state as nodes pass acknowledgement messages to neighbouring nodes.
2. How does handshake join handle uneven data arrival rates?
3. In Section 5, what is the data distribution for the data that are used during the join? How is the data generated?
4. The claim that the proposed algorithm outperforms CellJoin is not substantiated. In Section 5, the experiments for the evaluation of Handshake Join against CellJoin are missing.
5. In Section 6, it is noted that the tuple-based windows will fit into on-chip memory. It would be useful to provide the size of the on-chip memory.
6. What happens if the join processes more than two streams? How would "handshake" occur?
7. Can you discuss the issue of out-of-order data?

List specific clarifications you seek from the Authors (if you have answered "Yes" to Q. 6) Use this space to respond to author feedback too.

This is a weak reject.

Contact

Prof. Dr. Jens Teubner
Tel.: 0231 755-6481