Jump label

Service navigation

Main navigation

You are here:

Main content

How Soccer Players Would Do Stream Joins — SIGMOD 2011 Reviewer Comments during Feedback Phase

Reviewer comments for paper How Soccer Players Would Do Stream Joins during the feedback phase of SIGMOD 2011.

Reviewer 1

Summary of the paper's main contributions and impact

The paper proposes handshake join, a new approach to partition a window-based stream join to multiple cores for parallel execution. In the proposed approach, the two input streams flow by each other in opposite directions, so the join predicate can be evaluated between two tuples of the two streams in parallel. The approach is implemented and experimentally evaluated on a 48-core machine and on FPGAs.

Strong points of the paper (please number them S1,S2,..)

S1. The design of the handshake join is novel: unlike previous approaches, where the two streams are processed in the same "direction" and parallel execution relies on a centralized coordinator to partition and replicate the stream to different nodes, in this approach, the streams come in opposite directions and hence parallelism can be achieved naturally without the need for a centralized coordinator.
S2. Different communication topology: a node needs to communicate only with its left and right neighbors. Therefore it might be a good choice in a system where efficient data movement between one node (coordinator) and all the other nodes is not available (such as the one the author used for experiments).
S3. Asynchronous message queues release the need for a global synchronization when shifting data between nodes. The implementation nicely allows for load balancing.
S4. There is a size-able experimental sensitivity analysis of the handshake-join in different architectures.

Weak points of the paper (please number them W1,W2,...)

W1. Handshake join requires more data movement than the previous partitioning and replicating scheme of CellJoin. In CellJoin the tuples in one of the streams have to be copied to all nodes, but those in the other stream are partitioned to only one of the nodes; in handshake join every tuple of both streams has to be passed through all nodes (although it is more efficient in terms of memory usage, since no replication is needed).
W2. Handshake join causes local disorder of the result tuples.
W3. Handshake join communication topology might not be an appropriate choice for a hardware configuration that only supports fast data movement between one special node and all other nodes (like the Cell processor used by CellJoin).
W4. The experimental evaluation (as far as comparing with the current state-of-the-art, CellJoin) is at best unscientific and is the main reason for the low score of this paper, in this reviewer's opinion (see detailed comments).

Detailed comments (please number each point)

D1. Experimental results show that handshake join scales well with large number of cores. However, my major concern is the comparison of handshake join and CellJoin. Although the paper tries to use the same workload, it is hard to make any conclusion, since the two schemes are deployed on two completely different systems/architectures, and both use hardware-based optimization.
The paper only provides experimental results of handshake join, and compares them with the results presented in the CellJoin paper, i.e., just by using the numbers from a previously published paper on a completely different system! This is a completely inappropriate comparison. Although the system used by handshake join has a lower clock speed, it is well-known that there are many other factors that determine performance (e.g., bus bandwidth, interconnection network, cache size, memory access speed, etc). As such, the claim that handshake join "substantially outperforms CellJoin" is completely inappropriate.
As mentioned in W1, the handshake join actually requires more data movement, and it also brings additional overhead in handling the asynchronous queues, so whether it performs better than CellJoin in the same system is not obvious. In fact, if one considers the fact that the Magna Cours architecture used by this paper was only available in March 2010 (http://en.wikipedia.org/wiki/Opteron), whereas the Celljoin paper was *submitted* to the VLDB Journal in February 2008, the machine used for this paper is over two years more recent than the CellJoin one.
D2. The experiments in the CellJoin paper show that it scales perfectly for 8-core, and experimental results of CellJoin in a system of more cores are not available, so practically we cannot claim that handshake join scales better than CellJoin. It would be great to have both CellJoin and handshake join implemented in the same system, so that we could easily compare their performance and scalability
D3. The numbering of the algorithm in Figure 13 is confusing.

Is author feedback necessary for this Review? (If "Yes", please answer next question.)

No

Reviewer 2

Summary of the paper's main contributions and impact

Main Contributions:
The authors propose handshake join - a paradigm for window-based stream joins which naturally maps to parallel execution and trivially scales to high degrees of parallelism, yet simple in its conceptual realization.
1) Data-flow oriented formulation of window-based stream joins
The main contribution of this paper is a data-flow oriented view on the enumeration of the Cartesian product instead of taking the traditional procedural route (nested-loop like). Analogous to the handshakes taking place prior to the beginning of the soccer game, two streams flowing in opposite directions can form the enumeration of join candidates. An adequate formal definition of the implied semantics is provided.
2) Effective parallelization of stream joins
The data-flow oriented formulation simplifies the parallelization of stream joins. By partitioning both stream windows into disjoint segments, the partial joins can be performed in a data-driven fashion without the need for a central organization unit. The enumeration of “local” Cartesian products and filtering is realized by Kangs three-step procedure, however other strategies are applicable.
3) Technical realizations for next-generation hardware
On the practical side, detailed realizations are provided which address technical aspects such as inter-core communication protocols, load balancing, SIMD optimization, power consumption. Handshake join is experimentally evaluated on next-generation multi-core (performance) and FGPA hardware (scale out).
Impact:
With the recent advent of multi-core and many-core architectures, the need for effective and simple concurrent methodologies has intensified. A paradigm shift from procedural to data-driven thinking is essential. The proposed parallelization strategy is effective yet simple and may inspire further research in the field of parallel computing.

Strong points of the paper (please number them S1,S2,..)

S1: Beautiful exploitation of analogy between soccer games handshaking and join candidate enumeration
S2: Evaluation on next-generation multi-core CPU and FGPA hardware
S3: Potential implications for ongoing research and standardization (streaming SQL)

Weak points of the paper (please number them W1,W2,...)

W1: Limited novelty (the theoretical part of handshake join and it’s parallelization is flawless, but novelty after chapter 3.4 is very limited and filled with technical [sometimes almost trivial] parts)
W2: Redundant parts (immediate scan strategy, two-phase, forwarding, see below)
W3: Limited comparative experimental evaluation (partition/replicate versions of Kang or others)

Detailed comments (please number each point)

(2.2)
+ precise recapitulation of Kang’s three-step procedure
+ explicit formal definition of semantics is useful

(3.1)
+ beautiful analogy to hand shake ritual in soccer games
(3.3)
+ isomorphic character of handshake and three-step procedure based on semantic is sound
(3.4)
+ good comparison of data-flow vs. control-flow problem
+ good explanation why data-flow oriented view simplifies parallelization of cartesian product / join
(3.5)
- introducing Kang’s procedure as “intermediate scan stratetgy” and possible optimization techniques is trivial/redundant
- figure 6 is trivial and symmetry is explicitly stated in (2.2)
- investigation of an alternative stream join implementation would fill the space far better

(4)
+ providing two distinct handshake join realization for completely different hardware architectures is remarkable
(4.2)
- the two-phase forwarding is (almost) trivial or (more precisely) highly redundant (figure 8, two-phase forwarding paragraph, figure 9 and corresponding paragraph)
(4.3)
- no experimental evaluation of “[...], which in practice we found sufficient to achieve good balancing”?

(5)
+ appropriate setup
- weak experimental evaluation, why does handshake join on magny cours outperform celljoin?

(6)
+ remarkable setup (again)
- indeed, the (extensive) evaluation of power consumption is insightful, but detailed multi-core experiments seem to be more appropriate

Is author feedback necessary for this Review? (If "Yes", please answer next question.)

No

List specific clarifications you seek from the Authors (if you have answered "Yes" to need for feedback.)

REPLACE THIS WITH YOUR ANSWER

Reviewer 3

Summary of the paper's main contributions and impact

The authors propose a new windowed join technique which exhibits embarrassing parallelism and point-to-point data flow. Adding resources allows either larger windows, more complex processing, or a combination of both. Further, the technique is more a framework than a particular implementation, accommodating a variety of hardware (CPU, FPGA, potentially GPU) and local join algorithms.

Strong points of the paper (please number them S1,S2,..)

S1 - The idea is simple and elegant (and embarrassingly parallel)
S2 - The evaluation shows good results with both FPGA and CPU hardware
S3 - The paper is very clearly written and easy to follow

Weak points of the paper (please number them W1,W2,...)

W1 - The principle is similar that used in the cyclo-join (DaMoN'09), which is not cited. A comparison of the two techniques seems warranted.
W2 - A more thorough evaluation of load balancing and partitioning techniques (especially on CPU) would strengthen the paper's claims in those areas (though the claims are intuitively strong).

Detailed comments (please number each point)

D1 - The cyclo-join, though proposed for a different purpose and in a different domain, has a very similar mode of operation. If a handshake join were performed on a networked cluster of computers, would the two algorithms look any different from each other?

Is author feedback necessary for this Review? (If "Yes", please answer next question.)

No

Reviewer 4

Summary of the paper's main contributions and impact

This paper proposes handshake join, a novel processing algorithm for window-based stream joins. The main idea is that, instead of processing two incoming data streams side by side along the same direction, they let the two data streams flow by in opposite directions and to join within the window somewhere in the middle. The analogy of this processing algorithm is that two teams of football players walk into a stadium from two corners and shake hands in the center region of the stadium. Moreover, the paper parallelizes this algorithm on multi-core architectures by partitioning the data flow onto individual cores. It presents efficient implementations on two multi-core platforms, an AMD 48-core server and an FPGA chip of up to 200 instantiated cores. The experimental results confirm the efficiency and scalability of the proposed approach.

Strong points of the paper (please number them S1,S2,..)

S1. The proposed handshake join algorithm is novel in that it lets the two data streams flow by each other in opposite directions. The main advantages of this design is its data parallelism and localized communication pattern.
S2. The paper gives a clear semantics of the handshake join algorithm and prooves its equivalence to the semantics of traditional window-based stream joins.
S3. The paper presents efficient implementations on two multi-core platforms, and the evaluation results are convincing.
S4. The paper is a pleasure to read, illustrating the overall idea intuitively and presenting the algorithm and implementation in sufficient detail.

Weak points of the paper (please number them W1,W2,...)

No significant weaknesses.

Detailed comments (please number each point)

D1. While the point-to-point communication pattern of the handshake join algorithm fits a number of multi-core architectures well, I wonder how it works for current GPUs, where inter-processor communication is limited to processors within each multiprocessor through a small piece of local memory. It would be interesting to see a GPU-based implementation.
D2. The first paragraph of Section 6 says the FPGA is used as a simulation platform. It is somewhat confusing: is the algorithm actually implemented on the FPGA, or it is just a simulation?

Is author feedback necessary for this Review? (If "Yes", please answer next question.)

Yes

List specific clarifications you seek from the Authors (if you have answered "Yes" to need for feedback.)

See detailed comments D2

Related Information



Sub content

Contact

Prof. Dr. Jens Teubner
Tel.: 0231 755-6481