How Soccer Players Would Do Stream Joins — SIGMOD 2011 Author Feedback

Author feedback for the paper "How Soccer Players Would Do Stream Joins", submitted to SIGMOD 2011.

Rev. 4, D2: We did implement handshake join on the FPGA. But the implementation was geared toward experimenting with different degrees of parallelism, not toward performance or practical applicability. The motivation was to judge how a future many-core system might behave, hence the term "simulation".

Rev. 1, W1: We think there is a misunderstanding of how the two join algorithms work:

  • For CellJoin, when new data arrives from stream X, that data will be replicated to all SPEs, while Y will be partitioned. But this behavior is symmetric. When new Y-data arrives, it is replicated to all SPEs and X is partitioned. Effectively, all data is moved (better: "assigned"; see next point) to each core at least once during an algorithm run.
  • Neither algorithm can physically keep partitions inside the cores (local stores and caches are way too small for that); both have to repeatedly re-fetch their data from memory. The difference is that handshake join keeps those fetches NUMA-local, while CellJoin depends on a single uniform memory (which has scalability limits, see below).
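The symmetric replicate/partition behavior described in the first point can be sketched as follows. This is our own simplified illustration (function and variable names are ours, not from [9]); SPE communication and tuple representation are abstracted away:

```python
# Simplified model of CellJoin's symmetric assignment step: new tuples
# from one stream are replicated to every SPE, while the opposite
# stream's window is partitioned across the SPEs.

def celljoin_assign(new_tuples, other_window, n_spes):
    """Return one (spe_id, replicated, partition) work item per SPE."""
    tasks = []
    for spe in range(n_spes):
        replicated = list(new_tuples)          # every SPE sees ALL new tuples
        partition = other_window[spe::n_spes]  # ... but only a 1/n slice
        tasks.append((spe, replicated, partition))
    return tasks

# New X-tuples joined against a partitioned Y-window; the same call
# handles the symmetric case (new Y-tuples against the X-window).
tasks = celljoin_assign(["x1", "x2"], ["y1", "y2", "y3", "y4"], n_spes=2)
```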

We will make handshake join's memory access characteristics more explicit in the final text (see also below).

Rev. 1, W3: In CellJoin, the PPE-SPE link is only used for control messages. Data is pulled by the individual SPEs from a centralized memory. It is known that such system designs cannot be scaled to large core counts. This is a key motivation of handshake join.

Rev. 1, W4/D1: We chose CellJoin as a reference because it defines the state of the art in stream join evaluation on multi-core systems. As you point out, performance depends on many different factors. However, in a rapidly changing hardware landscape, system comparisons are hard to do in an entirely fair manner. The given performance numbers were meant to position both solutions in the performance space, not to give a comprehensive evaluation of the two approaches. We will make this clearer in the text and also provide better explanations of the observed behavior (e.g., discuss different memory access characteristics; this also addresses (5) of Reviewer 2).

Rev. 1, D2: Handshake join has a partitioning overhead that is independent of the core count N by design, i.e., O(1). For CellJoin this overhead is O(N). In [9], the authors assume that "since N is small (8 for a single Cell processor) [...] partitioning has minimal overhead".
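The asymptotic gap can be made concrete with a back-of-the-envelope count of assignment messages per arriving tuple. This is our own cost model, not taken from the paper or from [9]:

```python
def celljoin_assignment_messages(n_cores):
    # CellJoin replicates each arriving tuple to all N cores and hands
    # each core its partition of the opposite window, so the coordinator
    # issues O(N) assignments per arriving tuple.
    return n_cores

def handshake_assignment_messages(n_cores):
    # Handshake join feeds a new tuple only to the core at one end of
    # the pipeline; all further movement is neighbor-to-neighbor and
    # independent of N, so the entry cost is O(1).
    return 1

# At 8 cores (a single Cell processor) the gap is small; at many-core
# scale it is not.
gap_small = celljoin_assignment_messages(8) - handshake_assignment_messages(8)
gap_large = celljoin_assignment_messages(512) - handshake_assignment_messages(512)
```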

Rev. 3, D1: We are well aware of cyclo-join (and already cited [12] from the Data Cyclotron project). Our work indeed shares ideas with cyclo-join, most importantly the concept of data flow orientation (which is what our citation already refers to) and a communication model that is restricted to neighbor-to-neighbor communication. An obvious difference is the implementation setting based on RDMA (though handshake join would fit such a setting, too; see our Section 8). What is more interesting is that in handshake join both data streams are moving within their join windows (while cyclo-join keeps one relation stationary). This leads to the interplay of windowed join semantics and architecture consciousness that makes handshake join so appealing. We are going to discuss this in more detail in the final version of the paper (and of course cite cyclo-join appropriately).
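The "both streams moving" data flow can be illustrated with a toy pipeline model. This is our own sketch with one tuple slot per core per stream; the actual algorithm keeps whole window segments in each core and uses a handoff protocol to guarantee that no passing pair is missed:

```python
# Toy model of handshake join's data flow: X-tuples enter at the left
# end of a core pipeline and move right, Y-tuples enter at the right
# and move left. Each core only ever communicates with its direct
# neighbors, so all accesses stay local or neighbor-to-neighbor.

def handshake_step(cores, new_x=None, new_y=None):
    """Advance the pipeline one step; return join pairs seen this step."""
    results = []
    n = len(cores)
    # Tuples in adjacent cores moving toward each other swap positions
    # this step; compare them before they cross.
    for i in range(n - 1):
        results += [(x, y) for x in cores[i]['x'] for y in cores[i + 1]['y']]
    # Shift X one core to the right, Y one core to the left.
    for i in reversed(range(1, n)):
        cores[i]['x'] = cores[i - 1]['x']
    cores[0]['x'] = [new_x] if new_x is not None else []
    for i in range(n - 1):
        cores[i]['y'] = cores[i + 1]['y']
    cores[n - 1]['y'] = [new_y] if new_y is not None else []
    # Compare tuples that are now resident in the same core.
    for core in cores:
        results += [(x, y) for x in core['x'] for y in core['y']]
    return results

cores = [{'x': [], 'y': []} for _ in range(2)]
pairs = []
pairs += handshake_step(cores, new_x='x1', new_y='y1')
pairs += handshake_step(cores, new_x='x2', new_y='y2')
pairs += handshake_step(cores)
# pairs now contains every (x, y) combination exactly once
```

In this model the entry points for new tuples and the per-step movement never depend on the total core count, which is exactly the neighbor-to-neighbor property the response above refers to.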

Prof. Dr. Jens Teubner
Tel.: 0231 755-6481