Jump label

Service navigation

Main navigation

You are here:

Main content

Spinning Relations: High-Speed Networks for Distributed Join Processing — DaMoN 2009 Reviews

Reviews for paper Spinning Relations: High-Speed Networks for Distributed Join Processing, submitted to DaMoN 2009.

Overall rating: accept

Reviewer 1

Is the paper relevant for the DaMoN workshop?

Yes

Is the paper technically correct?

Yes

Originality

Weak Accept

Impact

Weak Accept

Technical Depth

Weak Accept

Presentation

Weak Accept

Potential to stimulate interesting discussions during the workshop

Weak Accept

Overall rating

Weak Accept

Reviewer confidence

High

Should this paper be considered for the Best Paper Award?

No

Summary of main contribution and rationale for your recommendation (1-2 paragraphs)

The paper advocates an algorithm that leverages RDMA connections to join two relations, essentially by keeping all temporary/partition files circulating in the network instead of being stored on the disk.

Detailed comments to authors

I likes this paper because there are interesting ideas in it, however my "weak" rating is because (a) the presentation is weak -- the paper seems to ave beenwritten a trifle hastily, especially in the introduction and (b) I am not convinced that there will be adequate opportunity for application of these algorithms. In which setting will you apply these algorithms, and how will they perform int he presence of many queries? how will work be distributed when you have many rings involved with one another? Bottlenecks will be hard to predict, let alone resolve.

Reviewer 2

Is the paper relevant for the DaMoN workshop?

Yes

Is the paper technically correct?

Yes

Originality

Accept

Impact

Accept

Technical Depth

Accept

Presentation

Accept

Potential to stimulate interesting discussions during the workshop

Accept

Overall rating

Accept

Reviewer confidence

Low

Should this paper be considered for the Best Paper Award?

Yes

Summary of main contribution and rationale for your recommendation (1-2 paragraphs)

A simple and elegant algorithm is presented for distributed hash-join. The key observation made in this work is that the bandwidth of modern network connections far exceeds that of modern disks. The authors suggest to leverage the network links to process a very large dataset that is distributed across several nodes. The main idea is to have every node first process the local chunks of the two relations R and S. The chunk of S is then send off to another node in a cyclical fashion (all nodes are connected into a ring). Since every node sends it's local chunk of S to another node, and since every node also passes along every remote copy of S it receives, eventually all nodes will see all of S.

I found the paper easy to read and to the point (I spotted one typo "the this delay"). Being outside the area I valued how the authors concisely described the problem, background and solution. I'm not necessarily in a position to really now whether this work is novel, but it looked novel to me.

Detailed comments to authors

None really worth mentioning

Reviewer 3

Is the paper relevant for the DaMoN workshop?

Yes

Is the paper technically correct?

Yes

Originality

Neutral

Impact

Neutral

Technical Depth

Neutral

Presentation

Accept

Potential to stimulate interesting discussions during the workshop

Weak Accept

Overall rating

Weak Accept

Reviewer confidence

Medium

Should this paper be considered for the Best Paper Award?

No

Summary of main contribution and rationale for your recommendation (1-2 paragraphs)

This paper proposes a parallel communication architecture, and a parallel implementation of joins that exploits this architecture. The architecture is a distributed-memory architecture, and it uses RDMA channels to perform memory to memory data transfers efficiently between processors. Processors are organized logically into a ring. A host processor sends a copy of one relation to all the processors. The other relation is partitioned between the processors. Each processor performs a local join, and then (i) sends its portion of the second relation on to the next processor in the ring, and (ii) receives another portion of the second relation from the previous processor in the ring. In this way, the processors act like a circular shift register, and perform the join in parallel. An architecture with these characteristics has been built, and the paper presents time measurements for performing joins of different sizes.
There are a lot of connections to systolic algorithms, which were explored by the parallel programming community in the 80's and 90's (see comments below). The paper needs to cite this work, and I believe even better algorithms for join can be developed using systolic ideas, as explained below.

Detailed comments to authors

In the 80's and 90's, there was a lot of interest in the parallel programming community on "systolic algorithms". Unlike most parallel programming models in which communication is given less prominence than computation, systolic algorithms explicitly orchestrate the movement of data between processors, and organize computations about data movement (see for example Cannon's algorithm for matrix multiplication). The authors have essentially proposed a systolic algorithm for joins in databases, although they seem to be unaware of the connection, and do not cite any of the papers on systolic algorithms. I believe it should be possible to perform joins in a parallel manner similar to Cannon's algorithm for matrix multiplication, which might be more efficient than the implementation proposed in this paper.

Related Information



Sub content

Contact

Prof. Dr. Jens Teubner
Tel.: 0231 755-6481