Jump label

Service navigation

Main navigation

You are here:

Main content

Main-Memory Hash Joins on Multi-Core CPUs: Tuning to the Underlying Hardware — ICDE 2013 Reviews

Reviews for paper Main-Memory Hash Joins on Multi-Core CPUs: Tuning to the Underlying Hardware, submitted to ICDE 2013.

Overall rating: accept

Reviewer 1

Brief summary of the paper and a concise rationale for your recommendation

This paper solves an important controversy raised by [2] that came on the heels of [1], which came with a fully contradictory message. By exploring the parameter space in depth, it shows that the results of [2] only holds when the join is between a small and a large table and the join keys are skewed -- in all other cases the conclusions of [1] take precedence over [2].

Major strengths (explain the nature and value of the paper's contributions)

- Incomplete results like in [2] put both practitioners and researchers in a state of confusion that can easily make them lose a lot of time to reach the conclusions of this paper (and if these unlucky researchers do not have the engineering depth of experience as displayed here, they might remain confused). Hence such clarifying work is quite appreciated.

- excellent presentation

- convincing evaluation

Major weaknesses (indicate any limitations of the paper, notably technical errors, missing related work, and recycled results)

- some reviewers will say that this paper does not introduce a new algorithm and lacks novelty. However, for me clearing the confusion raised by [1] vs. [2] is a strong contribution to the community.

Intellectual novelty (give High to papers on new topics or proposing stimulating new ideas; Medium to papers on well known topics, but making useful contributions; Low to incremental papers, or papers with known ideas)

Medium

Technical depth (assess the depth of the contributions)

High

Presentation quality (assess the quality of the presentation of the contributions?)

High

Detailed comments

This is very well done and sound work; there are little details to complain about/remark on.

III B. "AMD machine" -> The AMD machine

IV C. phsysical

V C. "Resulting overall" -> The resulting overall

the key observations are
- function-call intensiveness (relative inefficiency versus the other implementations) of radix join in [2]
- poor radix join implementation choice in [2] to build all hash tables first before probing them
- sorted insert order and effective non-hashing in [2] that cause a sequential access pattern in hash build
- problem configuration [2] happened to hit its sweet spot combining small build size and skewed key distributions

in all you showed that [1] was right, and [2] either made some very unfortunate mistakes -- or, more likely, pulled all the tricks to sell a story that does not hold up to scrutiny.

Reviewer confidence

High

Overall recommendation

Accept (I will argue for acceptance)

Reviewer 2

Brief summary of the paper and a concise rationale for your recommendation

The authors report on a thorough experimental analysis of hash-join algorithms on multicore systems. The conclusion is that even in this contraint setting, there can be observable performance differences in number of cpu cycles spent per tuple.

Major strengths (explain the nature and value of the paper's contributions)

- very detailed experimental study
- readable story
- good reference case for micro experimentation.

Major weaknesses (indicate any limitations of the paper, notably technical errors, missing related work, and recycled results)

- lack of algorithmic novelty
- lack of translation into real world performance
- lack of stability under skew analysis

Intellectual novelty (give High to papers on new topics or proposing stimulating new ideas; Medium to papers on well known topics, but making useful contributions; Low to incremental papers, or papers with known ideas)

Low

Technical depth (assess the depth of the contributions)

High

Presentation quality (assess the quality of the presentation of the contributions?)

Acceptable

Detailed comments

The paper presents a thorough analysis of hash-join algorithms on a multicore system. The main weakness is the lack of an analysis of the macro-visible effects. How much of a factor two on cycles/output element translate into observable better query response times. Integration in an open-source system would have made this clear.

Furthermore, the stability under data skew has not been studied. Therefore, the overall contribution remains isolated.

Reviewer confidence

High

Overall recommendation

Accept (I will argue for acceptance)

Reviewer 3

Brief summary of the paper and a concise rationale for your recommendation

The paper performs a very careful evaluation of different hash join strategies and highlights the different sweet spots. Primarily, it is a falsification paper for [2]. What is a bit missing is the "tuning to the underlying hardware" part from the title. The authors give some general hints, but not really a roadmap.

Major strengths (explain the nature and value of the paper's contributions)

- a very careful evaluation of different join strategies
- highlights systematically weaknesses in the setup of [2]

Major weaknesses (indicate any limitations of the paper, notably technical errors, missing related work, and recycled results)

- the novelty is a bit limited. Most of the results were known (though not properly published, so that might be ok)
- the paper does not really over a guideline for tuning to the underlying hardware

Intellectual novelty (give High to papers on new topics or proposing stimulating new ideas; Medium to papers on well known topics, but making useful contributions; Low to incremental papers, or papers with known ideas)

Medium

Technical depth (assess the depth of the contributions)

Medium

Presentation quality (assess the quality of the presentation of the contributions?)

High

Detailed comments

The paper performs a very careful analysis of different hash join strategies, and I really appreciate the effort. Some algorithms work well only for certain setups, and this paper makes a good point of emphasizing this issue. It primarily falsifies [2], and shows that the results of [2] are artifacts of the the experimental setup in [2]. This is useful work, and reminds us to be honest in our setup and to avoid measuring special cases like sorted input.

What is a bit missing is the "tuning" part of the title. How should we tune the algorithms to the underlying hardware? The paper gives some hints, for example in Figure 9, but the question remains largely unanswered. For example while the no-partition strategy is not competitive in general, it is clearly a good idea if the build relation is very small. Can we quantify "very small"? Can we detect this at execution time? At compile time? Or what about the hardware itself? Section VI-B mentions that the AMD processor might require a different data flow/access pattern than the Intel processor. Can we develop building blocks such that the appropriate algorithm can be "assembled" according to the available hardware?

Of course these are hard questions, and one cannot reasonably expect to get an answer to all of them in one paper. But the current paper neglects this issue a bit. It primarily compares a particular Manegold-style radix join to the non-partitioning hash join and gives numbers, without spending too much thoughts on the tuning itself.

Reviewer confidence

Medium

Overall recommendation

Accept (I will argue for acceptance)

Related Information



Sub content

Contact

Prof. Dr. Jens Teubner
Tel.: 0231 755-6481