Reviews (DaMoN 2023)

Paper “Towards Data-Based Cache Optimization of B+-Trees”

Reviews (DaMoN 2023)

Reviewer #1

Is the topic of the paper relevant to and important to the DaMoN community?

Yes

Is the paper readable and well organized?

Mostly - the presentation has minor issues, but is acceptable

Are the research contributions substantial and novel enough to warrant acceptance?

Mostly - the contributions are above the bar

Are the paper's methodology, assumptions, models, and arguments free of serious flaws?

Mostly so - there are some minor issues, but none of them are fatal

Overall rating. Papers with Reject or Strong Reject ratings should have at least one negative score on Q1-Q4.

Accept

Short justification of your Overall Rating.

This paper investigates the problem of optimizing in-memory B+-Trees through proper memory layout. By calculating the visiting probability and frequency of tree nodes, the authors propose an optimized B+-tree layout that identifies node priority and organizes memory accordingly. Experiments are conducted to demonstrate the effectiveness of the proposed methods.

Detailed commments

Overall, this paper addresses a real problem, and the solution is technical sound. However, there exist some limitations:

D1: How about the complexity of the layout construction algorithm? Besides, the algorithm seems straightforward, as it merely counts the access times of each node and place frequently accessed nodes before other nodes.

D2: Some analyses are based on results that are not presented. For example, in Section 5.2, the authors state: "With more than 40 threads, a further improvement in performance can be observed." However, the throughput with more than 40 threads is not shown. I recommend including the missing results in the figures.

D3: In term of handling tree splits or rebalancing, it is unclear if the modified physical tree layout introduces any differences or improvements.

If this is a full paper and it is rejected, would you support its acceptance as a short (2 page) paper?

Yes

Reviewer #2

Is the topic of the paper relevant to and important to the DaMoN community?

Yes

Is the paper readable and well organized?

Definitely - very clear

Are the research contributions substantial and novel enough to warrant acceptance?

No - the contributions in this version do not warrant acceptance

Are the paper's methodology, assumptions, models, and arguments free of serious flaws?

No - there are serious flaws that undermine the results

Overall rating. Papers with Reject or Strong Reject ratings should have at least one negative score on Q1-Q4.

Reject

Short justification of your Overall Rating.

The paper presents a memory layout optimization for B+ trees in memory by finding the hottest access paths and moving these B+ nodes in nearby memory locations. While the idea is good, the experimentation using an ad-hoc Zipfian distribution is very limited.

Detailed commments

Listing strong (S1, S2, ...) and weak (W1, W2, ...) points.

S1. The authors describe a good overview of existing work in B+-trees for in-memory and propose a generally good idea of memory layout optimization to try to limit the TLB footprint.

S2. The authors compare a number of different configurations and even make a couple of modifications to the node-lookup algorithm in order to show a wider range of experiments.

W1. The data used here are entirely synthetic and follow a single Zipfian distribution for their lookups on top of a toy 8-byte key/8-byte payload setup. This is very limiting and as such the conclusions drawn from this paper may be artifacts of this particular distribution. I also think this distribution of 0.99 is a bit extreme.

W2. The idea of memory layout optimization is not bad but it's not clear how can someone move the data around safely. I assume you could create copies of the tree nodes but if there are concurrent updates, I am not sure this is as easy, especially if the node keys are re-arranged.

W3. There is mention or comparisons with other state-of-the-art in-memory tree indexes such as ART (The Adaptive Radix tree). I think the authors need to show that B+ trees are still valuable in comparison to newer indexes that showed to be significantly faster than cache-conscious B+ trees.

If this is a full paper and it is rejected, would you support its acceptance as a short (2 page) paper?

No

Reviewer #3

Is the topic of the paper relevant to and important to the DaMoN community?

Yes

Is the paper readable and well organized?

Definitely - very clear

Are the research contributions substantial and novel enough to warrant acceptance?

Mostly - the contributions are above the bar

Are the paper's methodology, assumptions, models, and arguments free of serious flaws?

Yes - to the best of my understanding

Overall rating. Papers with Reject or Strong Reject ratings should have at least one negative score on Q1-Q4.

Strong Accept

Short justification of your Overall Rating.

This well-written paper describes a method for improving cache locality of in-memory B+ trees by using a simple yet elegant idea involving a probabilistic model for hot paths in the B+ tree. It convincingly shows that the method improves throughput over state-of-the-art in-memory B+ trees by up to 26% when combined with a smaller node size and linear scanning of nodes.

Detailed commments

This paper describes a method for improving cache locality of in-memory B+ trees by using a probabilistic model to determine the most popular access paths in the B+ tree. It shows that the method improves over state-of-the-art in-memory B+ trees by up to 26% when combined with a smaller node size and linear scanning of nodes.

D1. The idea of building a probabilistic model to identify hot paths in the B+ tree is simple yet elegant and effective.

D2. The evaluation nicely shows how each optimization (hot paths, linear scan, sorting, and node size) impacts the throughput and cache misses. It convincingly shows the benefit of this approach over the state-of-the-art.

D3. The paper is well written and very easy to follow. The figures are helpful in understanding the concepts.

D4. In my opinion, the biggest opportunity for improvement in this paper is to show that these methods are applicable to real (or realistic) DB workloads (in addition to the future work mentioned by the authors of applying these methods dynamically at runtime). Showing the performance of a standard benchmark such as TPC-C or YCSB would make the results more convincing.

D5. One thing that's not fully clear is whether the "native" B+ tree implementation that is used as a baseline includes the cache-friendly optimizations in [20] that are mentioned in Section 2. If not, [20] should probably be another baseline for comparison in the evaluation.

D6. Section 5.1 uses "Mio" as an abbreviation for million. As far as I am aware, this is uncommon and could be confusing to some readers. "M" or "Mil" seems more common.

If this is a full paper and it is rejected, would you support its acceptance as a short (2 page) paper?

Yes