Faculty of Informatics
Paper “How to Be Fast and Not Furious: Looking Under the Hood of CPU Cache Prefetching”

Reviews (DaMoN 2024)

Reviewer #1

1. Is the topic of the paper relevant to and important to the DaMoN community?

Yes

2. Is the paper readable and well organized?

Mostly - the presentation has minor issues, but is acceptable

3. Are the research contributions substantial and novel enough to warrant acceptance?

Definitely - a significant advancement

4. Are the paper's methodology, assumptions, models, and arguments free of serious flaws?

Yes - to the best of my understanding

5. Overall rating. Papers with Reject or Strong Reject ratings should have at least one negative score on Q1-Q4.

Accept

6. Short justification of your Overall Rating.

This paper provides a comprehensive study of software prefetching across various hardware platforms. The authors characterize the impact of different parameters, compare related instructions, and validate their conclusions from low-level testing on a higher-level workload.

7. Detailed comments

I really enjoyed reading this paper, which demystifies prefetching instructions to a large extent and shows both what is possible today and where the opportunities for improvement lie.

A few aspects of the paper left me a bit disappointed, first of all the size of the figures and especially the fonts around the graphs. They are impossible to read when the paper is printed out, and viewing them on screen requires zooming in quite a bit. This is really unfortunate because the layout of the figures and the choice of colors and style otherwise make them easy to read and understand. I especially like Figure 7, which packs a lot of valuable information into a single plot.

The other group of opportunities for improvement relates to the details of the experimental setup. Firstly, while Figure 1 nicely illustrates the point, it is not clear how the experiment was carried out. I would have expected more details in one of the later sections, but they were not included anywhere. Also, ARM processors, especially the ones designed for the public cloud, have improved a lot recently, so it would be great to include a few representative ones in Figure 1. Secondly, the hardware setup is specified in a fuzzy way in Section 3, which initially confused me. I would appreciate it if the experimental setup were described in a separate paragraph to make it very clear. Finally, even though I really liked the comprehensive benchmark in Section 4, I was puzzled about which exact configuration is used as the baseline in the experiments. It would be good to clarify this in the figure and in the text.

8. If this is a full paper and it is rejected, would you support its acceptance as a short (2 page) paper?

Yes

Reviewer #2

1. Is the topic of the paper relevant to and important to the DaMoN community?

Yes

2. Is the paper readable and well organized?

Definitely - very clear

3. Are the research contributions substantial and novel enough to warrant acceptance?

Definitely - a significant advancement

4. Are the paper's methodology, assumptions, models, and arguments free of serious flaws?

Yes - to the best of my understanding

5. Overall rating. Papers with Reject or Strong Reject ratings should have at least one negative score on Q1-Q4.

Strong Accept

6. Short justification of your Overall Rating.

This is an excellent paper in both structure and content. The authors do a great job of introducing the concepts, diving deep into the software prefetcher, teasing out how it interacts with hardware prefetchers, and exploring opportunities for enhancements. The micro-benchmark results introduced in Section 3 provide very pointed insights, and the experimental work in Section 4 does a great job of reinforcing the points made in the earlier sections.

7. Detailed comments

No additional comments.

8. If this is a full paper and it is rejected, would you support its acceptance as a short (2 page) paper?

No

Reviewer #3

1. Is the topic of the paper relevant to and important to the DaMoN community?

Yes

2. Is the paper readable and well organized?

Definitely - very clear

3. Are the research contributions substantial and novel enough to warrant acceptance?

Mostly - the contributions are above the bar

4. Are the paper's methodology, assumptions, models, and arguments free of serious flaws?

Yes - to the best of my understanding

5. Overall rating. Papers with Reject or Strong Reject ratings should have at least one negative score on Q1-Q4.

Accept

6. Short justification of your Overall Rating.

The paper presents a deep-dive analysis of software prefetching, explaining the hardware characteristics and limitations as well as common misconceptions. There are various examples in the paper, the most notable of which is B+-index page prefetching and the impact of the node size. The topic is extremely relevant and important, and while this paper only explores a very specific set of use cases, it is definitely a worthy read.

7. Detailed comments

Strong and weak points outlined below:

S1. The authors explain in good detail how hardware prefetching works, most notably which parts remain synchronous (TLB misses), as well as the capacity constraints. They connect these observations to experiments varying the page size (for the TLB) or the B+-tree node size (for capacity).

S2. The idea of using software prefetching to "train" hardware prefetchers is good (although the use of the word "train" here is highly generous) and promising, although it is very much CPU-dependent. The trade-offs are worth exploring further.

W1. The authors use a rather simple example of prefetching, namely the prefetching of B+-index nodes. While this is the most crucial case, since OLTP workloads are latency-dependent, there are cases where prefetching is used to increase throughput, such as hash table probing. One can go further and use software prefetching in graph traversals, among other workloads. The authors could at least discuss such cases.
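
To make the hash-table case in W1 concrete, here is a minimal sketch of throughput-oriented prefetching during probing. It is not taken from the paper. The table layout (`slot_t`, `table_t`), the hash function, the batch size of 8, and the convention that key 0 marks an empty slot are all assumptions chosen for illustration, and `__builtin_prefetch` is the GCC/Clang builtin. The idea is to issue prefetches for a whole batch of keys first, so that the cache misses overlap, and only then perform the lookups.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical open-addressing table with linear probing.
 * Assumptions: capacity is a power of two, key 0 marks an empty slot,
 * and the table is never completely full. */
typedef struct {
    uint64_t key;
    uint64_t value;
} slot_t;

typedef struct {
    slot_t *slots;
    size_t mask; /* capacity - 1 */
} table_t;

/* Illustrative multiplicative hash; any reasonable hash would do. */
static inline size_t hash_u64(uint64_t k) {
    k *= 0x9E3779B97F4A7C15ULL;
    return (size_t)(k >> 32);
}

static void table_insert(table_t *t, uint64_t key, uint64_t value) {
    size_t i = hash_u64(key) & t->mask;
    while (t->slots[i].key != 0 && t->slots[i].key != key)
        i = (i + 1) & t->mask;
    t->slots[i].key = key;
    t->slots[i].value = value;
}

#define BATCH 8 /* assumed batch size; the best value is hardware-dependent */

/* Looks up n keys; out[i] receives the value, or 0 if the key is absent.
 * Returns the number of keys found. */
static size_t probe_batched(const table_t *t, const uint64_t *keys,
                            size_t n, uint64_t *out) {
    size_t hits = 0;
    for (size_t base = 0; base < n; base += BATCH) {
        size_t end = (base + BATCH < n) ? base + BATCH : n;
        size_t home[BATCH];

        /* Phase 1: compute home slots and prefetch them (read, keep in cache). */
        for (size_t i = base; i < end; i++) {
            home[i - base] = hash_u64(keys[i]) & t->mask;
            __builtin_prefetch(&t->slots[home[i - base]], 0, 3);
        }

        /* Phase 2: probe; ideally the cache lines have arrived by now. */
        for (size_t i = base; i < end; i++) {
            size_t s = home[i - base];
            out[i] = 0;
            while (t->slots[s].key != 0) {
                if (t->slots[s].key == keys[i]) {
                    out[i] = t->slots[s].value;
                    hits++;
                    break;
                }
                s = (s + 1) & t->mask;
            }
        }
    }
    return hits;
}
```

Unlike the B+-tree case, where prefetching hides the latency of a single lookup, this pattern trades a small amount of per-key latency for overlapping many independent misses, which is why it targets throughput.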

8. If this is a full paper and it is rejected, would you support its acceptance as a short (2 page) paper?

Yes