Zum Inhalt
Fakultät für Informatik

VLDB 2020 Reviews

Reviews for Paper MxTasks: How to Make Efficient Synchronization and Prefetching Easy, submitted to VLDB 2021 (Research Track).

Overall Rating: Reject

Reviewer #1

1. Overall Rating

Weak Accept


2. Relevant for PVLDB

Yes


3. Are there specific revisions that could raise your overall rating?

No


5. Summary of the paper (what is being proposed and in what context) and a brief justification of your overall recommendation. One solid paragraph.

The paper proposes MxTasking, a task-based framework that assists the design of latch-free and parallel data structures using the task model as opposed to the thread model. MxTasking aims to simplify the life of the developer building parallel applications. The basic abstraction of MxTasking is the MxTask which is a short program sequence (task) that performs a single, small unit of work. The true strength of MxTasking is that it can attach task-annotations to every MxTask. With these annotations, applications may convey characteristics (e.g. runtime, data objects, scheduling priorities) of a task to MxTasking, which in turn allows to easily transfer knowledge from application level to the control-flow abstraction. Using the knowledge from task-annotation, prefetching memory contents to CPU caches becomes extremely straight-forward. The authors show the benefits of prefetching in their evaluation. In addition, tasks provide a fine-grained granularity for building parallel software. The authors mention that the fine granularity of tasks in combination with annotated metadata about the accessed data object, access pattern, and synchronization requirements provide adequate information for MxTasking to synchronize competing tasks. The authors also presented their vision of MxKernel which is a bare-metal runtime for DBMS/OS Co-Design. MxKernel provides a thin layer for both DBMS and OS running on top. This way, mutually needed data structures can be shared without any need for the DBMS to bypass the OS. Although task-based frameworks have been studied before, this work extends the spectrum by task-annotations. This combination of scheduling and synchronization to avoid latches is unique and have not been explored before. The authors demonstrate the benefit of MxTasking by building a task-based Blink-tree. Some issues like motivation of the work & background should be more detailed. In addition, the experimental evaluation should be further strengthened. Details can be found below.


6. Three (or more) strong points about the paper. Please be precise and explicit; clearly explain the value and nature of the contribution.

S1. Annotation-Based Memory Prefetching is one of the key strengths of MxTasking which is reflected in the results as well. In task-based environment, efficient prefetching is not an intricate task. The authors exploited this and illustrated their benefit.

S2. One of the key strength of MxTasking is how it avoids latches: using both scheduling and synchronization.

S3. The use of a multi-level allocator for task allocation (Section 5.2) is a great idea.

S4. Section 6 is a key strength of the paper. The authors proposes their vision of MxKernel a bare-metal runtime for DBMS/OS co-design.

 

7. Three (or more) weak points about the paper. Please clearly indicate whether the paper has any mistakes, missing related work, or results that cannot be considered a contribution; write it so that the authors can understand what is seen as negative.

W1. The authors should emphasize more on the importance/motivation of their work. To this end, Section 1 should be more detailed. At this point, many statements are strictly superficial, which should be more elaborated.

W2. More details about the implementation of task-based Blink-tree should be added. The authors should present the basics of Blink-tree first. How updates are managed in such a task-based implementation is missing at this point. The goal of section 5 should be to encourage future practitioners/researchers to implement such task-based systems (or use this system!). Hence, more details (e.g. challenges, problems, solution idea) should be added.

W3. The experimental evaluation can be further strengthened. Please see the detailed evaluation.

W4. Does MxTasking do anything specific about load balancing? Is it always guaranteed?


8. Novelty. Please give a high novelty ranking to papers on new topics, opening new fields, or proposing truly new ideas; assign medium ratings to delta papers and papers on well-known topics but still with some valuable contribution.

Novel


9. Significance

Improvement over existing work


10. Technical Depth and Quality of Content

Solid work


11. Experiments

OK, but certain claims are not covered by the experiments


12. Presentation

Reasonable: improvements needed


13. Detailed Evaluation (Contribution, Pros/Cons, Errors); please number each point and please provide as constructive feedback as possible.

D1. In section 1, some backgrounds regarding the latch-free systems, why are they important, why existing task-based systems are cumbersome should be added.

D2. How does MxTasking support load balancing? In general, is load balancing out of the scope of this paper? If not, the authors should mention their load-balancing techniques.

D3. In the implementation of Blink-tree, what is the prefetching distance? Is it 1?

D4. As mentioned in the weak points, section 5 should be augmented accordingly.

D5. When is an MxTask deleted? I believe it is not mentioned in the paper.

D6. The prefetching distance is an important parameter for such system. The authors should mention its impact in details with proper experiments.

D7. How does the workload impact the results? It is unfair to judge a system's impact by running only one workload. The authors should multiple workloads and report their results. Also, in the current experiments, only lookups are issued after inserting. How will the system perform under a hybrid workload?

D8. The optimal granularity of MxTask is unclear. The impact of the granularity on the scheduling cost should be made clear (with proper experiments if possible)

D9. The CPU cost analysis should be presented in more details. It seems like MxTasking should have higher CPU cost. The tradeoff along with the penalty should be clearly mentioned.

==================
Typo:
Section 1: today’s -> todays
Section 2: earlyer -> earlier
Section 5.2: "CPU cycles spend" -> "CPU cycles spent"
Section 6: can not to share -> can not share
Those annotations hints -> Those annotations hint


14. Supplemental material. If the authors have provided supplemental material (data, code, etc.,), is the information likely to be sufficient to understand and to reproduce the experiments? Note that we do not expect actual reproducibility experiments, but rather a verification that the files are in fact there and are reasonable in scope and content.

The paper contains a lot of implementation details. As this is more like a vision paper for a new programming model it will be very important to release the code to the public.


15. Revision. If revision is required, list specific required revisions you seek from the authors. Please number each point.

D1, D2, D4, D7, D8

Reviewer #2

1. Overall Rating

Weak Accept


2. Relevant for PVLDB

Yes


3. Are there specific revisions that could raise your overall rating?

Yes


5. Summary of the paper (what is being proposed and in what context) and a brief justification of your overall recommendation. One solid paragraph.

The paper proposes a task-based parallel programming model called MxTasking as a building block for database systems. Unlike other task-based parallel models, an MxTask will be annotated with information on what data object will a task access and what synchronization is needed, to aid with scheduling and task placement. The main strength of the paper is its bold, visionary research approach to rely on annotations for light-weight tasks that shows promising preliminary results from a single data structure. However, for acceptance as a research paper, additional results and more technical details are necessary.


6. Three (or more) strong points about the paper. Please be precise and explicit; clearly explain the value and nature of the contribution.

S1. Visionary work that explores a different way to express parallel computation in data-centric programs through annotations over light-weight tasks.

S2. Nice integration of MxTasking scheduling with memory allocation, memory management, and memory accounting capabilities.

S3. Favorable performance comparison with TBB and pthreads for one common data structure (B-tree).


7. Three (or more) weak points about the paper. Please clearly indicate whether the paper has any mistakes, missing related work, or results that cannot be considered a contribution; write it so that the authors can understand what is seen as negative.

W1. For the programmer using MxTasking, it seems that many difficulties associated with thread-based programming would morph into difficulties of annotating a task correctly. It is unclear if pushing the problem at this different abstraction level makes it any simpler to solve or reason about.

W2. The evaluation only considers a B-tree, which has low concurrency and good caching behavior (as all operations start at the root). The paper would be far stronger if it considered a broader set of operations and access patterns, to capture other aspects of concurrency in database systems.

W3. The writing needs to be more concise and offer more specifics. See detailed points.


8. Novelty. Please give a high novelty ranking to papers on new topics, opening new fields, or proposing truly new ideas; assign medium ratings to delta papers and papers on well-known topics but still with some valuable contribution.

Novel


9. Significance

The paper is going to start a new line of research and products


10. Technical Depth and Quality of Content

Solid work


11. Experiments

OK, but certain claims are not covered by the experiments


12. Presentation

Reasonable: improvements needed


13. Detailed Evaluation (Contribution, Pros/Cons, Errors); please number each point and please provide as constructive feedback as possible.

D1. The paper narrowly looks at individual lookup/insert operations in one data structure (a B-tree) that has limited concurrency and good caching. The case for MxTasking would be far stronger if the paper investigated other data structures (ideally of the high concurrency+poor cachability variant, such as a hash table) and expanded the scope to more complex data access patterns than a single operation to data structure (such as OLTP transactions with multiple accesses, or very simple OLAP query patterns such as an aggregation).

D2. Given how central annotations are to the MxTasking framework, I was expecting that the paper would precisely define what different annotation types are supported and also show how the requested information is conveyed to the runtime, perhaps through an example. This information is now hinted at from multiple locations in the text, but is never presented fully. A clearer distinction between what *is* supported and what *could be* supported would be very helpful.

D3. Although the notion of re-architecting the entire system stack around a task-based paradigm is intriguing, prior efforts in using a task-based paradigm have stumbled due the volume of legacy thread-based code that is in wide use today. I would have appreciated more information on how MxTasks can interface with existing programs/libraries that use a thread-based model for parallelization. In this reviewer's opinion, that would be more relevant to include instead of the short vision section.

D4. I could not find information on how MxTasks communicate with the "outside world", such as returning data to the user, writing to files or sending messages over the network. There are multiple aspects of this communication that are unclear.
(a) One question is the mechanism. Section 5 mentions callback functions on a successful completion of a task, but is this the primary mechanism?
(b) Many communication interfaces are stream-based and require serialization, such as writing to stdout from multiple tasks, or have isolation semantics that are ill-suited to the task model, such as writing in a file (presumably by issuing a sequence of open→write→close POSIX calls). How is this handled?
(c) The worker algorithm shown in Figure 2 assumes that tasks can be reset and rescheduled, which suggests that any non-revocable actions should take place after a task has executed successfully. If these actions are performed by the MxTasking runtime outside the context of a task, how can tasks that depend on the outcome of non-revocable, blocking actions be executed by MxTasking? (For example, when a credit card being charged successfully is a prerequisite for an order to be placed?)

D5. The underlying assumption here is that tasks are small units of work that a single CPU core would execute each in microseconds. There is no discussion on how longer operations that need to be executed across all CPU cores, such evaluating an aggregation query over all data, can be conveyed as MxTasks. Is the user responsible in splitting them into light-weight tasks and manage intermediate state, or can the framework do this automatically?

D6. MxTasks may need to access multiple objects, and then naively acquiring locks may result in a deadlock. How is this handled? Is the scheduler tasked with finding a deadlock-free execution schedule?

D7. Section 4.1 only describes the three mechanisms to synchronize tasks, but it should also describe the policy that MxTasking uses to pick which mechanism to use at any time.

D8. The objects that will be accessed by a task may not be known a priori if there are data-dependent accesses, and making such accesses certain may incur too much overhead. (Consider the B-tree code in Figure 3: A pointer lookup now becomes a task invocation!) It would be nice to directly address whether the user is expected limit a task to the smallest predictable unit of work, or if the framework supports different object granularities in ad-hoc hierarchies.

D9. Debugging incorrect annotations could be challenging. Can MxTask verify that the annotations the user passed are correct, and no other objects are inadvertently touched? Can MxTask inform that a certain data object in the annotation was not accessed by the code?

D10. Although the paper uses the practical problem of choosing the appropriate prefetching distance as a motivating example, it is unclear how MxTasking alleviates this exactly. The last paragraph of Section 3 describes that MxTasking prefetches data for task k+1 before task k executes. Given that tasks may have different durations and access profiles, the prefetch may still be too early or too late. While it is appreciated that MxTask can know about these accesses in advance whereas prior work cannot, the writing should explain how this additional information is used to choose the appropriate prefetch distance.

D11. The claim that MxTasking significantly eases the development effort is never supported with data. Although critically evaluating this question is beyond the scope of a data management paper, the paper could present some empirical data on the effort involved in writing and debugging the B-tree code in MxTasking versus the TBB-based and pthreads-based implementations.

D12. Another way to achieve higher concurrency is to use vectorization. Many papers have reported significant improvements from carefully using SIMD instructions for data processing. Can MxTasks also leverage vectorization, in addition to the different hardware that is mentioned in Section 6? Generally speaking, a weakness of task-based parallelism in prior work has been benefiting from batching, and it could be an important contribution if MxTasks can somehow accomplish this through its novel annotation interface.

D13. Database transactions support different isolation levels. Is this supported by MxTasking, or the user needs to implement lower isolation levels manually by carefully orchestrating tasks with different annotations?

D14. Minor typo: (page 4, col 2) "accessed for more often than leaf nodes" → far


14. Supplemental material. If the authors have provided supplemental material (data, code, etc.,), is the information likely to be sufficient to understand and to reproduce the experiments? Note that we do not expect actual reproducibility experiments, but rather a verification that the files are in fact there and are reasonable in scope and content.

No supplemental materials provided.


15. Revision. If revision is required, list specific required revisions you seek from the authors. Please number each point.

Please address detailed points above.

Reviewer #3

1. Overall Rating

Reject


2. Relevant for PVLDB

Yes


3. Are there specific revisions that could raise your overall rating?

No


5. Summary of the paper (what is being proposed and in what context) and a brief justification of your overall recommendation. One solid paragraph.

The paper presents MxTasks, a task-based framework that can assist in the development of parallel data structures. MxTasks provide a task abstraction that can be annotated with data accesses patterns and synchronization requirements. Based on these annotations, a scheduling system is proposed that can automatically pick the right synchronization and prefetching mechanisms. The main reason for the rejection is that the paper does not cover a huge amount of relevant techniques, and fails to explain clearly why MxTasks are any different from these other approaches.


6. Three (or more) strong points about the paper. Please be precise and explicit; clearly explain the value and nature of the contribution

.The paper is very well written, clearly motivates the problem of designing parallel algorithms for data processing on modern hardware and makes the case for a DBMS-OS codesign for heterogeneous hardware.


7. Three (or more) weak points about the paper. Please clearly indicate whether the paper has any mistakes, missing related work, or results that cannot be considered a contribution; write it so that the authors can understand what is seen as negative.

- The paper misses a huge body of relevant literature
- The paper does not clarify how MxTasks make it any easier to develop parallel applications as a code-to-code comparison with a competing framework is absent

 

8. Novelty. Please give a high novelty ranking to papers on new topics, opening new fields, or proposing truly new ideas; assign medium ratings to delta papers and papers on well-known topics but still with some valuable contribution.

Ideas are not new (say why in Q7 or 13)


9. Significance

No impact


10. Technical Depth and Quality of Content

Syntactically complete but with limited contribution


11. Experiments

OK, but certain claims are not covered by the experiments


12. Presentation

Excellent: careful, logical, elegant, easy to understand


13. Detailed Evaluation (Contribution, Pros/Cons, Errors); please number each point and please provide as constructive feedback as possible.

The task of designing scalable primitives for synchronization in multicore, in-memory databases has been the topic of extensive research the past few years. MxTasks attempts to be a general-purpose solution rather than a point design. Unfortunately, the techniques proposed in the paper can be directly traced back to well-established methods that are not mentioned at all in the paper even though the are widely used for building parallel software. For instance,

Task: the core notion of a task in this paper is directly related to user-level threading or Fibers. There is no mention of Fibers in the paper even though they existed for over two decades, database systems have been designed with them (RethinkDB, Fiber-based Architecture for NFV Cloud Databases,
Analyzing the Impact of System Architecture on the Scalability of OLTP Engines to name a few), and OSes like Windows offer native support for Fibers.

Run-to-completion: This is straight "cooperative multitasking" as supported by Fibers.

Embedded task scheduling: Intel TBB is mentioned in the paper. But the only discussion is about control flow. There is no mention of flow graph abstraction, or data flow programming in general. These abstractions allow a user to declare nodes in a flow graph and automatically partition the data and parallelize an entire application without any manual code. It is not clear if embedded task scheduling offers any additional benefit over this.

Prefetching, sequencing by scheduling: The idea of assigning threads to a common pool based on data access is logical partitioning proposed in DORA and used in several other systems. The idea of prefetching and scheduling tasks after data is ready is scheduler-assisted prefetching. The paper mentions that each task can be annotated with a data item. In fact, data flow abstraction encourages users to think about core computation as nodes, and describe how data flows through the nodes using buffers that can be used to automatically infer data dependencies and identify possible parallelization strategies without having to explicitly deal with tasks, threads, or data partitioning. Thus, it is more common in flow-graph or morsel-driven execution to specify all data upfront and have the runtime pick the right partitioning of data, assignment of data to tasks, and tasks to threads/cores. Annotations is in direct contrast to this, as it requires programmers to tag tasks with data as described in this paper. It is not clear as to why this would provide any benefit over data flow programming.

Latches & optimistic versioning: The paper picks a few latching techniques and uses them in MxTasks. Apart from automatically picking the type of latches, latching itself seems to be orthogonal to MxTasks. The same applies to optimistic versioning -- Mxtasks dont simplify the implementation of versioning, rather it uses versioning for optimistic access to datastructures. Prior work has already shown the scalability of optimistic data structures and some work has also shown that latching even in 2PL need not be a bottleneck if properly designed. In the presence of this, it is unclear why a system would choose a suboptimal latch, or how mxtasks improves on this. Avoiding latching by serializing access to data via scheduling is not necessarily a good strategy, as a well-designed scalable latch will provide much more parallelism than serial execution.

MxTasks and heterogeneous hardware: Programming across heterogeneous processors is a very difficult and open problem as evidenced by the adoption of CUDA and opencl. In fact, CPUs and GPUs have fundamentally different programming models, with GPUs favoring data parallelism over task parallelism. Synchronization in GPUs is also quite different as it very much depends on the granularity of synchronization. Given all this, apart from acting as a scheduling layer, which is already done by both Intel TBB and CUDA flow graphs, it is unclear how MxTasks simplifies the task of heterogeneous programming.

- Database systems have traditionally favored hiding parallelism, in either operators like the volcano model, or using runtimes like TBB with morsel-driven execution. In both these case, single-threaded code, with appropriate synchronization constructs for accessing shared data, is parallelized without the programmer or the database developer explicitly annotating tasks. For GPGPU or FPGA-based heterogeneous programming, even current frameworks like CUDA explicitly state that manual fine-tuning is necessary to achieve maximum performance given that layers of memory types and different granularities of parallelism and synchronization on modern GPUs. Given this, the authors should, in future submissions, provide examples, for instance side-by-side code example, that shows why annotations can simplify programming effort beyond data annotations and prefetching examples given in the paper. Another direction might be to take an existing task-based framework like Intel TBB, extend it with annotations, and show the benefit.

 

14. Supplemental material. If the authors have provided supplemental material (data, code, etc.,), is the information likely to be sufficient to understand and to reproduce the experiments? Note that we do not expect actual reproducibility experiments, but rather a verification that the files are in fact there and are reasonable in scope and content.

N/A

 

Related information