Reviews
Review 1
SUBMISSION: 5
TITLE: OCTO+: Optimized Checkpointing of B+Trees for Non-Volatile Main Memory Wear-Leveling
AUTHORS: Christian Hakert, Roland Kühn, Kuan-Hsun Chen, Jian-Jia Chen and Jens Teubner
Overall evaluation
SCORE: 0 (borderline paper)
TEXT:
In this paper, the authors design and implement an aging-aware wear-leveling B+-Tree to reduce bit-flips in NVM and extend the device's lifetime. The design minimizes the modification of the mapping table by storing the information of modification in B+-Tree nodes. The motivation and object of this paper are safe and sound. The wear-leveling algorithm performs intra- and inter-block to balance the age of NVM blocks in the amount of software memory. Intra block wear-leveling utilizes modification mask (NMM) of tree node and 8-bit bitmap for every NVM block to identify not uniform aged blocks and place the modified tree nodes on young parts of the NVM. Inter block wear-leveling exchanges the mapped B+-Tree node to level uneven aging patterns within NVM blocks if the difference of counter value between the youngest and oldest block exceeds a threshold. In evaluation, the authors compare the proposal with aging aware (AA), random modification of the mapping table (RANDOM), and a!
ring based remapping scheme in three different data sets: monotonic data, random data, and YCSB workload. The node size of the tree is 1024 Bytes. The evaluation conducts 20000 and 50000 operations with different insert/update workloads. The result shows that the proposal extends memory lifetime up to 3x improvement than inter block wear-leveling only. The paper is easy to understand.
However, the design of this paper misses several important points. The authors need to clarify following issues.
1. This paper proposes a wear-leveling B+-Tree, which reduces the extra writes for the remapping table. However, the proposal does not consider carefully organizing the value corresponding to each key, which is the critical data that must be persisted in NVM. In fact, the size of a value is much larger than B+-tree node but, generally, less frequently modified. They must be taken into serious consideration as they occupy a lot of NVM space.
2. The remapping table is frequently modified. The current design without covering it makes the design uncomprehensive and unreliable, thereby leaving an incomplete and defective solution.
3. Leaf nodes of B+-Tree contain critical data required to be persisted in NVM, and internal nodes of the B+-Tree can be reconstructed from the leaf node within the recovery. This paper considers mapping all tree nodes from DRAM to NVM, which is sub-optimal for the low cell endurance NVM.
4. In the evaluation part, the authors choose a node size of 1024 Bytes for memory lifetime evaluation. It is advisable to investigate the impact of various node sizes because NMM is fixed in 8 bits to indicate the modification part, which is fine-grained and might cause an influence for larger node sizes.
5. It is better to describe the recovery procedure of this algorithm with respect to checkpoint.
Review 2
SUBMISSION: 5
TITLE: OCTO+: Optimized Checkpointing of B+Trees for Non-Volatile Main Memory Wear-Leveling
AUTHORS: Christian Hakert, Roland Kühn, Kuan-Hsun Chen, Jian-Jia Chen and Jens Teubner
Overall evaluation
SCORE: 1 (weak accept)
TEXT:
This paper proposes an application cooperative wear-leveling scheme for B+ trees, which realizes an interaction of the application and the wear-leveling. First, a modification implementation of a general B+ tree is provided, which tracks the modification information of tree nodes during execution. It then further provides an application cooperative wear-leveling scheme, include intra-block wear-leveling and inter-block wear-leveling. The scheme uses the collected information to improve the lifetime of the NVM blocks by modifying the mapping of regularly created checkpoints. Finally, the memory wear-out on
cell granularity is evaluated, and the potential for further generic wear-level is estimated.
My concerns are as follows:
1. In the inter-block wear leveling scheme, each NVM block has eight 8-bit counters. These counters should be checkpointed when the power is off. Where should these counters be checkpointed? If it is checkpointed to NVM, is there any consideration of wear-out of counters on NVM?
2. In order to avoid thrashing in the mapping, the remapped NVM block will not be removed temporarily in the subsequent checkpoints. In the experiment, there is no detailed parameter indicating the number of checkpoints.
Review 3
SUBMISSION: 5
TITLE: OCTO+: Optimized Checkpointing of B+Trees for Non-Volatile Main Memory Wear-Leveling
AUTHORS: Christian Hakert, Roland Kühn, Kuan-Hsun Chen, Jian-Jia Chen and Jens Teubner
Overall evaluation
SCORE: 0 (borderline paper)
TEXT:
Summary:
This paper proposes a new wear-leveling scheme, OCTO+ for B+ tree. Since DRAM has better endurance than NVM, DRAM maintains the working data of B+ tree and regularly performs the checkpoint to persist the modified data into NVM. However, persisting data will reduce the durability of NVM. OCTO+ uses bitmask to collect the write information between this and the next checkpoints and only persists the modified data into NVM during the next checkpoint. OCTO+ also uses eight 8-bit counters to indicate the age of the NVM block. According to the aging information, OCTO+ performs the intra and inter block wear-leveling.
Strength:
Wear-leveling in NVM is an important problem.
Weakness:
1. Lacks of comparison with existing wear-leveling schemes.
2. The design figure is difficult to understand.
Comments:
1. The wear-leveling in NVM has been widely researched for decades. However, this paper doesn’t introduce the related works. The experiments also lack comparisons with typical wear-leveling schemes, such as Segment Swapping@ISCA09 and region-based Start-Gap@MICRO09.
2. As mentioned in Section IV, there are some hardware and software mechanisms to gather the information of modifications of B+ tree in DRAM. Now that we have information collection schemes, why do we need the alternative scheme proposed in OCTO+? Compared with existing software and hardware mechanisms, will the alternative scheme in OCTO+ have lower overheads?
3. As shown in Fig 1, the NMM occupies some space of the header of the node. Is the header still fully functional in this case? Do we need to compact the header to make room for NMM?
4. It is difficult to understand Fig 2 since there are no descriptions of the figure. For example, what are operations 1 and 2 in Fig 2?
5. To perform the inter-block wear-leveling, OCTO+ needs to search for the youngest and oldest blocks and exchange them. Since the number of blocks in NVM is large, how to find the youngest and oldest blocks with low overheads?
Review 4
SUBMISSION: 5
TITLE: OCTO+: Optimized Checkpointing of B+Trees for Non-Volatile Main Memory Wear-Leveling
AUTHORS: Christian Hakert, Roland Kühn, Kuan-Hsun Chen, Jian-Jia Chen and Jens Teubner
Overall evaluation
SCORE: 0 (borderline paper)
TEXT:
The authors present a strategy to checkpoint B+ trees in hybrid DRAM / NVM main memory systems.
The paper is well written and easy to follow, and the authors proposal to perform wear-levelling shows good results.
My main concern is on the specificity of the solution. Most wear-levelling techniques, as the authors mention, apply system-wise but this proposal is for a very specific scenario (B+-trees), requiring application changes when inserting, updating or deleting nodes.
Some other minor remarks:
- I would need more explanations for the intra block war-levelling strategy: the unmapped tree nodes are mapped traversing the tree with the inverse modification mask. But it could happen that several NVM blocks lead to the same 8-bit signature or that the inverse 8-bit chain from a tree node does not reach to any NVM block. How are those situations handle? Fig 2 would need some more details to show the whole process
- How much is the threshold presented in V-B 2) Inter block wear-leveling. How is it derived? IT depends on the application?
- The authors assume that the whole tree fits both in DRAM and NVM. Is it a valid assumption in the general case?
- What's the overhead of the proposal? Checkpointing every 50 operations in long write latency NVM may impact overall performance. An estimation, compared to other checkpointing proposals, would be welcomed
- The proposed wear-levelling implies data migration from some blocks to others when the algorithm is triggered. A migration implies updating the whole block and not only the word(s) modified by the user, so the number of write operations to the NVM may increase. Could it be a significant increase, for example, from the energy consumption point of view?