
Reviewer #1

1. Overall Rating

Accept

2. Reviews

The paper discusses the possibilities and challenges of processing inside SSDs. This idea goes along with the trend observed in recent years toward smart storage, smart memory, near-data computation, etc., which is being extensively explored and is already used in a variety of systems and deployments. What is unique about this paper is the focus on NAND devices and the problems associated with large scale, mainly the potential lack of reliability through bit-flips and other errors leading to data corruption. I work in this area and I still learned a lot from this paper, which presents a very neat and complete list of potential issues in such devices, their limitations, and the reasons why such situations happen. That gives the paper additional value as a survey of the key problems in the area, even if its focus is on the general idea of processing in storage and how to approach it in a real setting. The ideas in the paper are very original. Of particular value is the direct connection to databases and data processing algorithms for a problem hardly discussed in the context of databases (typically it is assumed to be covered through redundancy and encoding). From that point of view, it is a fantastic contribution to the IDEAS track, as it covers novel technology, raises awareness of an important problem, and establishes a strong link to databases and data management through the preliminary study of data structures tolerant to bit-flips. I expect the latter to become a very interesting line of work that many people will follow, hence my strong recommendation to accept this paper. A reference that could be of interest to the authors is a recent monograph by Bonnet and Lerner on computing on SSDs. It would also be interesting to compare the approach in the paper with the earlier work on pushing queries down to storage enhanced with an FPGA.
Papers like IBEX (VLDB'14) or work extending this idea by Samsung in later VLDBs (e.g., VLDB'16) can provide an interesting contrast in terms of the capabilities and type of processing being considered.

Reviewer #2

1. Overall Rating

Accept

2. Reviews

This paper presents a vision of a research direction that aims to deploy Processing-in-NAND (PiN) technologies to improve overall database performance. The focus is placed on leveraging "off-the-shelf" (OTS) PiN. The authors make a strong case for the prospect of OTS PiN from both the hardware and software perspectives. Some experimental results are presented to support the vision arguments. The paper is a pleasure to read. It appears to be a very good fit for this special track.

Reviewer #3

1. Overall Rating

Accept

2. Reviews

This paper fits extremely well with the theme of DEFT. It presents emerging storage technology and discusses its impact on database systems. This reviewer is not a hardware expert and had to accept the hardware details provided in the paper, but considering the strong set of references, they seem well substantiated. The discussion of the impact on databases is very good, albeit a bit dystopian. The idea that we now have to start dealing with untrustworthy data storage is not appealing but an apparent reality. The paper discusses those aspects of database systems that can take advantage of faster but unreliable storage (e.g., indexing) and presents an extensive analysis of Bloom filters and binary sketches for similarity search. In the main discussion, I would have preferred to see more numbers rather than vague comparisons (more/increased/etc.), because that would help address the deeper question here, which is whether the added performance or lower cost of unreliable storage is sufficient to justify the extra computational work that has to be done at the application level. Also of interest, but not covered, is the business question of whether we will even get that choice. Will the hardware industry simply impose its interests in hitting its performance/cost numbers even if the result is a net negative for database systems? Much of the paper made me feel, as a database person, that I was reading about a grim future of unreliable storage. If the benefits are substantial, it would have been nice to have them presented, and presented numerically.

There are interesting new concepts here, like "fail-slow" systems and designing algorithms to include operations that can be performed in memory itself (not on the processor).

This is, in my view, a strong accept because the DB community needs to be thinking about these developments. If the authors can add more hard numeric data, that would be appreciated. If space is an issue, it would be okay to spend a bit less space on the analysis of specific data structures in favor of a more general analysis of the performance of the underlying new technology. 

One nit: page 6, col 2: "workloads may want" -- please reword this, since workloads are "things" and do not have desires. Maybe "workloads would benefit from".