Sprungmarken

Servicenavigation

Hauptnavigation

Sie sind hier:

Hauptinhalt

Skeleton Automata for FPGAs: Reconfiguring without Reconstructing — SIGMOD 2011 Reviews

Reviews for paper FPGAs for Dynamic (XML) Query Workloads, submitted to SIGMOD 2011.

Overall rating: reject

(This work was later published as “Skeleton Automata for FPGAs: Reconfiguring without Reconstructing” at SIGMOD 2012.)

Reviewer 1

Summary of the paper's main contributions and impact

The paper proposes a new FPGAs-based engine that processes stream-based XML data (simply, XML stream data) for dynamic XML query workloads. The proposed system stores query information, tag predicates, into segment matchers and update the information whenever queries are changed. Hence, the system can remove the overhead of query compilation due to change of queries. In addition, the paper proposes three techniques for enhancing query processing performance: the technique for processing multiple paths, pipelining and BRAM sharing. Last, through the experiments, the paper shows that the proposed system is scalable and outperforms the method using software in terms of query processing time and memory utilization.

Strong points of the paper (please number them S1,S2,..)

S1. The paper proposes a new FPGAs-based stream processing engine with high speed and high resource utilization for dynamic XML query workload.
S2. The paper proposes three techniques for enhancing query processing performance of proposed engine.
S3. The paper provides extensive experimental results.

Weak points of the paper (please number them W1,W2,...)

W1. The assumption of the paper seems to be too strict. The paper assumes that the clock of the proposed system is fully synchronized with the input rate of XML data stream.
W2. Performance comparison between the proposed engine and other existing engines using hardware has not been done.

Innovation

Medium

Quality of Presentation

Medium

Quality of Experiments, Repeatability -- if applicable. (Do the experiments support the claims made in the paper and match the motivating examples?)

Satisfactory

Detailed comments (please number each point)

The authors propose a new FPGAs-based stream processing engine for high-volume XML query processing. This proposed system supports the dynamic query workloads that are not supported by existing works. Moreover, the authors propose three techniques for enhancing query processing performance and perform extensive experiments.
On the other hand, I see some problems in this paper as well.
1)The paper assumes that the clock of the proposed system is fully synchronized with the input rate of XML data stream. It does not seem to be practical. The paper fails to provide a good justification for the assumption.
2)The paper does not show experimental results for comparing performance among the proposed engine, engines using micro programs, and other engines using FPGAs that do not store queries into their logic circuits. Thus, I am not sure whether the proposed method also outperforms other systems based on hardware as it does the ones based on software.

Overall Recommendation

Weak Accept

Reviewer 2

Summary of the paper's main contributions and impact

The paper proposes an FPGA-based "soft automaton" design which factors out nearly all FPGA reconfiguration costs into a generic XML projection framework. As a result, the FPGA can be "programmed" once (= downloading the generic framework's circuit bitstream) and reconfigured in a few microseconds (= downloading requirements of specific queries) to handle most downward XML projections. This approach allows the FPGA to accelerate dynamic XPath query workloads where previous designs' long reconfiguration times (hours) limited them to static workloads.

Strong points of the paper (please number them S1,S2,..)

S1: The approach allows a large number of frequently changing queries to share a single XML stream sent through one FPGA.
S2: The design combines low- and high-level design constraints in an elegant way (e.g., the interaction of pipelining and self axis matching).
S3: The experimental results provide strong evidence for the effectiveness of the approach.

Weak points of the paper (please number them W1,W2,...)

W1: The discussion of related work is rather cursory. It mentions other approaches but does not always explain how that work relates to the current approach (especially in the cases involving different algorithms -- see detailed comments).

Innovation

High

Quality of Presentation

High

Quality of Experiments, Repeatability -- if applicable. (Do the experiments support the claims made in the paper and match the motivating examples?)

Satisfactory

Detailed comments (please number each point)

D1: In section 4.3.1 the phrase "node test followed (!) by an XPath navigation axis" was clearly meant to convey some significant observation which I did not grasp.
D2: Given that each query path must be contiguous, how does the configuration logic handle fragmentation of the segment matcher chain? Is there some way to migrate segment matchers while preserving their state? Would it be possible to duplicate an inconvenient chain, then remove the unwanted copy once the new one outputs its first match?
D3: In general, how is a query end handled? Does it suffice to hold the matcher(s) in the FPGA until the software side confirms that it has no further need of them (because the query has already stopped)?
D4: Given the single-ported nature of the BRAM, does sharing a BRAM force each matcher to operate on only every other cycle? Or is there some way to store the data so that two cycles worth can be stored in each word?
D5: Section 6.2 suggests that 2-way BRAM sharing is best, but the results in section 7.1 discuss only non-shared and 3-way sharing. Judging from figure 15, the latter seems correct.
D6: The structure of footnote 7 is a bit awkward and could probably be simplified: "Clock signals in FPGA hardware are generated by a phase-locked loop, restricting allowable frequencies to discrete values (namely ...)"
DD7: Would it be possible to export the cooked XML to the host in order to simplify the software XML parser it runs? Right now the XML gets parsed twice.
D8: Is 150MHz really the slowest native clock frequency? If so, the 150MHz reported in figure 15 is worrisome because the next slower frequency would be 90MHz (180MHz divided down by a factor of 2).
D9: I would mention far earlier in the paper how the proposed setup can saturate a 1GB ethernet link with headroom to spare. Not complex to say, and nice to know.
D10: Section 3.1 mentions that the proposed technique is amenable to ASIC designs, while section 7.2 mentions that the FPGA-based version would be the bottleneck with a disk or 10G ethernet connection. Would the ASIC design improve clock speeds? I suspect it would, and quite significantly. Also, what about the burned-in chips some FPGA vendors offer?
D11: Nitpicks: "mapped particularly efficient[ly]" (pp 4); "... the client will not only benefit from reduced memory... [but also what?]" (section 7.2); "can be [s/ran/run/] fully in hardware" (section 8); "[s/Other than/unlike/] any existing FPGA solution" (section 9).
D12: The text mentions that reconfiguring the already-programmed FPGA takes only a few microseconds, but how long does it take the software side to figure out what configuration commands to send? I wouldn't guess it would be inordinately expensive, but some numbers -- even back-of-the- envelope -- would be helpful for backing up the intuition.
D13: It wasn't immediately clear that 'Y' was a substitute name for a hardware parser generator engine (what with software tools like 'yacc' working with .y files and all). Perhaps a different letter of the alphabet would work better, especially if it came with a footnote like the one used for 'X'.

Summary of Discussion and Author Feedback (to be filled in by moderator of discussion only after discussion concluded)

There was an extended discussion regarding this paper. While the hardware aspect is sound, there were significant concerns about the XML part, namely:
1. The sub-language considered in the paper is very simple, only downward navigation, without any branch predicates. This sub-language is too simple and may not be useful, especially when the data to be dealt with is stream, as the design does not mention anything about storing/keeping any information about internal nodes or sub-trees rooted at internal nodes in the navigation. Hence, not predicates can be applied later.
2. There is no comparison to CPU-based approach, hence, we even don't know how good the performance is for the simply linear queries.

Overall Recommendation

Weak Accept

Reviewer 3

Summary of the paper's main contributions and impact

This paper presents a hardware implementation of XML projection on an FPGA chip. The main feature of this implementation is that it can process dynamic quey workloads without recompilation. It achieves this feature by storing query workloads in a configuration file, thus changing query workloads is done through reconfiguration without changing the physical layout of the circuit.

Strong points of the paper (please number them S1,S2,..)

S1. The paper presents a full implementation of FPGA-based XML projection. This implementation can be used as a component to speed up the computation in XML stream processing.
S2. The paper proposes to avoid recompilation for dynamic query workloads by storing queries in configuration files and reconfiguring system upon changing workloads. This approach addresses a main drawback of FPGA-based applications and seems applicable to a wider range of applications than XML projection.
S3. The paper is well presented with sufficient details on the hardware background, the implementation details, and XML projection.

Weak points of the paper (please number them W1,W2,...)

W1. The highlight of the paper is the reconfiguration as opposed to recompliation of the system for dynamic query workloads. In this context, I wonder whether XML projection is the best application scenario to present the proposed technique. Specifically, as the work is essentially on XML-filtering NFAs for path queries, it seems more intuitive to directly deal with the dynamic query workload problem for query evaluation as opposed to for XML projection.

Innovation

Medium

Quality of Presentation

High

Quality of Experiments, Repeatability -- if applicable. (Do the experiments support the claims made in the paper and match the motivating examples?)

Satisfactory

Detailed comments (please number each point)

D1. In XML Semantics of Section 6.1, the paper comments that a side effect of pipelining is to allow the support of XPath self queries. How would a system without pipelining implement the self queries then? From the author feedback, I learned how self queries are implemented without pipelining - through conjuctive conditions.

Is the paper the appropriate length? If it is too long, please indicate what should be removed. If you have requested additional material in your comments, please indicate what could be removed (if necessary) to accommodate our length restrictions.

Yes

Summary of Discussion and Author Feedback (to be filled in by moderator of discussion only after discussion concluded)

None

Overall Recommendation

Weak Accept

Related Information



Nebeninhalt

Kontakt

Prof. Dr. Jens Teubner
Tel.: 0231 755-6481