Sprungmarken

Servicenavigation

Hauptnavigation

Sie sind hier:

Hauptinhalt

eXrQuy: Order Indifference in XQuery — VLDB 2006 Reviews

Reviews for paper eXrQuy: Order Indifference in XQuery, submitted to VLDB 2006.

Overall rating: reject

Reviewer 1

Reviewer Confidence

Medium

Is the paper relevant to the VLDB Conference?

Yes

Is the paper technically correct?

Probably [Not fully verified]

Does this work overlap significantly with earlier publications?

No

Assessment of Importance

Weak Reject

Presentation

Accept

Overall Rating

Weak Reject

Summary of main contribution (one or two paragraphs)

First, the semantics of order requirements and the impact of various combinations of order-relaxing constructs in XQuery are very clearly presented. Second, the details of order handling in a particular XQuery implementation are summarized, and transformations introduced to a) infer order indifference in certain cases and b) take advantage of order indifference by generating different query plans.

Justification for Importance Rating

1) The main technique "icols" is standard in all query optimizers and diserves less attention than it gets.
2) The big performance improvements come from a particularity of the system in question: order is always added to intermediate results by a new, expensive, partition and sort operator, the removal of which helps a lot. The alternative, of having every primitive return document order directly, seems obvious, but may lead to far less improvement.
3) The correctness of certain transformations needs more formal support - For example in 4.2, (7), the authors do not comment under what condition the optimization can be applied, but clearly a non-intersection of XPaths must be inferred. Other comments below.

Body of review discussing paper strengths and weaknesses (2-5 paragraphs) [Describe and assess major aspects of the paper.]

Section 2 is great, as far as it goes. But at the next-to-last paragraph, the authors bail out - saying there is not a semantic definition of order indifference in FLOWOR blocks, but not seeking to address, or even partially address, this issue. Consider the interesting criticism of "unordered pushdown" at the end of sec 2. Does sec 3 address this?
Section 3 and 4 turn the generic setting into the specific setting of plan generation in Pathfinder. While this is interesting and well presented, the most interesting part is the overall design which is discussed elsewhere. The particular approach to handling order - the introduction of the rho or # operators, does not seem fundamental. The primary optimization technique is to a) get rid of unused columns and b) get rid of operators that produce only these columns. There is practical impact here, but it is not clear what portion of it is specific to the pathfinder implementation and what is more universal. Section 4.2 is concerning. The statement of rewriting rules is overly informal, and stated in terms of the implementation rather than the semantics. Things like "fn:reverse()" has no effect in the context of order indifference seems true, but should probably be a theorem w.r.t. formal semantics. For example, it is not clear that in the formal semantics and in an unordered context that, for a sequence s, "s = fn:reverse(s)" can be reduced to "s = s" or true. Similarly, the applicability of "|" -> "," needs to be formalized.
The experiments seem fine and the experimental section is well written.

Detailed comments for the author.

1) "To exemplify" should be universally changed to "For example".
2) page 6, "lies in wait in [the] scopes of order indifference [related optimizations]." or otherwise fix
3) Section 4.1 should be replaced with small paragraph stating the definition of icols.

Reviewer 2

Reviewer Confidence

High

Is the paper relevant to the VLDB Conference?

Yes

Is the paper technically correct?

Yes

Does this work overlap significantly with earlier publications?

No

Assessment of Importance

Strong Accept

Presentation

Accept

Overall Rating

Accept

Summary of main contribution (one or two paragraphs)

The paper focuses on problems related to the order-based semantics of XQuery, which imposes serious challenges to the application of well know optimizations s.a. join introduction etc. The authors operate within the boundaries of an XQuery engine with a relational back-end. Although this limits the verbatim applicability of their techniques, the reviewer believes that little effort is needed to port the idiom to other systems.
The solution proposed is based on the identification of order indifference of (sub)expression, a property that is detected at the algebraic level. A combination of data flow analysis and algebraic rewrites clears the way for the relational back-end to pick up joins or apply other important optimizations. Experiments show considerable speedups for some of the XMark queries.

Justification for Importance Rating

So far, XQuery optimization research has been focused around XPath optimization. Although XPath optimality is crucial to XQuery performance, there is more to XQuery than XPath alone. Being able to pick up efficient value joins and other well-know optimizations is -for some queries at least- equally important. This paper widens the focus of XQuery optimization research and identifies new challenges.

Body of review discussing paper strengths and weaknesses (2-5 paragraphs) [Describe and assess major aspects of the paper.]

Paper Strengths

  • The paper rightfully identifies an important optimization problem in XQuery
  • The paper presents a systematic algebraic solution for the problem
  • The paper is timely and well-written

Weaknesses

  • One important question remains unanswered: no idea is given of the coverage of the presented approach. I.e., is the 'order indifference' property picked up for every order indifferent expression? If not, are there any important cases you may miss? (see also detailed comments below)
  • The experimental section is too limited. The XMark benchmark suite is not really suitable to serve as a stand-alone validation of the approach. Also: are the results for Q6 and Q7 not an artifact of poor performance of the staircase join algorithm, which in context-sensitive expressions has to resort to item per item applications of the algorithm instead of bulk?
  • Little attention is payed to duplicate handling (which has an obvious link with sorting)
Detailed comments for the author.

11: Detailed comments for the author.
Section 1:
I miss one case of order indifference in the summary on page two. This is the case where an XPath location step reorders nodes generated by an arbitrary XPath location step, e.g.,
let $x := for $y in $input order by $y/text()
return $x/.
After reading the paper, I was left wondering how the presented approach deals with this category of expressions. It also left me wondering if there are other important cases that may be missed (this ties back to the comments above).
Section 2:
Typo in expression (5), redundant closing brace at the end
Section 5:

  • Please clarify the results for Q19 and Q20
  • It is worth pointing out that Galax, although lacking a systematic approach for picking up order indifference, properly deals with Q6 and Q7 by simple unused variable removal - hence my comment above.
  • Taking apart one query to motivate the direction of XQuery research seems a little harsh to me. There is an infinite number of queries over equally many other data sources that may not fit this profile. I would like to see a better motivation.

Section 6:
I have a problem with the statement that XPath processing is well understood by now. It is true that XPath processing *in isolation* has received a lot attention, but many problems related to XPath optimization are situated in the interactions with other XQuery constructs. Especially FLWORS (maps) can trick some of the supposedly optimal evaluation strategies into poor performance. This is exactly what was going on in Q6 and Q7 in the experiment, and many other examples can be devised for which the presented optimization framework will not provide the best solution.
Otherwise, this is a very good paper.

Reviewer 3

Reviewer Confidence

High

Is the paper relevant to the VLDB Conference?

Yes

Is the paper technically correct?

Probably [Not fully verified]

Does this work overlap significantly with earlier publications?

Yes

Assessment of Importance

Reject

Presentation

Reject

Overall Rating

Reject

Summary of main contribution (one or two paragraphs)

The authors study the unordered mode in xquery. They discuss how this is important an how this can potentially lead to performance benefits. They present some nice rules on how unordered mode can eliminate some operations in their xquery implementation. They map things to a relational model, hence unordered operation is more natural for a system like that.
They also show experimentally that there is a performance difference when using unordered mode. That was expected in a bulk-processing system that maps to relational operators. Their discussion adds information to this issue with XQuery and order.
Because the authors missed related work, it is not very clear what are their contributions. It feels like they describe how they did it in their system. Why is that important, interesting, challenging and different from existing related work is not clear to me.

Justification for Importance Rating

The authors missed related work on the subject. Some of the ideas they discuss have already been presented. It is not clear how different they are and whether their contributions justify a VLDB publication. Additionally the paper is very hard to read for the average VLDB attendee.

Body of review discussing paper strengths and weaknesses (2-5 paragraphs) [Describe and assess major aspects of the paper.]

Strengths :
a)The authors discuss an important problem. Unordered mode in xquery processing. XML and xquery require order and there has not been enough consideration on how to evaluate it efficiently. The authors suggest some techniques to consider unordered mode and produce performance speedups this way.
b)The solution they suggest seem nice and could be incorporated into systems like their own (i.e. mapping to relational). They also show experimentally that performance is enhanced when taking advantage of opportunities to optimize order.
Weaknesses :
a) Although the authors go into the proper direction they do not spend enough time studying/discussing the problem. XML and XQuery order is a very deep problem. They just mention unordered mode. It would be much more interesting if they address the entire order problem and how value order, document order (even multiple document orders) and combination of all affect XML processing.
b) The authors claim there is no related work and they justify that by stating that no open source xquery compiler has anything in their code. Although it must have taken them a lot of time to look into so many different implementations, when submitting a paper one should pay some attention into published work as well, at the very least the VLDB conference itself from recent years.
Work in this area and ideas like that have already been presented. It is not clear to me what are the contributions of the authors and how do they differentiate from the existing related work.
For example, the authors do not even acknowledge the following papers:
-J. Hidders and P. Michiels. Avoiding unnecessary ordering operations in XPath. DBPL 2003
-Stelios Paparizos and H.V. Jagadish. Pattern tree algebras: sets or sequences? VLDB 2005
-D. Simmen, E. Shekita, and T. Malkemus. Fundamental techniques for order optimization. SIGMOD 1996.
c) The writeup of the paper is not very good. The paper is very hard to follow for the average VLDB reader. It assumes the reader already knows a lot about the system the authors have built and their way of attacking the XQuery problem. They use a lot of references to papers for their own system for that purpose. Also the sections are not well separated. The paper reads like one big blur. The figures are fairly small and the examples used are only of very simplistic queries.
Their system looks like a nice system that performs well in general, but that is the main focus of one of their referenced paper, not this submission.

Detailed comments for the author.

I would summarize my previous part of the review for the authors in the following way. It is nice that they focus on the order problem and unordered mode in xquery. This is clearly a problem that has not been studied enough and there are many optimization opportunities. Yet, one must do a thorough background check first and demonstrate how they differentiate from existing related work. Also the presentation of the paper should not assume the reader is an expert of the system the authors work in. They need to re-write the paper and make it easier to read, use more elaborate examples and discuss how they deal with order in general.

Related Information



Nebeninhalt

Kontakt

Prof. Dr. Jens Teubner
Tel.: 0231 755-6481