Reviews for paper An Inflationary Fixed Point Operator in XQuery, submitted to ICDE 2008.

**Overall rating:** accept as short paper

This paper was later re-submitted under the title Recursion in XQuery: Put Your Distributivity Safety Belt On and accepted as a full paper at EDBT 2008.

Strong accept

Novel

Excellent

YES

HIGH

An important contribution to the design of XQuery. The paper proposes an extension of XQuery with controlled recursion in the form of inflationary fixed point (IFP). IFP is already known from the relational setting to trad expressivity of recursive function definitions for evaluation efficiency. The authors make a strong case that IFP is sufficiently expressive to capture an important class of desirable unser-defined functions. A sufficient condition for efficient IFP evaluation is identified (articulating it for XML querying is a contribution in itself, since in the relational case it holds implicitly).

IFP is an important primitive which extends the expressive power of XQuery in a controlled way, preserving efficient evaluation properties. In hindsight, the idea of adding IFP to XQuery is natural, and it is surprising that it has not been put forward before, and in particular that it was overlooked by the standard in favor of unrestricted recursion.

The identification of distributivity as the condition for ensuring that naive evaluation can be replaced with the more efficient Delta algorithm, and the sufficient conditions for ensuring distributivity. The elegant and seamless exploitation in XML querying of classical work on relational and deductive databases.

The comprehensive experimental evaluation gives encouraging results.

none

It would be useful to investigate various queries used in benchmarks such as XMark or in the XQuery exemplar and XQuery standard for expressibility using IFP. I suspect that mmany recursive queries encountered are IFP-expressible, using unrestricted recursion as overkill.

Weak accept

With some new ideas

Good

NO

MEDIUM

The paper investigates an inflationary fixed point in XQuery. It revisits two algorithms for computing such inflationary fixed points, namely Naive and Delta. Here, Delta is the more efficient algorithm of the two, but it is shown that Delta cannot always be traded for Naive for computing fixed points in XQuery (a counterexample is given). Consequently, the paper investigates a distributivity condition, which is a condition in which Delta can be traded for Naive. Two roads are taken for testing this distributivity notion on queries: a syntactical approach (using inference rules) and an algebraical approach (using conditions for pushing unions through other operators). Finally the effect of replacing the Naive implementation on fixpoint computations with the Delta implementation is experimentally investigated.

I found the paper to be interesting and mostly well-written, until Section 4. From there, the paper becomes hard to follow for me.

(1) I find specific research on this controlled form of recursion in XQuery to be a good idea. I think that this form of recursion is expressive enough to formulate quite some queries in practice, which contributes to the relevance of your work.

(2) Your optimizations for computing inflationary fixpoints seem to be easy to incorporate in existing engines and give a high payoff, according to your experiments.

(3) The paper is well-written, well-structured, and gives a nice overview of related work.

(1) What I found to be lacking is any intuition about the direction from non-distributivity of queries to non-preservation of the IFP semantics of the Delta implementation. Can you shed some light on that?

(2) The material starting from p.7 is very dense and it is not really clear what actually happens. In particular, you seem to state/assume the following, which you do not prove:

(a) Assessing distributivity on the relational path in Figure 6 corresponds to distributivity on the XQuery path.

(b) Table 1, which essentially claims which operators are distributive and which aren't. Here, the distributivity results are stated but not proved.

(c) There are some non-standard operators in Table 1, for which you claim that they are macros of standard operators. I assume that these non-standard operators are distributive iff all operators in the macro plans are distributive, something that should be easy to see, given the macro plans themselves.

(d) As for the plan templates, it seems to be the case that such templates are distributive iff all operators occurring in them are distributive. But, unfortunately, this is not explained (or even claimed in the paper). You should really explain what you mean here.

The root of (b), (c), and (d) is in the distributivity of the standard relational operators, so I have an inclination to believe the distributivity results. However, this should be argued much more clearly in the paper. It's strange that you don't even mention that (a) or (d) should be proved to obtain correctness of your technique.

p.1: ...can be assessed both syntactically and algebraically,

p.2: Perhaps a footnote briefly explaining the id-function of Figure 2 would be helpful for XQuery non-experts.

p.2: Why is i.e. italic?

p.2: Note that if expression e_rec does \emph{not} invoke node constructors (e.g., element{.}{.} or text{.}), IFP will always be defined, as the query operates over a finite domain of nodes.

p.3: Definition 2.3 is somewhat sloppy.

p.4: You say: "whenever expression e_rec is distributive [...], algorithm Delta preserves the desired IFP semantics." Do you know whether the other direction holds or not? Could you give a small counterexample if the other direction doesn't hold? Or, if you don't know, stating this in the text might be helpful.

p.9: Please also put the time measurements for Bidder network (huge) / MonetDB XQuery Delta in the ..m ..s format? That's easier to compare.

Weak reject

Highly novel

Unacceptable

NO

MEDIUM

The paper focuses on an extension of XQuery with a fixed point operator. The main goal is to detect when implementation of the operator can be optimized so that the fixpoint is computed incrementally. Two approaches are proposed for conservatively detecting the ability to perform this optimization, the first at the XQuery syntax level, the second at the level of a relational translation of XQuery.

The use of fixpoints in XML processing has not received much attention. The optimization the authors study here is natural and important, and the notion of using distributivity is clever. The idea of exploiting a relational encoding of XQuery in optimization is a good one also.

Unfortunately, I was not able to follow the relational approach at all. It relies heavily on a prior work on relational translation of XQuery that is not well-summarized here. As with any relational approach, the semantics of the translated expressions consists of the meanings of the relational operators plus the relational encoding and decoding maps: the encoding and decoding are not explained at all, and the operators (including several that are nonstandard) are presented in a rush. The explanation of the algorithm on plans refers to plan templates (page 7), which are never formalized within the paper. The paper states (page 6) that the details of Relational XQuery "do not affect our present discussion". Surely this is not the case.

Throughout the paper, the author's alternate between high-level commentary on the approach and references to prior work (e.g. LiXQuery, RegularXPath, Pathfinder) whose details are essential in order to make sense of the comments. This approach makes for a work that will be inaccessible even to interested and sympathetic ICDE attendees.

Example 1.1. "This is not expressible in XPath 2.0". You mean, without user-defined functions?

- submission (PDF)
- final paper (PDF) — long version published at EDBT 2009
- final paper (PDF) — short paper published at ICDE 2008
- technical report (PDF) — published at arXiv.org