Changes to VLDB 2008 Submission #191 ``Dependable Cardinality Forecasts for XQuery'' (Jens Teubner, Torsten Grust, Sebastian Maneth, Sherif Sakr) in response to reviewer comments. * GENERAL REMARKS / EDITS - Appendix B (Probabilities and Domain Sizes; formerly Appendix A) has been moved off the 12 core paper pages and now resides in the paper's new appendix (of max five pages as allowed by JDMR rules). This made room for some of the changes listed below. * REVIEWER 1 W1/W3. While we use a particularly simple variant of relational algebra (a SQL-inspired table algebra in which duplicate row elimination is explicit) to represent input XQuery syntax, note that we do *not* rely on the presence of an actual relational query engine and its optimizer. In the context of this work, the main aspect of algebraic query representation is not query optimization or evaluation but simplicity, compositionality, and analyzability. It is a feature of this scheme that it extracts and relies on proven and established bits of cardinality inference processes found in database kernels today and then merges these bits with XQuery/XPath specifics to assemble a cardinality forecaster for XQuery. As this is something that may be realized outside RDBMS engines with absolutely reasonable effort (e.g., in the context of our Pathfinder XQuery compiler or even a few hundred lines of prototyped Ruby scripts), we label this approach is perfectly applicable to any XQuery processor. Of course, if you have a database back-end at hand, such a table algebra also provides an immediately optimizable and executable specification of the XQuery semantics (this is the route we have followed in the Pathfinder and MonetDB/XQuery projects), but execution has *not* been the main intent in this work. To emphasize and clarify these points, we have added text to Section 1 (page 2) and to the end of Section 2 (just before Section 2.1, page 3). W2. A significantly deeper experimental assessment is now documented in the appendix. To demonstrate the universal applicability of our technique, we have shifted the focus of our experiments to queries from the W3C XQuery Working Group Use Cases Note (which we believe to be much more realistic than XMark). Experiments over XMark data are now reported in Appendix A.4. The main body of the paper now contains an excerpt of the experimental study of Appendix B (focussing on W3C use case queries, not XMark). The DB2 pureXML and our estimation prototype (implemented in terms of a few lines of Ruby code) have significant architectural differences, which renders it almost impossible to compare the two systems side-by-side. Note that, in spite of the first author's affiliation, we do not have access to any internals of DB2 pureXML. * REVIEWER 2 1. The typo on Page 4, Section 3.1 has been fixed. Thanks. 2. In Section 2, we spent more sentences on the introduction of the table algebra of Table 2 (e.g., we clarified the role of the circled mapping operators (<), (+), etc.). This latter family of operators is used to lift the evaluation of predicates and arithmetics onto the algebra level itself (we may thus operate on a single language level and obviate the need to a secondary language for Boolean or arithmetic expressions -- this is also mentioned in Section 4.2). Note that XQuery-specific operators like the "staircase" (XPath evaluation) and "atom" (node atomization) are separately explained in Section 4 where they are in context (Interfacing with XPath). The algebra has not been specifically invented for this paper but is also featured in a number of referenced articles (e.g., [10] and [12]; a reference to [10] has been added in Section 2). 3. A colocation of rules and associated commentary would spread the inference rules of Figures 4, 5, 7, and 8 over several pages. Since the rules *in their entirety* define the cardinality inference process, we opted to stick with a layout that collects rule in a single (or few) place(s). All rules are labeled (e.g., DOM-3, CARD-9, PATH-5) to allow for (1) unambiguous reference, and (2) to indicate the cluster (DOM, CARD, PATH) a rule is part of. We implemented LaTeX macros to enforce consistent and proper rule reference. This rule layout mode appears to be generally accepted and widespread and helps to appreciate the inference process per se (but we acknowledge that it incurs a certain need for "page flipping" while reading). * REVIEWER 3 W2. Due to the locality of cardinality inference (and property inference, in general), a single pass over the algebraic plan is sufficient to establish the required properties. (Neither does the inference process backtrack or iterate, nor is the underlying plan changed such that re-inference would be required.) The individual inference rules exclusively rely on the investigated operator and the properties already inferred in the immediate plan vicinity (see Section 2.1). As a consequence, the time taken by property inference is absolutely marginal. A note on this has been added in Section 5.2. W3. The experimental assessment is now much more in line with the focus of the paper: the estimation of cardinalities. We agree that an improvement in runtime performance incurs additional work (such as plan enumeration and selection), which is beyond the content of this work (and had not been documented). Please see also our comments to Reviewer 1 (W2) regarding the overhauled experimental sections. P1. Thanks for the pointer to the PODS 2008 work of Foster, Green, and Tannen. We see at least one interesting connection of that work to the present paper: In Section 7 (Semantics Via Relations) the authors sketch a theorem stating that the relational evaluation (via Datalog over a shredded relational XML encoding in this case) of an input XQuery expression yields the exact *same* semiring annotations as the "direct" evaluation of XQuery via NRC+srt. This coincides with our approach in the sense that (estimated) table row counts, derived algebraically, directly correspond to XQuery item sequence lengths. As of today, we could not yet further explore the application of the mentioned PODS 2008 ideas to our present work. Since we have faithful relational translation of the XQuery semantics in our hands, and based on the observation made in the previous paragraph, the PODS 2007 of Green, Karvounarakis, and Tannen "Provenance semirings" might also be applicable. This is something we will definitely look into. P2. Please see our comments to W3 of Reviewer 1. A key point to note is the compositionality of the loop lifting compilation scheme that we use. In this scheme, it is always possible to track the correspondence between any subexpression e of an input XQuery expression and its associated algebraic subplan q_e. If we can estimate the row count for q_e, we thus obtain an XQuery item estimate for the original e. Note that this does not involve the presence or execution capabilities of an actual relational query engine. Relational algebra is merely used as a simple, very explicit and well-understood representation of the XQuery semantics. Clarifying text has been added to Section 1 (page 2) and Section 2 (page 3).