Why Off-The-Shelf RDBMSs are Better at XPath Than You Might Expect

Publication Details

Title

Why Off-The-Shelf RDBMSs are Better at XPath Than You Might Expect

Authors

Torsten Grust, Jan Rittinger, and Jens Teubner

Published

Proceedings of the 2007 ACM SIGMOD Conference on Management of Data (Industrial Track)

Download

paper (PDF), presentation slides (PDF)

Abstract

To compensate for the inherent impedance mismatch between the relational data model (tables of tuples) and XML (ordered, unranked trees), tree join algorithms have become the prevalent means to process XML data in relational databases, most notably the TwigStack, structural join, and staircase join algorithms. However, the addition of these algorithms to existing systems depends on a significant invasion of the underlying database kernel, an option intolerable for most database vendors.

Here, we demonstrate that we can achieve comparable XPath performance without touching the heart of the system. We carefully exploit existing database functionality and accelerate XPath navigation by purely relational means: partitioned B-trees bring access costs to secondary storage to a minimum, while aggregation functions avoid an expensive computation and removal of duplicate result nodes to comply with the XPath semantics. Experiments carried out on IBM DB2 confirm that our approach can turn off-the-shelf database systems into efficient XPath processors.
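The role of aggregation in avoiding duplicates can be illustrated with a small sketch. It is not taken from the paper: it assumes a hypothetical pre/size encoding (preorder rank and descendant count per node), invented table names `doc` and `ctx`, and SQLite in place of IBM DB2. Under that encoding, the XPath `following` axis of a context node c selects all nodes v with v.pre > c.pre + c.size. A plain join over several context nodes emits the same result node more than once, whereas a single MIN aggregate over the context set yields the same node set without any duplicates to remove.

```python
import sqlite3

# Hypothetical pre/size encoding of the tree a(b(c, d), e(f)).
# pre = preorder rank, size = number of descendants (an assumed encoding).
NODES = [
    (0, 5, "a"),
    (1, 2, "b"),
    (2, 0, "c"),
    (3, 0, "d"),
    (4, 1, "e"),
    (5, 0, "f"),
]


def build_db():
    """Create an in-memory database with the document and an empty context table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc (pre INTEGER PRIMARY KEY, size INTEGER, tag TEXT)"
    )
    conn.executemany("INSERT INTO doc VALUES (?, ?, ?)", NODES)
    conn.execute("CREATE TABLE ctx (pre INTEGER PRIMARY KEY)")
    return conn


def following_join(conn):
    """following axis as a plain join: one row per (context, result) pair."""
    rows = conn.execute(
        "SELECT v.pre FROM ctx AS x "
        "JOIN doc AS c ON c.pre = x.pre "
        "JOIN doc AS v ON v.pre > c.pre + c.size "
        "ORDER BY v.pre"
    ).fetchall()
    return [r[0] for r in rows]


def following_aggregate(conn):
    """following axis via MIN: only the smallest subtree end matters."""
    rows = conn.execute(
        "SELECT v.pre FROM doc AS v "
        "WHERE v.pre > (SELECT MIN(c.pre + c.size) "
        "               FROM doc AS c JOIN ctx AS x ON c.pre = x.pre) "
        "ORDER BY v.pre"
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = build_db()
    # Context set: the sibling leaves c (pre 2) and d (pre 3).
    conn.executemany("INSERT INTO ctx VALUES (?)", [(2,), (3,)])
    print("join:     ", following_join(conn))
    print("aggregate:", following_aggregate(conn))
```

With the context set {c, d}, the join variant lists nodes e and f twice and would need a DISTINCT step, while the aggregate variant produces each result node once. The partitioned B-tree technique mentioned in the abstract is a separate matter of physical index design and is not modeled here.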

Publication Log

March 2007

camera-ready for SIGMOD 2007

camera-ready paper (PDF)

November 2006

submission to SIGMOD 2007 (accepted)

reviews (results: accept, neutral)

June 2006

submission to ICDE 2007 (rejected)

reviews (results: weak reject, weak accept, weak accept)