URI | http://purl.tuc.gr/dl/dias/21D01BDC-8C52-4E8C-A0BE-F1DA7ED6A1A5 | - |
Identifier | http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.127.294&rep=rep1&type=pdf | - |
Language | en | - |
Extent | 12 pages | en |
Title | Statistical synopses for graph-structured XML databases | en |
Creator | Polyzotis, Neoklis | en |
Creator | Garofalakis Minos | en |
Creator | Γαροφαλακης Μινως | el |
Publisher | Association for Computing Machinery | en |
Content Summary | Effective support for XML query languages is becoming increasingly important
with the emergence of new applications that access large volumes of
XML data. All existing proposals for querying XML (e.g., XQuery) rely on
a pattern-specification language that allows path navigation and branching
through the XML data graph in order to reach the desired data elements.
Optimizing such queries depends crucially on the existence of concise synopsis
structures that enable accurate compile-time selectivity estimates for
complex path expressions over graph-structured XML data. In this paper,
we introduce a novel approach to building and using statistical summaries
of large XML data graphs for effective path-expression selectivity estimation.
Our proposed graph-synopsis model (termed XSKETCH) exploits localized
graph stability to accurately approximate (in limited space) the path
and branching distribution in the data graph. To estimate the selectivities of
complex path expressions over concise XSKETCH synopses, we develop an
estimation framework that relies on appropriate statistical (uniformity and
independence) assumptions to compensate for the lack of detailed distribution
information. Given our estimation framework, we demonstrate that the
problem of building an accuracy-optimal XSKETCH for a given amount of
space is ✁✄✂-hard, and propose an efficient heuristic algorithm based on
greedy forward selection. Briefly, our algorithm constructs an XSKETCH
synopsis by successive refinements of the label-split graph, the coarsest
summary of the XML data graph. Our refinement operations act locally
and attempt to capture important statistical correlations between data paths.
Extensive experimental results with synthetic as well as real-life data sets
verify the effectiveness of our approach. To the best of our knowledge, ours
is the first work to address this timely problem in the most general setting
of graph-structured data and complex (branching) path expressions | en |
Type of Item | Πλήρης Δημοσίευση σε Συνέδριο | el |
Type of Item | Conference Full Paper | en |
License | http://creativecommons.org/licenses/by/4.0/ | en |
Date of Item | 2015-12-01 | - |
Date of Publication | 2002 | - |
Subject | Databases | en |
Bibliographic Citation | N. Polyzotis and M. Garofalakis, "Statistical synopses for graph-structured XML databases", in ACM SIGMOD International Conference on Management of Data, June 2002, pp. 358-369. | en |