Ιδρυματικό Αποθετήριο [SANDBOX]
Πολυτεχνείο Κρήτης

EN | EL

Αναζήτηση

Πλοήγηση

Ο Χώρος μου

Είσοδος

Αρχή

Statistical synopses for graph-structured XML databases

Polyzotis, Neoklis, Garofalakis Minos

Απλή Εγγραφή

URI	http://purl.tuc.gr/dl/dias/21D01BDC-8C52-4E8C-A0BE-F1DA7ED6A1A5	-
Αναγνωριστικό	http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.127.294&rep=rep1&type=pdf	-
Γλώσσα	en	-
Μέγεθος	12 pages	en
Τίτλος	Statistical synopses for graph-structured XML databases	en
Δημιουργός	Polyzotis, Neoklis	en
Δημιουργός	Garofalakis Minos	en
Δημιουργός	Γαροφαλακης Μινως	el
Εκδότης	Association for Computing Machinery	en
Περίληψη	Effective support for XML query languages is becoming increasingly important with the emergence of new applications that access large volumes of XML data. All existing proposals for querying XML (e.g., XQuery) rely on a pattern-specification language that allows path navigation and branching through the XML data graph in order to reach the desired data elements. Optimizing such queries depends crucially on the existence of concise synopsis structures that enable accurate compile-time selectivity estimates for complex path expressions over graph-structured XML data. In this paper, we introduce a novel approach to building and using statistical summaries of large XML data graphs for effective path-expression selectivity estimation. Our proposed graph-synopsis model (termed XSKETCH) exploits localized graph stability to accurately approximate (in limited space) the path and branching distribution in the data graph. To estimate the selectivities of complex path expressions over concise XSKETCH synopses, we develop an estimation framework that relies on appropriate statistical (uniformity and independence) assumptions to compensate for the lack of detailed distribution information. Given our estimation framework, we demonstrate that the problem of building an accuracy-optimal XSKETCH for a given amount of space is ✁✄✂-hard, and propose an efficient heuristic algorithm based on greedy forward selection. Briefly, our algorithm constructs an XSKETCH synopsis by successive refinements of the label-split graph, the coarsest summary of the XML data graph. Our refinement operations act locally and attempt to capture important statistical correlations between data paths. Extensive experimental results with synthetic as well as real-life data sets verify the effectiveness of our approach. To the best of our knowledge, ours is the first work to address this timely problem in the most general setting of graph-structured data and complex (branching) path expressions	en
Τύπος	Πλήρης Δημοσίευση σε Συνέδριο	el
Τύπος	Conference Full Paper	en
Άδεια Χρήσης	http://creativecommons.org/licenses/by/4.0/	en
Ημερομηνία	2015-12-01	-
Ημερομηνία Δημοσίευσης	2002	-
Θεματική Κατηγορία	Databases	en
Βιβλιογραφική Αναφορά	N. Polyzotis and M. Garofalakis, "Statistical synopses for graph-structured XML databases", in ACM SIGMOD International Conference on Management of Data, June 2002, pp. 358-369.	en

Υπηρεσίες

Στατιστικά

Copyright © DIAS 2013