N. Polyzotis and M. Garofalakis, "XSKETCH synopses for XML data graphs", ACM Transactions on Database Systems, vol. 31, no. 3, pp. 1014-1063, Sept. 2006. doi: 10.1145/1166074.1166082
https://doi.org/10.1145/1166074.1166082
Effective support for XML query languages is becoming increasingly important with the emergenceof new applications that access large volumes of XML data. All existing proposals for queryingXML (e.g., XQuery) rely on a pattern-specification language that allows (1) path navigation andbranching through the label structure of the XML data graph, and (2) predicates on the values ofspecific path/branch nodes, in order to reach the desired data elements. Clearly, optimizing suchqueries requires approximating the result cardinality of the referenced paths and hence hinges onthe existence of concise synopsis structures that enable accurate compile-time selectivity estimatesfor complex path expressions over the base XML data. In this article, we introduce a novel approachto building and using statistical summaries of large XML data graphs for effective path-expressionselectivity estimation. Our proposed graph-synopsis model (termed XSKETCH) exploits localizedgraph stability and value-distribution summaries (e.g., histograms) to accurately approximate (inlimited space) the path and branching distribution, as well as the complex correlation patternsthat can exist between and across path structure and element values in the data graph. To the bestof our knowledge, ours is the first work to address this timely problem in the most general settingof graph-structured XML data with values, and complex (branching) path expressions.