URI | http://purl.tuc.gr/dl/dias/12DCEC49-3FB5-4CFB-9587-A761D5EFB408 | - |
Identifier | http://dl.acm.org/citation.cfm?id=1061326&dl=ACM&coll=DL&CFID=723856831&CFTOKEN=21128548 | - |
Identifier | https://doi.org/10.1145/1061318.1061326 | - |
Language | en | - |
Extent | 54 pages | en |
Title | XML stream processing using tree-edit distance embeddings | en |
Creator | Garofalakis Minos | en |
Creator | Γαροφαλακης Μινως | el |
Creator | Kumar Amit | en |
Publisher | Association for Computing Machinery | en |
Content Summary | We propose the first known solution to the problem of correlating, in small space, continuous streams of XML data through approximate (structure and content) matching, as defined by a general tree-edit distance metric. The key element of our solution is a novel algorithm for obliviously embedding tree-edit distance metrics into an L1 vector space while guaranteeing a (worst-case) upper bound of O(log2n log*n) on the distance distortion between any data trees with at most n nodes. We demonstrate how our embedding algorithm can be applied in conjunction with known random sketching techniques to (1) build a compact synopsis of a massive, streaming XML data tree that can be used as a concise surrogate for the full tree in approximate tree-edit distance computations; and (2) approximate the result of tree-edit-distance similarity joins over continuous XML document streams. Experimental results from an empirical study with both synthetic and real-life XML data trees validate our approach, demonstrating that the average-case behavior of our embedding techniques is much better than what would be predicted from our theoretical worst-case distortion bounds. To the best of our knowledge, these are the first algorithmic results on low-distortion embeddings for tree-edit distance metrics, and on correlating (e.g., through similarity joins) XML data in the streaming model. | en |
Type of Item | Peer-Reviewed Journal Publication | en |
Type of Item | Δημοσίευση σε Περιοδικό με Κριτές | el |
License | http://creativecommons.org/licenses/by/4.0/ | en |
Date of Item | 2015-10-29 | - |
Date of Publication | 2005 | - |
Subject | XML | en |
Bibliographic Citation | M. Garofalakis and A. Kumar, "XML stream processing using tree-edit distance embeddings", ACM Trans. Dat. Syst., vol. 30, no. 1, pp. 279-332, Mar. 2005. doi:10.1145/1061318.1061326 | en |