| URI | http://purl.tuc.gr/dl/dias/AA9284C5-22A7-459D-BB9C-3DBF79DBEF5E | - |
| Αναγνωριστικό | http://dl.acm.org/citation.cfm?doid=773153.773168 | - |
| Γλώσσα | en | - |
| Μέγεθος | 12 pages | en |
| Τίτλος | Correlating XML data streams using tree-edit distance embeddings | en |
| Δημιουργός | Garofalakis Minos | en |
| Δημιουργός | Γαροφαλακης Μινως | el |
| Δημιουργός | Kumar Amit | en |
| Εκδότης | Association for Computing Machinery | en |
| Περίληψη | We propose the first known solution to the problem of correlating, in small space, continuous streams of XML data through approximate (structure and content) matching, as defined by a general tree-edit distance metric. The key element of our solution is a novel algorithm for obliviously embedding tree-edit distance metrics into an L1 vector space while guaranteeing an upper bound of O(log2n log*n) on the distance distortion between any data trees with at most n nodes. We demonstrate how our embedding algorithm can be applied in conjunction with known random sketching techniques to: (1) build a compact synopsis of a massive, streaming XML data tree that can be used as a concise surrogate for the full tree in approximate tree-edit distance computations; and, (2) approximate the result of tree-edit-distance similarity joins over continuous XML document streams. To the best of our knowledge, these are the first algorithmic results on low-distortion embeddings for tree-edit distance metrics, and on correlating (e.g., through similarity joins) XML data in the streaming model. | en |
| Τύπος | Δημοσίευση σε Συνέδριο | el |
| Τύπος | Conference Publication | en |
| Άδεια Χρήσης | http://creativecommons.org/licenses/by/4.0/ | en |
| Ημερομηνία | 2015-12-01 | - |
| Ημερομηνία Δημοσίευσης | 2003 | - |
| Θεματική Κατηγορία | Database management | en |
| Βιβλιογραφική Αναφορά | M. Garofalakis and A. Kumar, "Correlating XML data streams using tree-edit distance embeddings", in 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, June 2003, pp. 143-154. | en |