Institutional Repository [SANDBOX]
Technical University of Crete

Hybrid in-database inference for declarative information extraction

Wang Daisy Zhe, Franklin Michael J., Garofalakis Minos, Hellerstein Joseph (1952-), Wick Michael L.

URI: http://purl.tuc.gr/dl/dias/9FBD290C-DC9B-419A-9A80-E532CB9521E0
Year: 2011
Type: Full Conference Paper
Citation: D. Z. Wang, M. J. Franklin, M. Garofalakis, J. M. Hellerstein and M. L. Wick, "Hybrid in-database inference for declarative information extraction", in ACM SIGMOD International Conference on Management of Data, 2011, pp. 517-528. doi: 10.1145/1989323.1989378 https://doi.org/10.1145/1989323.1989378

Abstract

In the database community, work on information extraction (IE) has centered on two themes: how to effectively manage IE tasks, and how to manage the uncertainties that arise in the IE process in a scalable manner. Recent work has proposed a probabilistic database (PDB) based declarative IE system that supports a leading statistical IE model, and an associated inference algorithm to answer top-k-style queries over the probabilistic IE outcome. Still, the broader problem of effectively supporting general probabilistic inference inside a PDB-based declarative IE system remains open. In this paper, we explore in-database implementations of a wide variety of inference algorithms suited to IE, including two Markov chain Monte Carlo algorithms and the Viterbi and sum-product algorithms. We describe rules for choosing appropriate inference algorithms based on the model, the query, and the text, considering the trade-off between accuracy and runtime. Based on these rules, we describe a hybrid approach that optimizes the execution of a single probabilistic IE query by applying different inference algorithms to different records. We show that our techniques can achieve up to 10-fold speedups compared to the non-hybrid solutions proposed in the literature.
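The abstract names Viterbi and sum-product among the inference algorithms implemented inside the database. As a rough illustration of what each computes, the sketch below runs both over a toy linear-chain labeling model in plain Python, outside any database. The linear-chain structure is an assumption here (the abstract only speaks of "a leading statistical IE model"), and the labels, scores, and the names `viterbi`, `marginals`, `EMIT`, and `TRANS` are hypothetical. This is the textbook form of the two algorithms, not the paper's in-database implementation.

```python
import math

# Hypothetical toy linear-chain model over 3 tokens (e.g. "Mr", "Smith", "said").
LABELS = ["O", "PER"]
EMIT = [[1.0, 0.0], [0.0, 2.0], [1.5, 0.0]]  # EMIT[t][y]: log-score of label y at token t
TRANS = [[0.5, 0.0], [0.0, 0.5]]             # TRANS[y][z]: log-score of label y followed by z


def viterbi(emit, trans):
    """Return the single highest-scoring label sequence (top-1 / MAP labeling)."""
    n, k = len(emit), len(emit[0])
    score = [emit[0][:]]  # best score of any prefix ending in each label
    back = []             # backpointers for positions 1..n-1
    for t in range(1, n):
        row, ptr = [], []
        for z in range(k):
            best_y = max(range(k), key=lambda y: score[t - 1][y] + trans[y][z])
            row.append(score[t - 1][best_y] + trans[best_y][z] + emit[t][z])
            ptr.append(best_y)
        score.append(row)
        back.append(ptr)
    # Trace back from the best final label.
    path = [max(range(k), key=lambda y: score[-1][y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def marginals(emit, trans):
    """Per-token label marginals P(y_t = y) via sum-product (forward-backward)."""
    n, k = len(emit), len(emit[0])
    # Forward pass: alpha[t][z] sums over all prefixes ending in label z at t.
    alpha = [emit[0][:]]
    for t in range(1, n):
        alpha.append([
            logsumexp([alpha[t - 1][y] + trans[y][z] for y in range(k)]) + emit[t][z]
            for z in range(k)
        ])
    # Backward pass: beta[t][y] sums over all suffixes following label y at t.
    beta = [[0.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [
            logsumexp([trans[y][z] + emit[t + 1][z] + beta[t + 1][z] for z in range(k)])
            for y in range(k)
        ]
    log_z = logsumexp(alpha[-1])  # log partition function
    return [[math.exp(alpha[t][y] + beta[t][y] - log_z) for y in range(k)] for t in range(n)]


if __name__ == "__main__":
    print([LABELS[y] for y in viterbi(EMIT, TRANS)])  # ['O', 'PER', 'O']
    for t, row in enumerate(marginals(EMIT, TRANS)):
        print(t, [round(p, 3) for p in row])
```

Viterbi returns only the single best labeling, which suits top-k-style queries with k = 1, while sum-product returns a probability for every label at every token. The abstract does not state the rules the authors use to pick between these and the MCMC algorithms, so no selection logic is encoded here.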
