Το work with title Hybrid in-database inference for declarative information extraction by Wang Daisy Zhe, Franklin Michael J., Garofalakis Minos, Hellerstein, Joseph, 1952-, Wick Michael L. is licensed under Creative Commons Attribution 4.0 International
Bibliographic Citation
D. Z. Wang, M. J. Franklin, M. Garofalakis, J. M. Hellerstein and M. L. Wick, "Hybrid in-database inference for declarative information extraction", in ACM SIGMOD International Conference on Management of Data, 2011, pp. 517-528. doi: 10.1145/1989323.1989378
https://doi.org/10.1145/1989323.1989378
In the database community, work on information extraction (IE)has centered on two themes: how to effectively manage IE tasks,and how to manage the uncertainties that arise in the IE processin a scalable manner. Recent work has proposed a probabilisticdatabase (PDB) based declarative IE system that supports a leadingstatistical IE model, and an associated inference algorithm toanswer top-k-style queries over the probabilistic IE outcome. Still,the broader problem of effectively supporting general probabilisticinference inside a PDB-based declarative IE system remainsopen. In this paper, we explore the in-database implementations ofa wide variety of inference algorithms suited to IE, including twoMarkov chain Monte Carlo algorithms, Viterbi and sum-product algorithms.We describe the rules for choosing appropriate inferencealgorithms based on the model, the query and the text, consideringthe trade-off between accuracy and runtime. Based on these rules,we describe a hybrid approach to optimize the execution of a singleprobabilistic IE query to employ different inference algorithmsappropriate for different records. We show that our techniques canachieve up to 10-fold speedups compared to the non-hybrid solutionsproposed in the literature.