The work titled Probabilistic declarative information extraction, by the creator(s) Wang Daisy Zhe, Michelakis Eirinaios, Franklin Michael J., Garofalakis Minos, Hellerstein, Joseph, 1952-, is made available under the Creative Commons Attribution 4.0 International license.
Bibliographic Citation
D. Z. Wang, E. Michelakis, M. J. Franklin, M. Garofalakis and J. M. Hellerstein, "Probabilistic declarative information extraction", in 26th IEEE International Conference on Data Engineering, 2010.
Unstructured text represents a large fraction of the world’s data. It often contains snippets of structured information (e.g., people’s names and zip codes). Information Extraction (IE) techniques identify such structured information in text. In recent years, database research has pursued IE on two fronts: declarative languages and systems for managing IE tasks, and probabilistic databases for querying the output of IE. In this paper, we take the first step toward merging these two directions, without loss of statistical robustness, by implementing a state-of-the-art statistical IE model, Conditional Random Fields (CRF), in the setting of a Probabilistic Database that treats statistical models as first-class data objects. We show that the Viterbi algorithm for CRF inference can be specified declaratively in recursive SQL. We also show the performance benefits relative to a standalone open-source Viterbi implementation. This work opens up optimization opportunities for queries involving both inference and relational operators over IE models.
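The abstract names the Viterbi algorithm but does not spell it out. As a point of reference, here is a minimal procedural sketch of Viterbi decoding for a linear-chain CRF in Python. It is not the paper's recursive-SQL formulation. It assumes the CRF potentials are already available as additive log-scores in two hypothetical dictionaries: `emission`, keyed by (token, label), and `transition`, keyed by (previous label, label). The function name, the toy tokens, the labels, and all scores are invented for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(
    tokens: Sequence[str],
    labels: Sequence[str],
    emission: Dict[Tuple[str, str], float],
    transition: Dict[Tuple[str, str], float],
) -> Tuple[List[str], float]:
    """Return the highest-scoring label sequence and its total score.

    Scores are additive log-potentials of a linear-chain CRF (hypothetical
    inputs); any (token, label) or (prev, cur) pair not in a dict scores 0.0.
    """
    if not tokens:
        return [], 0.0
    # V[i][y]: best score of any labeling of tokens[0..i] that ends in label y
    V: List[Dict[str, float]] = [
        {y: emission.get((tokens[0], y), 0.0) for y in labels}
    ]
    # back[i][y]: label at position i-1 on the best path ending in y at i
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        V.append({})
        back.append({})
        for y in labels:
            # Recurrence: best predecessor y' maximizing V[i-1][y'] + transition(y', y)
            prev, score = max(
                ((yp, V[i - 1][yp] + transition.get((yp, y), 0.0)) for yp in labels),
                key=lambda pair: pair[1],
            )
            V[i][y] = score + emission.get((tokens[i], y), 0.0)
            back[i][y] = prev
    # Pick the best final label, then follow back-pointers to recover the path
    last = max(labels, key=lambda y: V[-1][y])
    best_score = V[-1][last]
    path = [last]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best_score


# Hypothetical toy model: tokens, labels, and scores are invented for illustration.
TOKENS = ["Jim", "lives", "in", "94720"]
LABELS = ["O", "PERSON", "ZIP"]
EMISSION = {
    ("Jim", "PERSON"): 2.0,
    ("lives", "O"): 1.0,
    ("in", "O"): 1.0,
    ("94720", "ZIP"): 3.0,
}
TRANSITION = {("O", "O"): 0.2, ("PERSON", "O"): 0.5, ("O", "ZIP"): 0.5}

if __name__ == "__main__":
    path, score = viterbi(TOKENS, LABELS, EMISSION, TRANSITION)
    print(path, round(score, 2))  # ['PERSON', 'O', 'O', 'ZIP'] 8.2
```

The per-position recurrence (best score at position i depends only on the scores at position i-1) is the part that maps naturally onto recursion over token positions. That is plausibly why a recursive query can express it, although the paper's actual SQL formulation is not reproduced here.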