LILLIE: Information extraction and database integration using linguistics and learning-based algorithms

Smith Ellery, Papadopoulos Dimitrios, Braschler Martin, Stockinger Kurt

URI	http://purl.tuc.gr/dl/dias/B2FF03F9-5FEB-434A-B62A-400341E67404	-
Identifier	https://doi.org/10.1016/j.is.2021.101938	-
Identifier	https://www.sciencedirect.com/science/article/pii/S030643792100137X	-
Language	en	-
Extent	15 pages	en
Title	LILLIE: Information extraction and database integration using linguistics and learning-based algorithms	en
Creator	Smith Ellery	en
Creator	Papadopoulos Dimitrios	en
Creator	Παπαδοπουλος Δημητριος	el
Creator	Braschler, Martin	en
Creator	Stockinger Kurt	en
Publisher	Elsevier	en
Description	https://github.com/OIELILLIE/LILLIE	en
Description	This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 863410. The research work of D.P. was supported by the Hellenic Foundation for Research and Innovation (HFRI), Greece under the HFRI PhD Fellowship grant (Fellowship Number: 50, 2nd call).	en
Abstract	Querying both structured and unstructured data via a single common query interface such as SQL or natural language has been a long-standing research goal. Moreover, as methods for extracting information from unstructured data become ever more powerful, the desire to integrate the output of such extraction processes with “clean”, structured data grows. We are convinced that for successful integration into databases, such extracted information in the form of “triples” needs to be both (1) of high quality and (2) sufficiently general to link up with varying forms of structured data. It is the combination of these two aspects, which have heretofore usually been treated in isolation, where our approach breaks new ground. The cornerstone of our work is a novel, generic method for extracting open information triples from unstructured text, using a combination of linguistics and learning-based extraction methods, thus uniquely balancing both precision and recall. Our system, called LILLIE (LInked Linguistics and Learning-Based Information Extractor), uses dependency tree modification rules to refine triples from a high-recall learning-based engine, and combines them with syntactic triples from a high-precision engine to increase effectiveness. In addition, our system features several augmentations, which modify the generality and the degree of granularity of the output triples. Even though our focus is on addressing both quality and generality simultaneously, our new method substantially outperforms current state-of-the-art systems on the two widely-used CaRB and Re-OIE16 benchmark sets for information extraction. We have made our code publicly available to facilitate further research.	en
Type	Peer-Reviewed Journal Publication	en
Type	Δημοσίευση σε Περιοδικό με Κριτές	el
License	http://creativecommons.org/licenses/by/4.0/	en
Date	2024-06-20	-
Date of Publication	2022	-
Subject	Information extraction	en
Subject	Data integration	en
Subject	Machine learning for database systems	en
Bibliographic Citation	E. Smith, D. Papadopoulos, M. Braschler, and K. Stockinger, “LILLIE: Information extraction and database integration using linguistics and learning-based algorithms,” Inf. Syst., vol. 105, Mar. 2022, doi: 10.1016/j.is.2021.101938.	en
