<efrbr:recordSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:efrbr="http://vfrbr.info/efrbr/1.1" xmlns:efrbr-work="http://vfrbr.info/efrbr/1.1/work" xmlns:efrbr-expression="http://vfrbr.info/efrbr/1.1/expression" xmlns:efrbr-manifestation="http://vfrbr.info/efrbr/1.1/manifestation" xmlns:efrbr-person="http://vfrbr.info/efrbr/1.1/person" xmlns:efrbr-corporateBody="http://vfrbr.info/efrbr/1.1/corporateBody" xmlns:efrbr-concept="http://vfrbr.info/efrbr/1.1/concept" xmlns:efrbr-structure="http://vfrbr.info/efrbr/1.1/structure" xmlns:efrbr-responsible="http://vfrbr.info/efrbr/1.1/responsible" xmlns:efrbr-subject="http://vfrbr.info/efrbr/1.1/subject" xmlns:efrbr-other="http://vfrbr.info/efrbr/1.1/other" xsi:schemaLocation="http://vfrbr.info/efrbr/1.1 http://vfrbr.info/schemas/1.1/efrbr.xsd"><efrbr:entities><efrbr-work:work identifier="http://purl.tuc.gr/dl/dias/01DB0838-84BF-4BE3-A9D6-3C5449AC3301"><efrbr-work:titleOfTheWork>A methodology for open information extraction and representation from large scientific corpora: the CORD-19 data exploration use case</efrbr-work:titleOfTheWork></efrbr-work:work><efrbr-expression:expression identifier="http://purl.tuc.gr/dl/dias/01DB0838-84BF-4BE3-A9D6-3C5449AC3301"><efrbr-expression:titleOfTheExpression>A methodology for open information extraction and representation from large scientific corpora: the CORD-19 data exploration use case</efrbr-expression:titleOfTheExpression><efrbr-expression:formOfExpression vocabulary="DIAS:TYPES">
            Peer-Reviewed Journal Publication
         </efrbr-expression:formOfExpression><efrbr-expression:dateOfExpression type="issued">2022-02-08</efrbr-expression:dateOfExpression><efrbr-expression:dateOfExpression type="published">2020</efrbr-expression:dateOfExpression><efrbr-expression:languageOfExpression vocabulary="iso639-1">en</efrbr-expression:languageOfExpression><efrbr-expression:summarizationOfContent>The usefulness of automated information extraction tools in generating structured knowledge from unstructured and semi-structured machine-readable documents is limited by challenges related to the variety and intricacy of the targeted entities, the complex linguistic features of heterogeneous corpora, and the computational availability for readily scaling to large amounts of text. In this paper, we argue that the redundancy and ambiguity of subject–predicate–object (SPO) triples in open information extraction systems has to be treated as an equally important step in order to ensure the quality and preciseness of generated triples. To this end, we propose a pipeline approach for information extraction from large corpora, encompassing a series of natural language processing tasks. Our methodology consists of four steps: i. in-place coreference resolution, ii. extractive text summarization, iii. parallel triple extraction, and iv. entity enrichment and graph representation. We manifest our methodology on a large medical dataset (CORD-19), relying on state-of-the-art tools to fulfil the aforementioned steps and extract triples that are subsequently mapped to a comprehensive ontology of biomedical concepts.
We evaluate the effectiveness of our information extraction method by comparing it in terms of precision, recall, and F1-score with state-of-the-art OIE engines and demonstrate its capabilities on a set of data exploration tasks.</efrbr-expression:summarizationOfContent><efrbr-expression:useRestrictionsOnTheExpression type="creative-commons">http://creativecommons.org/licenses/by/4.0/</efrbr-expression:useRestrictionsOnTheExpression><efrbr-expression:note type="journal name">Applied Sciences</efrbr-expression:note><efrbr-expression:note type="journal volume">10</efrbr-expression:note><efrbr-expression:note type="journal number">16</efrbr-expression:note></efrbr-expression:expression><efrbr-manifestation:manifestation identifier="https://dias.library.tuc.gr/view/91438"><efrbr-manifestation:titleOfTheManifestation>Papadopoulos_et_al_Appl. Sci._10(16)_2020.pdf</efrbr-manifestation:titleOfTheManifestation><efrbr-manifestation:publicationDistribution><efrbr-manifestation:placeOfPublicationDistribution type="distribution">Chania [Greece]</efrbr-manifestation:placeOfPublicationDistribution><efrbr-manifestation:publisherDistributor type="distributor">Library of TUC</efrbr-manifestation:publisherDistributor><efrbr-manifestation:dateOfPublicationDistribution>2022-02-08</efrbr-manifestation:dateOfPublicationDistribution></efrbr-manifestation:publicationDistribution><efrbr-manifestation:formOfCarrier>application/pdf</efrbr-manifestation:formOfCarrier><efrbr-manifestation:extentOfTheCarrier>2.6 MB</efrbr-manifestation:extentOfTheCarrier><efrbr-manifestation:accessRestrictionsOnTheManifestation>free</efrbr-manifestation:accessRestrictionsOnTheManifestation></efrbr-manifestation:manifestation><efrbr-person:person identifier="http://users.isc.tuc.gr/~dpapadopoulos6"><efrbr-person:nameOfPerson vocabulary="TUC:LDAP">
            Papadopoulos Dimitrios
            Παπαδοπουλος Δημητριος
         </efrbr-person:nameOfPerson></efrbr-person:person><efrbr-person:person identifier="62473A40-5FF6-4917-A616-52B6642D8E59"><efrbr-person:nameOfPerson vocabulary="">
            Papadakis Nikolaos
         </efrbr-person:nameOfPerson></efrbr-person:person><efrbr-person:person identifier="D6A8B82C-B6B2-41C4-9685-BB022CDB381D"><efrbr-person:nameOfPerson vocabulary="">
            Litke Antonis
         </efrbr-person:nameOfPerson></efrbr-person:person><efrbr-corporateBody:corporateBody identifier="https://v2.sherpa.ac.uk/id/publisher/487"><efrbr-corporateBody:nameOfTheCorporateBody vocabulary="S/R:PUBLISHERS">
            MDPI
         </efrbr-corporateBody:nameOfTheCorporateBody></efrbr-corporateBody:corporateBody><efrbr-concept:concept identifier="FFE9B383-5C44-4B0E-B344-A49DE523CABD"><efrbr-concept:termForTheConcept>
            Information extraction
         </efrbr-concept:termForTheConcept></efrbr-concept:concept><efrbr-concept:concept identifier="377AFC5C-0D44-43A4-B77D-A35B1DA4D7AC"><efrbr-concept:termForTheConcept>
            Triple extraction
         </efrbr-concept:termForTheConcept></efrbr-concept:concept><efrbr-concept:concept identifier="7FF366BB-75F1-4097-B2EE-6DC5127213A7"><efrbr-concept:termForTheConcept>
            Bioinformatics
         </efrbr-concept:termForTheConcept></efrbr-concept:concept><efrbr-concept:concept identifier="F7A57E0D-BEF0-4911-A1AA-AD1349AACAB0"><efrbr-concept:termForTheConcept>
            Data mining
         </efrbr-concept:termForTheConcept></efrbr-concept:concept></efrbr:entities><efrbr:relationships><efrbr-structure:structureRelations><efrbr-structure:realizedThrough sourceEntity="work" sourceURI="http://purl.tuc.gr/dl/dias/01DB0838-84BF-4BE3-A9D6-3C5449AC3301" targetEntity="expression" targetURI="http://purl.tuc.gr/dl/dias/01DB0838-84BF-4BE3-A9D6-3C5449AC3301"/><efrbr-structure:embodiedIn sourceEntity="expression" sourceURI="http://purl.tuc.gr/dl/dias/01DB0838-84BF-4BE3-A9D6-3C5449AC3301" targetEntity="manifestation" targetURI="http://purl.tuc.gr/dl/dias/45B7C7FA-E6C3-439F-8E47-C8A6D65E693A"/></efrbr-structure:structureRelations><efrbr-responsible:responsibleRelations><efrbr-responsible:createdBy sourceEntity="work" sourceURI="http://purl.tuc.gr/dl/dias/01DB0838-84BF-4BE3-A9D6-3C5449AC3301" targetEntity="person" targetURI="http://users.isc.tuc.gr/~dpapadopoulos6"/><efrbr-responsible:realizedBy sourceEntity="expression" sourceURI="http://purl.tuc.gr/dl/dias/01DB0838-84BF-4BE3-A9D6-3C5449AC3301" targetEntity="person" targetURI="http://users.isc.tuc.gr/~dpapadopoulos6" role="author"/><efrbr-responsible:realizedBy sourceEntity="expression" sourceURI="http://purl.tuc.gr/dl/dias/01DB0838-84BF-4BE3-A9D6-3C5449AC3301" targetEntity="person" targetURI="62473A40-5FF6-4917-A616-52B6642D8E59" role="author"/><efrbr-responsible:realizedBy sourceEntity="expression" sourceURI="http://purl.tuc.gr/dl/dias/01DB0838-84BF-4BE3-A9D6-3C5449AC3301" targetEntity="person" targetURI="D6A8B82C-B6B2-41C4-9685-BB022CDB381D" role="author"/><efrbr-responsible:realizedBy sourceEntity="expression" sourceURI="http://purl.tuc.gr/dl/dias/01DB0838-84BF-4BE3-A9D6-3C5449AC3301" targetEntity="corporateBody" targetURI="https://v2.sherpa.ac.uk/id/publisher/487" role="publisher"/></efrbr-responsible:responsibleRelations><efrbr-subject:subjectRelations><efrbr-subject:hasSubject sourceEntity="work" sourceURI="http://purl.tuc.gr/dl/dias/01DB0838-84BF-4BE3-A9D6-3C5449AC3301" targetEntity="concept" 
targetURI="FFE9B383-5C44-4B0E-B344-A49DE523CABD"/><efrbr-subject:hasSubject sourceEntity="work" sourceURI="http://purl.tuc.gr/dl/dias/01DB0838-84BF-4BE3-A9D6-3C5449AC3301" targetEntity="concept" targetURI="377AFC5C-0D44-43A4-B77D-A35B1DA4D7AC"/><efrbr-subject:hasSubject sourceEntity="work" sourceURI="http://purl.tuc.gr/dl/dias/01DB0838-84BF-4BE3-A9D6-3C5449AC3301" targetEntity="concept" targetURI="7FF366BB-75F1-4097-B2EE-6DC5127213A7"/><efrbr-subject:hasSubject sourceEntity="work" sourceURI="http://purl.tuc.gr/dl/dias/01DB0838-84BF-4BE3-A9D6-3C5449AC3301" targetEntity="concept" targetURI="F7A57E0D-BEF0-4911-A1AA-AD1349AACAB0"/></efrbr-subject:subjectRelations><efrbr-other:otherRelations/></efrbr:relationships></efrbr:recordSet>