Institutional Repository [SANDBOX]
Technical University of Crete

Corpus-based methods for learning models of metaphor in modern Greek

Pechlivanis Konstantinos


URI: http://purl.tuc.gr/dl/dias/2C5C6EAD-312B-4D80-B99B-86E8B76319FA
Year: 2017
Type of Item: Master Thesis
Bibliographic Citation: Konstantinos Pechlivanis, "Corpus-based methods for learning models of metaphor in modern Greek", Master Thesis, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2017. https://doi.org/10.26233/heallink.tuc.68185

Summary

In this thesis, we propose a method for detecting metaphorical usage of content terms, based on the hypothesis that a metaphorically used term is characteristic of a different domain than the one in which it appears. We formulate the problem as one of extracting knowledge from text classification models that were built with standard text classification techniques and without any knowledge of metaphor. From such models we extract a measure of how characteristic of a domain a term is, which provides a reliable method of identifying terms that are surprising in the context in which they are used.

To investigate this proposal, we first compiled a corpus by crawling articles from three Greek newspapers that offer content online. To obtain an initial classification, we mapped the sections of these three newspapers to the top-level domains of the relevant taxonomy of the International Press Telecommunications Council (IPTC). The training set is annotated only with the broad thematic categories assigned by the newspapers' editors.

To evaluate our method, we manually annotated 89 articles for metaphorical term usage. The annotation was carried out by an initial annotator, with an expert annotator resolving inconsistencies to create the gold-standard corpus. The annotation task was designed and carried out using the Ellogon platform.

In our experiments, we report results using Term Frequency - Inverse Document Frequency (TF-IDF) to identify the literal (characteristic) domain of terms, and we analyse the interaction between TF-IDF and other typical word features, such as Part of Speech tags and Document Frequency. Terms can be words or N-grams. Terms are classified using an adapted version of the Maximum Likelihood Classifier.

Our method makes single-term binary decisions about metaphorical usage. We use Precision, Recall and F1-score as evaluation metrics. We compared our system to a naive baseline method and to relevant previous work. Although our model appears to over-generalise, producing many false positives, its overall F1-score outperforms both the baseline method and the related previous work.
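
The core idea of the summary (score how characteristic a term is of each domain, then flag terms whose most characteristic domain differs from the domain of the article they occur in) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the toy English corpus, the choice to treat each domain as one large document, and the use of a plain argmax over TF-IDF scores in place of the adapted Maximum Likelihood Classifier are all assumptions made for illustration only.

```python
import math
from collections import Counter


def domain_tfidf(domain_docs):
    """Return {domain: {term: tf-idf}}.

    Assumption: each domain is treated as one large document, so the
    document frequency of a term is the number of domains it occurs in.
    """
    tf = {d: Counter(t for doc in docs for t in doc)
          for d, docs in domain_docs.items()}
    n_domains = len(tf)
    df = Counter(t for counts in tf.values() for t in counts)
    scores = {}
    for d, counts in tf.items():
        total = sum(counts.values())
        scores[d] = {t: (c / total) * math.log(n_domains / df[t])
                     for t, c in counts.items()}
    return scores


def literal_domain(term, scores):
    """Domain with the highest TF-IDF for the term, or None if all are zero."""
    best = max(scores, key=lambda d: scores[d].get(term, 0.0))
    return best if scores[best].get(term, 0.0) > 0.0 else None


def flag_metaphor_candidates(tokens, article_domain, scores):
    """Binary per-term decision: flag terms whose literal domain differs
    from the domain of the article (hypothetical simplification)."""
    flagged = []
    for t in tokens:
        lit = literal_domain(t, scores)
        if lit is not None and lit != article_domain:
            flagged.append(t)
    return flagged


def precision_recall_f1(predicted, gold):
    """Precision, Recall and F1 over sets of flagged terms."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Toy corpus (invented for illustration; not data from the thesis).
TOY_CORPUS = {
    "sport": [["team", "wins", "match"], ["team", "attack", "goal"]],
    "politics": [["government", "vote", "law"], ["government", "wins", "vote"]],
    "conflict": [["army", "attack", "bombard"], ["army", "bombard", "city"]],
}

if __name__ == "__main__":
    scores = domain_tfidf(TOY_CORPUS)
    article = ["government", "bombard", "vote"]
    print(flag_metaphor_candidates(article, "politics", scores))
```

In this toy setting, "bombard" occurs only in the "conflict" domain, so it is flagged when it appears in a "politics" article. A term that is tied across several domains or that occurs in every domain carries little signal under this scheme, which is one reason a real system would need the additional features mentioned in the summary, such as Part of Speech tags and Document Frequency.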
