Towards A Fine-Grained Entity Linking Approach
Date
2021
Author
Hogan, Aidan
Poblete, Barbara
Institution
Universidad de Chile
Abstract
The Entity Linking (EL) task involves linking mentions of entities in a text to their corresponding identifiers in a Knowledge Base (KB) such as Wikipedia, BabelNet, DBpedia, Freebase, Wikidata or YAGO. Numerous techniques have been proposed to address this task over the years. However, not all works adopt the same convention regarding the entities that the EL task should target; for example, while some EL works target common entities like "interview" that appear in the KB, others target only named entities like "Michael Jackson". The lack of consensus on this issue (and others) complicates research on the EL task; for example, how can the performance of EL systems be evaluated and compared when systems may target different types of entities? Although traditional EL approaches have largely focused on English texts, this problem is not specific to English: it arises in every language.
In this thesis, we first highlight the importance of formalizing the concept of "entity" and the benefits it would bring to the Entity Linking community, in particular for the construction and assessment of gold standards used for evaluation. Motivated by the scarcity of annotated datasets, which is even more pronounced in multilingual scenarios, we propose VoxEL: a manually annotated gold standard for multilingual EL featuring the same text expressed in five European languages. We compare the behavior of state-of-the-art multilingual EL systems across these five languages. Overall, our results show how performance compares across languages and suggest that machine translation is now a competitive alternative to dedicated multilingual EL configurations.
The evident disagreement about "What should entity linking link?" is also a consequence of the different applications of EL. Rather than proposing isolated solutions, our position is to create a more granular definition that meets the majority of current needs. Along these lines, we propose a fine-grained categorization scheme for EL that distinguishes different types of mentions and links. We propose a vocabulary extension that expresses such categories in EL benchmark datasets. We then relabel (subsets of) three popular EL datasets according to our novel categorization scheme, and additionally discuss a tool used to semi-automate the labeling process. We next report the performance of five EL systems for individual categories. We further extend EL systems with Word Sense Disambiguation and Coreference Resolution components, creating initial versions of what we call Fine-Grained Entity Linking (FEL) systems, and measure the impact on performance per category. Finally, we propose a configurable performance measure based on fuzzy sets that can be adapted for different application scenarios. Our results highlight a lack of consensus on the goals of the EL task, show that the evaluated systems do indeed target different entities, and further reveal some open challenges for the (F)EL task regarding more complex forms of reference for entities.
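
To give a rough intuition for how a performance measure based on fuzzy sets could be configured per application, the following Python sketch weights each gold annotation by a membership degree assigned to its category. It is an illustration under our own assumptions and not the measure defined in the thesis: the category names ("named", "common", "pronoun"), the "KB:..." identifiers, the function name fuzzy_scores, and the choice to count every prediction absent from the gold standard as a full error are all hypothetical.

```python
from typing import Dict, Set, Tuple

# A link is (start offset, end offset, KB identifier). Identifiers are placeholders.
Link = Tuple[int, int, str]


def fuzzy_scores(
    gold: Dict[Link, str],          # gold link -> category label (hypothetical labels)
    predicted: Set[Link],           # links returned by an EL system
    membership: Dict[str, float],   # category -> membership degree in [0, 1]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) with gold links weighted by membership."""

    def mu(link: Link) -> float:
        # Categories missing from the configuration get degree 0 (assumption).
        return membership.get(gold[link], 0.0)

    # Weighted mass of correct predictions.
    true_pos = sum(mu(link) for link in predicted if link in gold)
    # Assumption: a prediction not present in the gold standard counts as 1 error.
    false_pos = sum(1.0 for link in predicted if link not in gold)
    # Total weighted mass the application expects to be linked.
    gold_mass = sum(mu(link) for link in gold)

    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos > 0 else 0.0
    recall = true_pos / gold_mass if gold_mass > 0 else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall > 0
        else 0.0
    )
    return precision, recall, f1


# Toy text: "Michael Jackson gave an interview. He ..."
gold = {
    (0, 15, "KB:Michael_Jackson"): "named",
    (24, 33, "KB:Interview"): "common",
    (35, 37, "KB:Michael_Jackson"): "pronoun",
}
predicted = {(0, 15, "KB:Michael_Jackson"), (24, 33, "KB:Interview")}

# One application cares only about named entities and partly about pronouns;
# another treats every category as equally relevant.
strict = {"named": 1.0, "common": 0.0, "pronoun": 0.5}
lenient = {"named": 1.0, "common": 1.0, "pronoun": 1.0}

strict_scores = fuzzy_scores(gold, predicted, strict)
lenient_scores = fuzzy_scores(gold, predicted, lenient)
```

With the same system output, changing only the membership configuration changes the weight given to each category, which is the sense in which such a measure can be adapted to different application scenarios.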