bachelorThesis
Generación de un grafo de conocimiento de periódicos antiguos del Ecuador a través de procesos OCR. [Generation of a knowledge graph of old Ecuadorian newspapers through OCR processes]
Date
2023-07-26
Author
Torres Cordero, Raul Sebastian
Valdez Llivisaca, Jonnathan Andrés
Institution
Abstract
History records countless events that unfold in the world day by day, each leaving its
mark on time. In the past, this knowledge was transmitted orally and kept alive across
generations. However, advances in technology have changed how we access information and
allow us to explore historical records on an unprecedented scale.
In this context, a challenge arises: a large portion of this valuable information lies dormant
in old newspapers, which are deteriorating and difficult to handle. These newspapers contain
detailed accounts of events that marked Ecuador’s history in the 19th and 20th centuries,
but that information is hard to access quickly and efficiently.
To address this problem, this thesis proposes a solution based on text digitization, text
processing, and semantic web technologies. The main objective is to extract information from
old newspapers, organize it in a structured manner, and generate a knowledge graph that
represents the events that occurred in Ecuador during that historical period. As part of this
solution, a prototype search engine that uses the generated knowledge graph has also been
developed. This search engine is one of many ways to exploit the graph; it lets users run
specific queries and searches about historical events, people, places, and topics covered
in the old newspapers.
The proposed solution automates each step of the process. To achieve this, several widgets
have been built in Orange, a visual data analysis platform, each performing a specific task
at one stage of the process. These widgets include text digitization tools, text processing
techniques, and semantic web algorithms that work together to extract relevant information,
identify entities and relationships, obtain word embeddings, and generate a knowledge graph
enriched with historical events.
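
The last two stages described above, building a graph from extracted entities and searching it, can be illustrated with a minimal sketch. It is not the thesis implementation: the record layout, the predicate names (`publishedOn`, `type`, `mentions`), the functions `build_graph` and `search`, and the sample articles are all invented for illustration, and plain Python tuples stand in for whatever semantic web tooling the authors used.

```python
# Minimal sketch: entities extracted from digitized articles become
# (subject, predicate, object) triples, which a simple search then queries.
# All names and sample data are hypothetical, not taken from the thesis.

def build_graph(articles):
    """Turn per-article extraction results into a set of triples."""
    triples = set()
    for article in articles:
        article_id = article["id"]
        triples.add((article_id, "publishedOn", article["date"]))
        for name, kind in article["entities"]:
            triples.add((name, "type", kind))            # entity classification
            triples.add((article_id, "mentions", name))  # link article to entity
    return triples


def search(triples, entity):
    """Return the sorted ids of articles that mention the given entity."""
    return sorted(s for s, p, o in triples if p == "mentions" and o == entity)


# Invented sample input, standing in for the output of the OCR and
# entity-recognition stages.
articles = [
    {"id": "article-1", "date": "1900-01-01",
     "entities": [("Person A", "Person"), ("Quito", "Place")]},
    {"id": "article-2", "date": "1900-01-02",
     "entities": [("Quito", "Place")]},
]

graph = build_graph(articles)
print(search(graph, "Quito"))  # prints ['article-1', 'article-2']
```

Storing the graph as a set of triples means an entity that appears in several articles is typed only once, while each mention remains a separate link back to its article.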