Working papers
Ontology extraction for index generation
Date
2004-06
Recorded in:
DUQUE, Cláudio Gottschalg; LOBIN, Henning. Ontology extraction for index generation. In: ICCC - INTERNATIONAL CONFERENCE ON ELECTRONIC PUBLISHING, 8., 2004, Brasília. Proceedings... Brasília: ELPUB, 2004. p. 111-120.
Author
Duque, Cláudio Gottschalg
Lobin, Henning
Institution
Abstract
Managing electronic publications in the Information Era brings together old and new problems,
especially those related to Information Retrieval and Automatic Knowledge Extraction. This article
presents an Information Retrieval System that uses Natural Language Processing and an ontology to
index the texts of a collection. We describe a system that constructs a domain-specific ontology, starting
from syntactic and semantic analyses of the texts that make up the collection. First the texts are
tokenized; then a robust syntactic analysis is performed; finally, the semantic analysis is carried out
in conformity with a knowledge representation metalanguage, based on a basic ontology composed
of 47 classes. The automatically extracted ontology yields richer domain-specific knowledge.
Through its semantic net, it provides the right conditions for users to find, more efficiently
and quickly, the terms suited to querying the texts. A prototype of this system was built
and used to index a collection of 221 electronic texts in Information Science written in
Brazilian Portuguese. Rather than relying on statistical theories, we propose a robust Information Retrieval System based on cognitive theories, allowing greater efficiency in answering users' queries.
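
The stages named in the abstract (tokenization, analysis against a set of ontology classes, index generation) can be illustrated with a minimal Python sketch. This is not the authors' system: the lexicon, the class names, and every function name below are hypothetical, and a plain dictionary lookup stands in for the robust syntactic and semantic analysis, whose details the abstract does not give.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon mapping a term to an ontology class.
# The paper's basic ontology has 47 classes; these class names are invented.
TOY_LEXICON = {
    "library": "Institution",
    "librarian": "Agent",
    "catalog": "Artifact",
    "index": "Artifact",
}

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def classify(tokens, lexicon):
    """Assumed stand-in for syntactic and semantic analysis: lexicon lookup."""
    return [(tok, lexicon[tok]) for tok in tokens if tok in lexicon]

def build_index(docs, lexicon):
    """Return (ontology, index): class -> terms, and term -> document ids."""
    ontology = defaultdict(set)
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term, cls in classify(tokenize(text), lexicon):
            ontology[cls].add(term)
            index[term].add(doc_id)
    return ontology, index

def related_terms(term, ontology, lexicon):
    """Suggest other indexed terms that share the term's ontology class."""
    cls = lexicon.get(term)
    if cls is None:
        return []
    return sorted(ontology.get(cls, set()) - {term})

docs = {
    "d1": "The librarian updates the catalog.",
    "d2": "An index helps the library.",
}
ontology, index = build_index(docs, TOY_LEXICON)
```

In this sketch, `related_terms("catalog", ontology, TOY_LEXICON)` suggests other terms of the same class, which loosely mirrors how a semantic net could help a user choose query terms.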