bachelorThesis
Creation of a non-relational database from information extracted from texts (original title: Criação de um banco de dados não relacional a partir de informação extraída de textos)
Date
2018-05-29
Record in:
XAVIER, Eduardo Semkiw; BATISTA, Jonathan da Silva. Criação de um banco de dados não relacional a partir de informação extraída de textos. 2018. 39 f. Trabalho de Conclusão de Curso (Tecnologia em Análise e Desenvolvimento de Sistemas) - Universidade Tecnológica Federal do Paraná, Ponta Grossa, 2018.
Authors
Xavier, Eduardo Semkiw
Batista, Jonathan da Silva
Abstract
Large amounts of information and data are currently concentrated in text files. Because most of the information people work with is held in unstructured text, extracting data from it is important. This work develops an application that analyzes PDF files and extracts useful information from them. The application uses an external tool to convert each PDF and write its content to a text file. It then searches the text for patterns, such as addresses and dates. Finally, it stores the processed data in a NoSQL database. Because extracting information from PDF files generates a large amount of data, and doing so entirely by hand is difficult, the user needs automated support.
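The pattern-search and storage steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the conversion tool, the patterns, or the database. The sketch assumes the PDF has already been converted to plain text, assumes dates are written as DD/MM/YYYY, and uses a hypothetical record layout (`source`, `dates`) of the kind a document-oriented NoSQL store could hold. The names `DATE_PATTERN`, `extract_dates`, and `build_document` are illustrative.

```python
import json
import re

# Assumption: dates appear as DD/MM/YYYY. The abstract does not state
# which date formats the application actually targets.
DATE_PATTERN = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")


def extract_dates(text):
    """Return every DD/MM/YYYY match, rewritten as YYYY-MM-DD."""
    return [f"{year}-{month}-{day}"
            for day, month, year in DATE_PATTERN.findall(text)]


def build_document(source_name, text):
    """Build a schemaless, JSON-style record (hypothetical layout)."""
    return {"source": source_name, "dates": extract_dates(text)}


# Text assumed to come from an already converted PDF.
sample_text = "Contract signed on 29/05/2018 and renewed on 01/06/2019."
document = build_document("sample.pdf", sample_text)

# Serialized form that a document store could ingest.
print(json.dumps(document))
```

Keeping extraction separate from record building means further patterns, such as addresses, could be added as extra fields without changing the storage step. The pattern checks only the shape of a date, not whether the day and month are valid.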