Undergraduate Final Project
Web scraping of data on indicators related to rural production
Date
2023-01-30
Author
Mohr, Guilherme Alan
Institution
Abstract
This work proposes developing a database, represented by one or more CSV files, containing data related to the agricultural context, so that the data can be analyzed and, where possible, support decision-making about agricultural production and future research. The database is also intended to be made available on the page of GIPAG (Interdisciplinary Group of Georeferenced Agro-Food Research). The data are obtained through Web Scraping, which is the Information Extraction process applied to the Web. To that end, a theoretical review was carried out on the Web Scraping process and on the tools that can perform it and are compatible with the Python programming language. Python was chosen for the Web Scraping process because it is versatile and has several libraries that facilitate it. Based on this review, the three most prominent tools were selected: Scrapy, Beautiful Soup and Selenium. For each tool, the main characteristics are presented, along with two examples of data extraction. Next, the portals studied and the systems that implement the Web Scraping process on them are presented. Then some of the main data in the developed database are described, detailing their source and relevance. Finally, concluding remarks and ideas for continuing this work are presented.
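As a rough illustration of the pipeline the abstract describes (extract tabular data from a web page, then store it as CSV), the following minimal sketch may help. It is not the author's implementation: it uses only Python's standard library (`html.parser` and `csv`) as a stand-in for Scrapy, Beautiful Soup or Selenium, and the names `TableParser`, `html_table_to_csv` and `SAMPLE_HTML`, as well as the sample values, are hypothetical placeholders rather than data from the work.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # row currently being built, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table cells found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Made-up sample page; a real run would download the HTML from a portal.
SAMPLE_HTML = """
<table>
  <tr><th>crop</th><th>year</th><th>price</th></tr>
  <tr><td>soybean</td><td>2022</td><td>180.5</td></tr>
  <tr><td>corn</td><td>2022</td><td>85.0</td></tr>
</table>
"""

csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

In practice the HTML would come from an HTTP request to the portal being studied, and a dedicated library such as Beautiful Soup would typically replace the hand-written parser.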