Newsminer: um sistema de data warehouse baseado em texto de notícias

Nogueira, Rodrigo Ramos

dc.contributor	Gonzalez, Sahudy Montenegro
dc.contributor	http://lattes.cnpq.br/9826346918182685
dc.contributor	http://lattes.cnpq.br/0327974399448757
dc.creator	Nogueira, Rodrigo Ramos
dc.date.accessioned	2017-10-09T14:14:24Z
dc.date.available	2017-10-09T14:14:24Z
dc.date.created	2017-10-09T14:14:24Z
dc.date.issued	2017-05-12
dc.identifier	NOGUEIRA, Rodrigo Ramos. Newsminer: um sistema de data warehouse baseado em texto de notícias. 2017. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, Sorocaba, 2017. Disponível em: https://repositorio.ufscar.br/handle/ufscar/9138.
dc.identifier	https://repositorio.ufscar.br/handle/ufscar/9138
dc.description.abstract	Data and text mining applications that manage Web data have been the subject of recent research. In all cases, data mining tasks need clean, consistent, and integrated data to obtain the best results, which makes Data Warehouse environments a valuable source of data for data mining applications. Data Warehouse technology has evolved to retrieve and process data from the Web. In particular, news websites are rich sources from which a linguistic corpus can be composed. By inserting a corpus into a Data Warehousing environment, applications can take advantage of the flexibility that a multidimensional model and OLAP operations provide. The benefits include navigation through the data, selection of the part of the data considered relevant, analysis at different levels of abstraction, and aggregation, disaggregation, rotation, and filtering over any set of data. This work presents Newsminer, a data warehouse environment that provides a consistent and clean set of texts in the form of a multidimensional corpus for consumption by external applications and users. The proposal includes an architecture that integrates real-time news gathering; a semantic enrichment module in the ETL stage, which adds semantic properties to the data such as news category and POS-tagging annotation; and access to data cubes for consumption by applications and users. Two experiments were performed. The first selected the best news classifier for the semantic enrichment module. Statistical analysis of the results indicated that the Perceptron classifier achieved the best F-measure, with good computational time. The second experiment collected data to evaluate real-time news preprocessing. For the data set collected, the results indicated that online processing time is achievable.
dc.language	por
dc.publisher	Universidade Federal de São Carlos
dc.publisher	UFSCar
dc.publisher	Programa de Pós-Graduação em Ciência da Computação - PPGCC-So
dc.publisher	Câmpus Sorocaba
dc.rights	Open access
dc.subject	Mineração de dados (Computação)
dc.subject	Sites da Web
dc.subject	Corpora multidimensional
dc.subject	Enriquecimento semântico
dc.subject	Categorização de notícias
dc.subject	OLAP
dc.subject	Multidimensional corpora
dc.subject	Data mining
dc.subject	Web sites
dc.subject	Data Warehouse
dc.subject	News websites
dc.subject	Semantic enrichment
dc.subject	News categorization
dc.title	Newsminer: um sistema de data warehouse baseado em texto de notícias
dc.type	Thesis

This item belongs to the following institution

Universidade Federal de São Carlos (Brazil)