ExtraWeb: um sumarizador de documentos Web baseado em etiquetas HTML e ontologia

Silva, Patrick Pedreira

dc.contributor	Rino, Lúcia Helena Machado
dc.contributor	http://lattes.cnpq.br/0315640846525832
dc.contributor	http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=T850685
dc.creator	Silva, Patrick Pedreira
dc.date.accessioned	2007-08-21
dc.date.accessioned	2016-06-02T19:05:19Z
dc.date.available	2007-08-21
dc.date.available	2016-06-02T19:05:19Z
dc.date.created	2007-08-21
dc.date.created	2016-06-02T19:05:19Z
dc.date.issued	2006-07-10
dc.identifier	SILVA, Patrick Pedreira. ExtraWeb: um sumarizador de documentos Web baseado em etiquetas HTML e ontologia. 2006. 168 f. Dissertação (Mestrado em Ciências Exatas e da Terra) - Universidade Federal de São Carlos, São Carlos, 2006.
dc.identifier	https://repositorio.ufscar.br/handle/ufscar/322
dc.description.abstract	This dissertation presents an automatic summarizer of Web documents based on both HTML tags and ontological knowledge. It was derived from two independent approaches: one that relies solely on HTML tags, and another that relies solely on ontological knowledge. All three approaches were implemented and assessed, and the results indicate that combining the two knowledge types has promising descriptive power for Web documents. The resulting prototype is named ExtraWeb. ExtraWeb exploits the HTML structure of Web documents in Portuguese, together with semantic information drawn from the Yahoo ontology in Portuguese. This ontology was enriched with additional terms extracted from both a thesaurus, Diadorim, and Wikipedia. In a simulated Web search, ExtraWeb achieved a degree of utility similar to Google's, showing its potential to signal, through extracts, the relevance of the retrieved documents. This has recently become an important issue. Extracts may be particularly useful as surrogates for the descriptions currently provided by existing search engines. They may even substitute for the corresponding source documents. In the former case, those descriptions do not necessarily convey the relevant content of the documents; in the latter, reading full documents imposes a substantial overhead on Web users. In both cases, extracts may improve the search task, provided that they actually signal relevant content. ExtraWeb is thus a potential plug-in for search engines, to improve their descriptions. However, its scalability and its integration into a real setting have not yet been explored.
dc.publisher	Universidade Federal de São Carlos
dc.publisher	BR
dc.publisher	UFSCar
dc.publisher	Programa de Pós-Graduação em Ciência da Computação - PPGCC
dc.rights	Open Access
dc.subject	Artificial intelligence
dc.subject	Natural language processing
dc.subject	Automatic summarization
dc.title	ExtraWeb: um sumarizador de documentos Web baseado em etiquetas HTML e ontologia
dc.type	Thesis

This item belongs to the following institution

Universidade Federal de São Carlos (Brazil)