Inductive model generation for text classification using a bipartite heterogeneous network

Rossi, Rafael Geraldeli; Lopes, Alneu de Andrade; Faleiros, Thiago de Paulo; Rezende, Solange Oliveira

dc.creator	Rossi, Rafael Geraldeli
dc.creator	Lopes, Alneu de Andrade
dc.creator	Faleiros, Thiago de Paulo
dc.creator	Rezende, Solange Oliveira
dc.date.accessioned	2014-05-30T18:01:53Z
dc.date.accessioned	2018-07-04T16:48:32Z
dc.date.available	2014-05-30T18:01:53Z
dc.date.available	2018-07-04T16:48:32Z
dc.date.created	2014-05-30T18:01:53Z
dc.date.issued	2014-05
dc.identifier	Journal of Computer Science and Technology, Beijing, v.29, n.3, p.361-375, 2014
dc.identifier	http://www.producao.usp.br/handle/BDPI/45159
dc.identifier	10.1007/s11390-014-1436-7
dc.identifier	http://dx.doi.org/10.1007/s11390-014-1436-7
dc.identifier.uri	http://repositorioslatinoamericanos.uchile.cl/handle/2250/1640648
dc.description.abstract	Algorithms for numeric data classification have been applied to text classification. Usually the vector space model is used to represent text collections. The characteristics of this representation, such as sparsity and high dimensionality, sometimes impair the quality of general-purpose classifiers. Networks can be used to represent text collections, avoiding the high sparsity and making it possible to model relationships among the different objects that compose a text collection. Such network-based representations can improve the quality of the classification results. One of the simplest ways to represent text collections as a network is through a bipartite heterogeneous network, which is composed of objects that represent the documents connected to objects that represent the terms. Bipartite heterogeneous networks do not require the computation of similarities or relations among the objects and can be used to model any type of text collection. Given these advantages, in this article we present a text classifier that builds a classification model using the structure of a bipartite heterogeneous network. The algorithm, referred to as IMBHN (Inductive Model Based on Bipartite Heterogeneous Network), induces a classification model by assigning weights to the objects that represent the terms for each class of the text collection. An empirical evaluation using a large number of text collections from different domains shows that the proposed IMBHN algorithm produces significantly better results than the k-NN, C4.5, SVM, and Naive Bayes algorithms.
dc.language	eng
dc.publisher	Springer
dc.publisher	Science Press
dc.publisher	Beijing
dc.relation	Journal of Computer Science and Technology
dc.rights	Copyright Springer Science+Business Media, LLC & Science Press
dc.rights	restrictedAccess
dc.subject	heterogeneous network
dc.subject	text classification
dc.subject	inductive model generation
dc.title	Inductive model generation for text classification using a bipartite heterogeneous network
dc.type	Journal articles

This item belongs to the following institution

Universidade de São Paulo (Brazil)