Conference object
On the Importance of Data Representation for the Success of Text Classification
Registered in:
isbn:978-987-1364-31-2
Author
Cuello, Carolina Y.
Jofre Caradonna, Vanessa
Garciarena Ucelay, María José
Cagnina, Leticia
Institution
Abstract
Text mining approaches use natural language processing to automatically extract patterns from texts. Tasks such as topic labeling, news classification, question answering, named entity recognition, and sentiment analysis usually require elaborate and effective document representations. In this context, word representation models in general, and vector-based word representations in particular, have attracted increasing interest as a way to alleviate some of the limitations of Bag of Words. In this article, we analyze the use of several vector-based word representations, in addition to the classical ones, in a polarity analysis task on movie reviews. Experimental results show that more elaborate representations are more effective than Bag of Words. In particular, the Concise Semantic Analysis representation appears to be very robust and effective: the results are good regardless of the classifier it is used with. The dimensionality of the representations and the time needed to obtain them are also reported, leading to the conclusion that classifiers are efficient when Concise Semantic Analysis is used.
XIX Workshop Base de Datos y Minería de Datos (WBDMD), Red de Universidades con Carreras en Informática