Article
Multi-class sentiment analysis using a hierarchical logistic model tree approach
Date
2014
Author
Homsi, Masun Nabhan
Universidad de Cuenca
Dirección de Investigación de la Universidad de Cuenca
DIUC
Institutions
Abstract
This paper proposes a new hybrid system for multi-class sentiment analysis based on the General Inquirer (GI) dictionary and a hierarchical Logistic Model Tree (LMT) approach. The system consists of three layers. The Bipolar Layer (BL) consists of one LMT (LMT-1) that classifies sentiment polarity, while the Intensity Layer (IL) comprises two LMTs (LMT-2 and LMT-3) that separately detect three positive and three negative sentiment intensities. The Grouping Layer (GL) is used only in the construction phase, where it clusters positive and negative instances with two k-means models, one for each polarity. In the pre-processing phase, the raw text is passed through a tokenizer, a tagger, a stemmer and finally the GI dictionary, which counts and labels only verbs, nouns, adjectives and adverbs with 24 markers that are later used to compute feature vectors. In the sentiment classification phase, feature vectors are first fed to LMT-1 and then grouped in the GL according to class label. These groups of instances are then labeled manually, and finally positive instances are fed to LMT-2 and negative instances to LMT-3. The three trees are trained and tested on the Movie Review and SenTube datasets using 10-fold stratified cross-validation. LMT-1 yields a tree with 48 leaves and a size of 95, with 90.88% accuracy, while LMT-2 and LMT-3 each yield a tree with 1 leaf and a size of 1, with 99.28% and 99.37% accuracy respectively. Experiments show that the proposed hierarchical classification methodology performs better than other prevailing approaches.
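The two-stage routing described in the abstract (polarity first, then a polarity-specific intensity classifier) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the two-element feature vector (a positive-marker count and a negative-marker count), the threshold rules and the intensity labels "low", "medium" and "high" are all hypothetical stand-ins for the trained LMT-1, LMT-2 and LMT-3 models and the 24-marker GI feature vectors.

```python
from typing import Callable, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], str]


def classify_hierarchical(
    x: Features,
    polarity_clf: Classifier,   # stand-in for LMT-1 (Bipolar Layer)
    positive_clf: Classifier,   # stand-in for LMT-2 (Intensity Layer, positive)
    negative_clf: Classifier,   # stand-in for LMT-3 (Intensity Layer, negative)
) -> Tuple[str, str]:
    """Route a feature vector through polarity, then intensity."""
    polarity = polarity_clf(x)
    if polarity == "positive":
        return polarity, positive_clf(x)
    return polarity, negative_clf(x)


# Hypothetical toy classifiers, used only to make the sketch runnable.
# x[0] is an assumed positive-marker count, x[1] a negative-marker count.
def toy_polarity(x: Features) -> str:
    return "positive" if x[0] >= x[1] else "negative"


def toy_intensity(x: Features) -> str:
    margin = abs(x[0] - x[1])
    if margin >= 6:
        return "high"
    if margin >= 3:
        return "medium"
    return "low"


if __name__ == "__main__":
    for vector in ([5, 1], [0, 7], [2, 1]):
        print(vector, classify_hierarchical(
            vector, toy_polarity, toy_intensity, toy_intensity))
```

In a real system the three callables would wrap trained models, and the same toy intensity rule would not be shared between the positive and negative branches; it is reused here only to keep the sketch short.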