Master's thesis
Abordagens baseadas em teoria da informação para seleção automatizada de atributos (Information-theory-based approaches for automated attribute selection)
Date
2018-09-21
Registered in:
JESUS, Jhoseph Kelvin Lopes de. Abordagens baseadas em teoria da informação para seleção automatizada de atributos. 2018. 107f. Dissertação (Mestrado em Sistemas e Computação) - Centro de Ciências Exatas e da Terra, Universidade Federal do Rio Grande do Norte, Natal, 2018.
Author
Jesus, Jhoseph Kelvin Lopes de
Abstract
With the rapid growth of complex data in real-world applications, feature selection
has become a mandatory preprocessing step for reducing both the complexity
of the data and the computing time. Accordingly, several works have sought
to develop efficient methods for this task. Most feature selection methods
select the best attributes based on some specific criterion. Although some progress has
been made, a poor choice of a single algorithm or criterion to assess the importance of attributes,
and an arbitrary choice of the number of attributes by the user, may lead to poor
analysis. To overcome some of these issues, this work presents the development
of two strands of automated attribute selection approaches. The first comprises fusion methods
of multiple attribute selection algorithms, which use ranking-based strategies and classifier
ensembles to combine feature selection algorithms in terms of data (Data Fusion)
and decision (Decision Fusion), allowing researchers to consider different perspectives in
the attribute selection stage. The second strand addresses the dynamic feature selection
context by proposing the PF-DFS method, an improvement of a dynamic
feature selection algorithm that uses the idea of Pareto frontier multiobjective optimization,
which allows us to consider different perspectives of the relevance of the attributes and
to automatically define the number of attributes to select. The proposed approaches were
tested on several real and artificial data sets, and the results showed that, when compared
to individual selection methods, the performance of one of the proposed methods is
remarkably higher. In fact, the results are promising, since the proposed approaches
also achieved superior performance when compared to established dimensionality reduction
methods and to using the original data sets, showing that the reduction of noisy
and/or redundant attributes may have a positive effect on the performance of classification
tasks.
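
The Pareto frontier idea mentioned above can be illustrated with a minimal sketch. It is not the PF-DFS method itself, whose details the abstract does not give. It only assumes that each attribute has a relevance score under two or more criteria (higher is better) and keeps the attributes that no other attribute dominates, so the number of selected attributes is not set by the user. The function name `pareto_front` and the scores are hypothetical.

```python
import numpy as np

def pareto_front(scores):
    """Return indices of non-dominated rows (higher is better in every column).

    Illustrative assumption: each row holds one attribute's relevance
    scores under several criteria. This is not the thesis's PF-DFS code.
    """
    scores = np.asarray(scores, dtype=float)
    keep = []
    for i, s in enumerate(scores):
        # Row i is dominated if another row is >= in all criteria
        # and strictly > in at least one.
        at_least_as_good = np.all(scores >= s, axis=1)
        strictly_better = np.any(scores > s, axis=1)
        if not np.any(at_least_as_good & strictly_better):
            keep.append(i)
    return keep

# Hypothetical relevance scores for five attributes under two criteria.
scores = [[0.9, 0.2], [0.5, 0.5], [0.4, 0.4], [0.1, 0.8], [0.3, 0.1]]
selected = pareto_front(scores)
print(selected)  # [0, 1, 3]
```

In this toy example attributes 2 and 4 are dropped because another attribute scores at least as well on both criteria; the size of the selected set follows from the scores rather than from a user-chosen threshold.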