Undergraduate thesis (TCC)
Investigação de métodos de seleção de atributos para problemas de classificação hierárquica multirrótulo [Investigation of feature selection methods for hierarchical multi-label classification problems]
Date
2023-04-06
Author
Silva, Luan Vinicius Moraes da
Institution
Abstract
Classification is the task of assigning data instances to classes. In Hierarchical Multi-label Classification, instances may belong to two or more classes (labels) simultaneously, and the classes are hierarchically structured. Feature Selection is part of the data preprocessing step and plays an important role in classification tasks for Machine Learning, as it can effectively reduce the size of the dataset by removing irrelevant or redundant attributes and improve the predictive performance of the classifier. Although many real-world problems belong to the hierarchical multi-label domain, most related research addresses feature selection for single-label problems. In many works, even when the proposal addresses multiple labels, the associated class structure is not hierarchical. Therefore, in this work, we study how feature selection can be used in the context of Hierarchical Multi-label Classification. For this purpose, we compare global feature selectors known in the literature with flat feature selectors adapted for hierarchical structures. The global feature selectors used were Relief, Genie3 and Symbolic, and the flat feature selectors were ReliefF and Information Gain. For the flat selectors, the Hierarchical Multi-label problem was transformed into a non-hierarchical multi-label problem using the Label Powerset and Binary Relevance transformations. As main results, the global evaluators produced subsets of relevant features that improved predictive performance while reducing the dimensionality of the original dataset by up to 75%, most notably the evaluators based on Genie3 and Symbolic. Despite this improvement, the flat evaluators performed proportionally better than the global evaluators.
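The two problem transformations named in the abstract can be illustrated with a small sketch. The abstract does not describe how the hierarchy was flattened, so the ancestor expansion below is an assumption, and the toy hierarchy `PARENT` and the helper names `with_ancestors`, `binary_relevance` and `label_powerset` are hypothetical. Binary Relevance yields one binary target per class, while Label Powerset yields a single multiclass target in which each distinct label combination is its own class.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy hierarchy: child -> parent (None marks a top-level class).
PARENT: Dict[str, Optional[str]] = {
    "1": None, "1.1": "1", "1.2": "1", "2": None, "2.1": "2",
}


def with_ancestors(labels: Set[str]) -> Set[str]:
    """Assumed flattening step: add every ancestor of each label."""
    expanded: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        while node is not None:
            expanded.add(node)
            node = PARENT[node]
    return expanded


def binary_relevance(label_sets: List[Set[str]],
                     classes: List[str]) -> Dict[str, List[int]]:
    """One binary target vector per class: 1 if the instance has that class."""
    return {c: [int(c in s) for s in label_sets] for c in classes}


def label_powerset(label_sets: List[Set[str]]
                   ) -> Tuple[List[int], Dict[Tuple[str, ...], int]]:
    """One multiclass target: each distinct label combination gets an id."""
    ids: Dict[Tuple[str, ...], int] = {}
    target: List[int] = []
    for s in label_sets:
        key = tuple(sorted(s))
        if key not in ids:
            ids[key] = len(ids)
        target.append(ids[key])
    return target, ids


# Four toy instances annotated with their most specific classes.
raw = [{"1.1"}, {"1.2", "2.1"}, {"1.1"}, {"2"}]
flat = [with_ancestors(s) for s in raw]

br_targets = binary_relevance(flat, sorted(PARENT))
lp_target, lp_classes = label_powerset(flat)
```

A flat selector such as ReliefF or Information Gain could then score features against each Binary Relevance target separately, or against the single Label Powerset target. How the per-class scores would be aggregated is not stated in the abstract and is left open here.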