Doctoral thesis
Um sistema imunológico artificial para classificação hierárquica e multi-label de funções de proteínas [An artificial immune system for hierarchical and multi-label classification of protein functions]
Date
2010-02-26
Registered in:
ALVES, Roberto Teixeira. Um sistema imunológico artificial para classificação hierárquica e multi-label de funções de proteínas. 2010. 219 f. Tese (Doutorado em Engenharia Elétrica e Informática Industrial) – Universidade Tecnológica Federal do Paraná, Curitiba, 2010.
Author
Alves, Roberto Teixeira
Abstract
This thesis proposes a new approach to hierarchical multi-label classification based on an Artificial Immune System (AIS), in which the classifiers produced by the system are represented as IF-THEN classification rules. Hierarchical multi-label classification is a challenging problem because each example is associated with one or more classes organized into a hierarchy, and the class hierarchy must be taken into account when constructing the classifiers. The proposed method supports the construction of both local hierarchical classifiers (where each classifier processes only examples of classes in a local region of the hierarchy) and global hierarchical classifiers (where a single classifier processes examples of all classes at the same time). The application domain used to validate the proposed method was the prediction of the biological function of proteins, with Gene Ontology terms as the classes to be predicted by the AIS. The performance of the algorithm was evaluated in computational experiments on 10 protein datasets. The evaluation criteria were predictive accuracy (accuracy rate and area under the precision-recall curve) and the simplicity of the discovered knowledge (measured by the number of rules and the total number of conditions in the discovered rules). The experiments identified parameter settings and procedures that significantly influence the performance of the proposed method. In comparisons with other methods from the literature, the proposed method outperformed them on some datasets but not on others.
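To make the abstract's terminology concrete, the following is a minimal sketch of what an IF-THEN rule set for hierarchical multi-label prediction could look like. It is not the thesis's algorithm: the immune-inspired rule discovery is not shown, and the toy hierarchy, the attribute names (`motif_x`, `motif_y`, `length`), and the helper names are all hypothetical. It also assumes the common convention that predicting a class implies predicting all of its ancestors, which is one way of taking the class hierarchy into account.

```python
from dataclasses import dataclass

# Hypothetical toy hierarchy: child term -> set of parent terms.
# The Gene Ontology is a DAG, so a term may have more than one parent.
PARENTS = {
    "GO:B": {"GO:A"},
    "GO:C": {"GO:A"},
    "GO:D": {"GO:B", "GO:C"},
}


@dataclass(frozen=True)
class Rule:
    """IF all conditions hold THEN predict every class in `classes`."""
    conditions: tuple      # (attribute, value) pairs
    classes: frozenset     # multi-label consequent

    def covers(self, example):
        return all(example.get(attr) == val for attr, val in self.conditions)


def with_ancestors(classes):
    """Close a set of classes under the ancestor relation (assumed convention)."""
    closed = set(classes)
    stack = list(classes)
    while stack:
        term = stack.pop()
        for parent in PARENTS.get(term, ()):
            if parent not in closed:
                closed.add(parent)
                stack.append(parent)
    return closed


def predict(rules, example):
    """Union the consequents of all covering rules, then add ancestors."""
    predicted = set()
    for rule in rules:
        if rule.covers(example):
            predicted |= rule.classes
    return with_ancestors(predicted)


def count_conditions(rules):
    """Simplicity measure: total number of conditions across all rules."""
    return sum(len(rule.conditions) for rule in rules)


RULES = [
    Rule((("motif_x", True),), frozenset({"GO:D"})),
    Rule((("motif_y", True), ("length", "long")), frozenset({"GO:C"})),
]
```

In this sketch a single rule set covers all classes at once, which loosely corresponds to the global setting described above; a local variant would keep a separate rule set per region of the hierarchy. `len(RULES)` and `count_conditions(RULES)` mirror the two simplicity measures named in the abstract.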