Thesis
Hierarchical multi-label classification for tree and DAG hierarchies
Author
MALLINALI RAMIREZ CORONA
Institution
Abstract
The core of supervised classification consists of assigning to an object or
phenomenon one of a previously specified set of categories or classes. In
more complex problems, a set of labels, rather than a single label, is
assigned to each instance; this is called multi-label classification. When
the labels in a multi-label classification problem are arranged in a predefined
structure, typically a tree or a Directed Acyclic Graph (DAG), the task is called
Hierarchical Multi-label Classification (HMC).
Some HMC methods create a global model that takes advantage
of the relations among the labels (the predefined structure). However, these
methods tend to create models that are too complex to be usable for large-scale
data. Other methods divide the problem into a set of subproblems, an approach
that usually does not benefit from the predefined structure.
This thesis addresses the problem of hierarchical classification for tree and
DAG structures, considering large datasets with a considerable number of
labels. A local classifier is trained for each non-leaf (parent) node
in the hierarchy. Our method exploits the correlation of each label with its
ancestors in the hierarchy and evaluates each possible path from the root to a
leaf node, taking into account the level of the predicted labels to assign a
score to each path, and finally returns the path with the best score.
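The path-scoring idea described above can be sketched roughly as follows. This is an illustration only, not the implementation from the thesis: the toy hierarchy, the probabilities, the helper names, and in particular the linear level weighting and the weighted average of log-probabilities are all assumptions, since the abstract does not specify the scoring formula.

```python
import math

# Hypothetical toy tree: each parent node maps to its children.
CHILDREN = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}

# Hypothetical outputs of the local classifiers for one instance:
# for each parent node, a probability for each of its children.
LOCAL_PROBS = {
    "root": {"A": 0.6, "B": 0.4},
    "A": {"A1": 0.3, "A2": 0.7},
    "B": {"B1": 1.0},
}


def root_to_leaf_paths(children, node="root", prefix=()):
    """Yield every root-to-leaf path as a tuple of labels (root excluded)."""
    kids = children.get(node, [])
    if not kids:
        yield prefix
        return
    for kid in kids:
        yield from root_to_leaf_paths(children, kid, prefix + (kid,))


def level_weight(level, max_depth):
    """Assumed weighting: shallower levels weigh more (linear decay)."""
    return 1.0 - (level - 1) / (max_depth + 1)


def score_path(path, probs, max_depth):
    """Weighted average of log-probabilities along one path (assumed score)."""
    parent, total, weight_sum = "root", 0.0, 0.0
    for level, label in enumerate(path, start=1):
        w = level_weight(level, max_depth)
        # Clamp to avoid log(0) when a local classifier outputs zero.
        total += w * math.log(max(probs[parent][label], 1e-12))
        weight_sum += w
        parent = label
    return total / weight_sum


def best_path(children, probs):
    """Score every root-to-leaf path and return the highest-scoring one."""
    paths = list(root_to_leaf_paths(children))
    max_depth = max(len(p) for p in paths)
    return max(paths, key=lambda p: score_path(p, probs, max_depth))


print(best_path(CHILDREN, LOCAL_PROBS))  # ('A', 'A2')
```

The sketch covers only the tree case. In a DAG a node may have several parents, so the same label can appear on more than one candidate path.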
In some cases there are instances whose labels do not reach a leaf node. For
these cases we developed an extension of the base method for Non-Mandatory
Leaf Node Prediction (NMLNP), in which a pruning phase is performed
before selecting the best path.
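One simple way to picture a pruning phase is to cut each candidate path at the first label whose predicted probability falls below a threshold. The sketch below assumes exactly that rule. The function name, the threshold value, and the probabilities are hypothetical, and the abstract does not state the pruning criterion actually used in the thesis.

```python
def prune_path(path, label_probs, threshold=0.5):
    """Cut a path at the first label whose probability is below the threshold.

    Assumed rule for illustration; returns the kept prefix as a tuple.
    """
    kept = []
    for label in path:
        if label_probs[label] < threshold:
            break
        kept.append(label)
    return tuple(kept)


# Hypothetical per-label probabilities for one instance.
label_probs = {"A": 0.9, "A2": 0.7, "A2x": 0.2}

print(prune_path(("A", "A2", "A2x"), label_probs))  # ('A', 'A2')
```

After pruning, the remaining candidate paths may end at internal nodes, and the best one is then selected among them.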
We noticed that many evaluation measures score short paths that
predict only the most general cases better than longer, more specific paths.
We therefore also propose a new evaluation measure that avoids the bias
toward conservative predictions in the case of NMLNP.
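The bias can be illustrated with hierarchical precision and recall, two commonly used set-based measures computed over the labels on the predicted and true paths. This example shows the bias only and is not the new measure proposed in the thesis, which the abstract does not define. The labels and paths are hypothetical.

```python
def hierarchical_precision(predicted, true):
    """Fraction of predicted labels that are on the true path."""
    predicted, true = set(predicted), set(true)
    return len(predicted & true) / len(predicted)


def hierarchical_recall(predicted, true):
    """Fraction of true labels that were predicted."""
    predicted, true = set(predicted), set(true)
    return len(predicted & true) / len(true)


# Hypothetical true path and two candidate predictions.
true_path = ("A", "A2", "A2x")
short_pred = ("A",)               # conservative: stops at the most general label
long_pred = ("A", "A2", "A2y")    # more specific, wrong only at the last level

print(hierarchical_precision(short_pred, true_path))  # 1.0
print(hierarchical_precision(long_pred, true_path))
print(hierarchical_recall(short_pred, true_path))
print(hierarchical_recall(long_pred, true_path))
```

Under precision alone, the one-label prediction gets a perfect score while the longer prediction, which is correct for two of its three levels, scores lower. Recall moves in the opposite direction.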
We tested our methods on 18 datasets with tree- and DAG-structured
hierarchies against a number of state-of-the-art methods. The evaluation
shows the advantages of our methods in terms of predictive performance,
execution time, and scalability compared with other methods for hierarchical
classification. Our methods obtained superior results on deep hierarchies
and competitive results on shallower hierarchies.