Thesis
ML-MDLText: um método de classificação de textos multirrótulo de aprendizado incremental
Date
2020-03-27
Author
Bittencourt, Marciele de Menezes
Institution
Abstract
Single-label text classification has been studied extensively in recent decades, with most attention given to offline learning scenarios, where all of the training data is available in advance. However, real-world text classification problems often involve multilabel instances and textual patterns that change frequently. In this context, methods should ideally predict a subset of target labels rather than a single one, and update their model incrementally, using limited time and memory, so that they remain scalable and adaptable to changes in data patterns. Online and multilabel learning have therefore attracted great research interest, since few methods can address both problems simultaneously. In this study, we present a text classification method based on the minimum description length principle. It can be applied to multilabel classification without transforming the classification problem, it takes advantage of dependency information among labels, and it naturally supports online learning. We evaluated its performance on fifteen datasets from different application domains and compared it with traditional benchmark classifiers in both offline and online learning scenarios. The results obtained by the proposed method were highly competitive with those of existing state-of-the-art methods.