bachelorThesis
Applying optimized hierarchical NCM classification to public purchases of products in Brazil
Recorded in:
ALVES SOBRINHO, Pitágoras de Azevedo. Applying optimized hierarchical NCM classification to public purchases of products in Brazil. 2022. 19f. Trabalho de Conclusão de Curso (Residência em Tecnologia da Informação). Instituto Metrópole Digital, Universidade Federal do Rio Grande do Norte, Natal, 2022.
Author
Alves Sobrinho, Pitágoras de Azevedo
Abstract
Using free text to categorize entities usually makes those entities difficult to identify. In the Electronic Fiscal Receipt (“Nota Fiscal Eletrônica”, NF-e), issued for all public purchases in Brazil, products are categorized according to the Mercosul Common Nomenclature (NCM). This identifier is needed to calculate taxes, but it is often filled in incorrectly, which makes it difficult to detect price irregularities and to monitor public expenditures. In this context, an automatic product categorization system was developed based on the textual descriptions present in the NF-e. It consists of a categorization tree that follows the NCM product hierarchy, using the Local Classifier per Parent Node pattern. Each node in the tree is trained to encode the textual descriptions as document embeddings and then to apply a supervised classification algorithm to decide the NCM code. Tree nodes are optimized by selecting classification algorithms and their parameters, testing the performance of various random configurations. In the results, the hierarchical classification achieved a higher F1 score than the flat classification experiments, and the error propagation problem was mitigated.
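The Local Classifier per Parent Node idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the thesis implementation: the names `LCPNTree`, `CentroidNode`, `embed`, and `LEVELS` are hypothetical, a bag-of-words count vector stands in for the document embeddings, a nearest-centroid rule stands in for the supervised classifiers, the tree is assumed to split on 2-, 4-, 6-, and 8-digit code prefixes, and the random search over algorithms and parameters is left out. The training codes are placeholders, not verified NCM assignments.

```python
from collections import Counter, defaultdict
import math

# Assumed prefix lengths for the levels of the tree (illustrative choice).
LEVELS = (2, 4, 6, 8)


def embed(text):
    """Stand-in for a document embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CentroidNode:
    """Local classifier for one parent node: chooses among its child prefixes."""

    def __init__(self):
        self.centroids = defaultdict(Counter)  # child prefix -> summed vectors

    def fit_one(self, vector, child):
        self.centroids[child].update(vector)

    def predict(self, vector):
        # Sorting the children makes tie-breaking deterministic.
        return max(sorted(self.centroids),
                   key=lambda c: cosine(vector, self.centroids[c]))


class LCPNTree:
    """One local classifier per parent node, keyed by the parent's code prefix."""

    def __init__(self):
        self.nodes = defaultdict(CentroidNode)

    def fit(self, descriptions, codes):
        for text, code in zip(descriptions, codes):
            vector = embed(text)
            parent = ""  # the root has the empty prefix
            for length in LEVELS:
                child = code[:length]
                self.nodes[parent].fit_one(vector, child)
                parent = child
        return self

    def predict(self, text):
        vector = embed(text)
        prefix = ""
        # Walk down the tree, letting each node pick the next, longer prefix.
        for _ in LEVELS:
            prefix = self.nodes[prefix].predict(vector)
        return prefix


# Toy training data; descriptions and codes are made up for illustration.
DESCRIPTIONS = [
    "blue ballpoint pen",
    "black ballpoint pen box",
    "felt tip marker pen",
    "white office paper a4 ream",
    "paracetamol tablets 500mg",
]
CODES = ["96081000", "96081000", "96082000", "48025610", "30049099"]

tree = LCPNTree().fit(DESCRIPTIONS, CODES)
print(tree.predict("red ballpoint pen"))
```

Because each node only chooses among the children of its own parent, a mistake near the root cannot be undone further down, which is the error propagation problem the abstract refers to.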