bachelorThesis
An analysis of the influence of the threshold control parameter on the FlexCon-C semi-supervised learning method
Date
2018-12-12
Citation:
GORGÔNIO, Arthur Costa. UMA ANÁLISE DA INFLUÊNCIA DO PARÂMETRO DE CONTROLE DO LIMIAR NO MÉTODO DE APRENDIZADO SEMISSUPERVISIONADO FLEXCON-C. 2018. 110 p. Trabalho de Conclusão de Curso (Bacharelado em Sistemas de Informação)- Universidade Federal do Rio Grande do Norte, Caicó/RN, 2018.
Author
Gorgônio, Arthur Costa
Abstract
Learning algorithms are effective and efficient tools for processing large volumes of data. However, real-world application databases are not fully labeled, which makes it difficult to build a model with traditional machine learning approaches. Semi-supervised machine learning addresses this by training algorithms that can learn from partially labeled databases. The confidence of the classification process depends on several factors, including the type of classifier and the set of parameters that configure it, as well as the layout and characteristics of the dataset. An important factor in this type of learning is the selection of examples to be included in the labeled data set. One way to make this selection is to use a threshold that determines which instances are included at each iteration, so that only instances with a high confidence value are labeled. The FlexCon-C method, derived from the Self-Training algorithm, uses this strategy, and the object of study of this work was its three variations: FlexCon-C1 (s), FlexCon-C1 (v), and FlexCon-C2. This research analyzed different values for the threshold variation parameter (cr) and measured their impact on semi-supervised classification. The results showed that no single value of cr is superior to the others in all cases; the best value depends on the configuration of the experiment, such as the technique, the classifier, and the percentage of initially labeled data. Analyzing accuracy by classifier, Naïve Bayes and rpartXse showed no significant differences in accuracy when cr was changed. However, RIPPER obtained its best results with cr > 5%, while the k-NN classifier achieved better accuracy with cr < 5%.
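
The threshold-based selection described in the abstract can be illustrated with a minimal Python sketch of self-training. It is not the FlexCon-C implementation: the toy 1-D nearest-centroid classifier, the function names, and the rule that lowers the threshold by cr whenever no instance qualifies are all assumptions made for illustration. The FlexCon-C variants define their own adjustment rules, which the abstract does not state.

```python
import math

def centroid_fit(X, y):
    """Fit a toy 1-D nearest-centroid classifier: one mean per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def centroid_predict(model, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {label: math.exp(-abs(x - c)) for label, c in model.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())

def self_training(X_lab, y_lab, X_unl, threshold=0.95, cr=0.05, max_iter=20):
    """Self-training with a confidence threshold.

    ASSUMPTION (illustrative only, not the FlexCon-C rule): when no
    unlabeled instance reaches the threshold, the threshold is lowered by cr.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_iter):
        if not pool:
            break
        model = centroid_fit(X_lab, y_lab)
        preds = [(x,) + centroid_predict(model, x) for x in pool]
        selected = [(x, lab) for x, lab, conf in preds if conf >= threshold]
        if selected:
            # Move confidently classified instances into the labeled set.
            for x, lab in selected:
                X_lab.append(x)
                y_lab.append(lab)
            pool = [x for x, _, conf in preds if conf < threshold]
        else:
            # Hypothetical adjustment step controlled by cr.
            threshold = max(0.0, threshold - cr)
    return centroid_fit(X_lab, y_lab), X_lab, y_lab, pool
```

For example, `self_training([0.0, 10.0], ["a", "b"], [0.5, 1.0, 9.0, 9.5, 5.2])` labels the four clear-cut points in the first iteration and then relaxes the threshold step by step until the ambiguous point 5.2 can be labeled. With `cr=0.0` the threshold never moves and that point stays unlabeled, which shows in miniature why the choice of cr can change what the classifier ends up learning from.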