Conference proceedings
Improving Active Learning With Sharp Data Reduction
Registered in:
9788086943794
20th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, WSCG 2012 - Conference Proceedings, Part 1, pp. 27-34, 2012.
2-s2.0-84877941420
Authors
Saito P.T.M.
De Rezende P.J.
Falcao A.X.
Suzuki C.T.N.
Gomes J.F.
Institution
Abstract
Statistical analysis and pattern recognition have become daunting in the face of the enormous amount of information in the datasets that are continually being made available. Because complete manual annotation is infeasible, one seeks active learning methods for data organization, selection and prioritization that help the user label the samples. These methods, however, classify and reorganize the entire dataset at each iteration, and as datasets grow they become inefficient from the user's point of view. In this work, we propose an active learning paradigm that considerably reduces the non-annotated dataset to a small set of samples relevant for learning. During active learning, random samples are selected from this small learning set and the user annotates only the misclassified ones. The training set grows with newly labelled samples at each iteration and improves the classifier for the next one. When the user is satisfied, the classifier can be used to annotate the rest of the dataset. To illustrate the effectiveness of this paradigm, we developed an instance based on the optimum-path forest (OPF) classifier, relying on clustering and classification for the learning process. With this method, we were able to iteratively generate classifiers that improve quickly, require few iterations, and attain high accuracy while keeping user involvement to a minimum. We also show that the method provides better accuracy on unseen test sets, with less user involvement, than a baseline approach based on the OPF classifier and random selection of training samples from the entire dataset.
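The loop described in the abstract (reduce the dataset, draw random samples from the reduced set, let the user fix only the misclassified ones, retrain) can be illustrated with a minimal sketch. This is not the authors' implementation: leader clustering stands in for the clustering-based reduction, a 1-nearest-neighbour rule stands in for the OPF classifier, and the user is simulated by a ground-truth lookup. All function names, parameters (`radius`, `batch_size`) and the toy data are assumptions made for illustration.

```python
import math
import random


def leader_reduce(points, radius):
    """Stand-in reduction (leader clustering, NOT the paper's method):
    a point becomes a leader if it is farther than `radius` from every
    existing leader. Returns the indices of the leaders."""
    leaders = []
    for i, p in enumerate(points):
        if all(math.dist(p, points[j]) > radius for j in leaders):
            leaders.append(i)
    return leaders


def predict_1nn(train, x):
    """Stand-in classifier (1-NN, NOT OPF): label of the nearest labelled
    sample, or None when nothing has been labelled yet."""
    if not train:
        return None
    return min(train, key=lambda item: math.dist(item[0], x))[1]


def active_learning(points, oracle, radius, batch_size, iterations, seed=0):
    """Run the loop on the reduced learning set only.
    `oracle(i)` simulates the user and returns the true label of sample i.
    Returns the labelled training set and the number of user corrections."""
    rng = random.Random(seed)
    pool = leader_reduce(points, radius)  # small learning set
    rng.shuffle(pool)                     # random selection order
    train, corrections = [], 0
    for _ in range(iterations):
        batch, pool = pool[:batch_size], pool[batch_size:]
        if not batch:
            break
        # Classify the batch with the classifier from the previous iteration.
        guesses = [predict_1nn(train, points[i]) for i in batch]
        for i, guess in zip(batch, guesses):
            truth = oracle(i)
            if guess != truth:
                corrections += 1  # the user only acts on wrong labels
            train.append((points[i], truth))
    return train, corrections


# Hypothetical toy data: two well-separated 2-D blobs.
rng = random.Random(1)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]
points += [(10 + rng.uniform(-1, 1), 10 + rng.uniform(-1, 1)) for _ in range(100)]
labels = [0] * 100 + [1] * 100

train, corrections = active_learning(
    points, lambda i: labels[i], radius=0.5, batch_size=5, iterations=1000
)
accuracy = sum(predict_1nn(train, p) == y for p, y in zip(points, labels)) / len(points)
print(len(train), corrections, accuracy)
```

In this sketch the user effort is the `corrections` count rather than the size of the training set, which mirrors the abstract's distinction between samples shown to the user and samples the user actually has to annotate.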