Improving Active Learning With Sharp Data Reduction

Saito P.T.M.; De Rezende P.J.; Falcao A.X.; Suzuki C.T.N.; Gomes J.F.

Conference proceedings

Registered in:

9788086943794

20th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, WSCG 2012 - Conference Proceedings, Part 1, pp. 27-34, 2012.

http://www.scopus.com/inward/record.url?eid=2-s2.0-84877941420&partnerID=40&md5=751cee646eecf3ec1f3ba3cc6fdfab22

http://www.repositorio.unicamp.br/handle/REPOSIP/90749

http://repositorio.unicamp.br/jspui/handle/REPOSIP/90749

2-s2.0-84877941420

http://repositorioslatinoamericanos.uchile.cl/handle/2250/1260629

Author

Saito P.T.M.

De Rezende P.J.

Falcao A.X.

Suzuki C.T.N.

Gomes J.F.

Institution

Universidade Estadual de Campinas (Brazil)

Abstract

Statistical analysis and pattern recognition have become daunting endeavours in the face of the enormous amount of information in the datasets that are continually made available. Since complete manual annotation is infeasible, one seeks active learning methods for data organization, selection and prioritization that can help the user label the samples. These methods, however, classify and reorganize the entire dataset at each iteration, and as the datasets grow, they become blatantly inefficient from the user's point of view. In this work, we propose an active learning paradigm that considerably reduces the non-annotated dataset to a small set of samples relevant for learning. During active learning, random samples are selected from this small learning set and the user annotates only the misclassified ones. The training set grows with newly labelled samples at each iteration, improving the classifier for the next one. When the user is satisfied, the classifier can be used to annotate the rest of the dataset. To illustrate the effectiveness of this paradigm, we developed an instance based on the optimum-path forest (OPF) classifier, relying on clustering and classification for the learning process. With this method, we were able to iteratively generate classifiers that improve quickly, require few iterations, and attain high accuracy while keeping user involvement to a minimum. We also show that the method achieves higher accuracy on unseen test sets, with less user involvement, than a baseline approach based on the OPF classifier and random selection of training samples from the entire dataset.
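The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the OPF clustering and OPF classifier are replaced here by stand-ins (a greedy farthest-first traversal for the data reduction and a 1-nearest-neighbour classifier), the data are synthetic, and all function names and parameters (`farthest_first`, `active_learning`, `m`, `batch`, `rounds`) are hypothetical. The sketch also assumes that samples the classifier labels correctly join the training set alongside the corrected ones.

```python
import math
import random


def make_blobs(seed=0, per_class=100):
    """Synthetic 2-D data: three well-separated Gaussian blobs (illustrative only)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (30.0, 0.0), (0.0, 30.0)]
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(per_class):
            points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            labels.append(label)
    return points, labels


def farthest_first(points, m):
    """Data reduction stand-in: greedily pick m indices, each farthest from those chosen."""
    chosen = [0]
    dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen


def predict(train, x):
    """1-NN stand-in for the classifier; returns None while nothing is labelled."""
    if not train:
        return None
    return min(train, key=lambda s: math.dist(s[0], x))[1]


def active_learning(points, oracle, m=30, batch=10, rounds=3, seed=0):
    """Draw random batches from the reduced set; the simulated user fixes wrong labels."""
    rng = random.Random(seed)
    learning_set = farthest_first(points, m)
    rng.shuffle(learning_set)  # random selection order within the learning set
    train, corrections = [], 0
    for r in range(rounds):
        labelled = []
        for i in learning_set[r * batch:(r + 1) * batch]:
            guess = predict(train, points[i])
            truth = oracle(i)  # simulated user; a real user only acts on wrong guesses
            if guess != truth:
                corrections += 1  # user effort: one corrected label
            labelled.append((points[i], truth))
        train.extend(labelled)  # classifier for the next round uses the larger set
    return train, corrections, learning_set


points, labels = make_blobs()
train, corrections, learning_set = active_learning(points, labels.__getitem__)
# Once the user is satisfied, the classifier annotates the rest of the dataset.
accuracy = sum(predict(train, p) == y for p, y in zip(points, labels)) / len(points)
print(len(learning_set), corrections, accuracy)
```

In this toy setting the classifier only ever sees the 30 samples of the reduced learning set rather than all 300, and the count in `corrections` plays the role of user involvement. The very first batch is always fully corrected because no classifier exists yet.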


