Master's Dissertation
Representações de características visuais de baixo custo para recuperação de imagens (Low-cost visual feature representations for image retrieval)
Date
2015-12-18
Author
Ramon Figueiredo Pessoa
Institution
Abstract
Mobile Visual Search (MVS) is a new research area in Content-Based Image Retrieval (CBIR) that provides search and retrieval of visual information specifically for mobile devices. The main challenges in mobile visual search include variations in image capture conditions, such as different illumination and changes of scale and view angle, as well as limited battery life and the high network cost incurred by data transmission. The main purpose of this work is to compare efficient and effective techniques for feature extraction on mobile devices in order to retrieve images, especially on smartphones. We achieve this goal by comparing and proposing techniques for feature vector compression and mid-level representation (bag of words). Some approaches reduce energy consumption on mobile devices because they send more compact feature vectors to be processed on the server side. A series of experiments was also conducted to evaluate the effectiveness, efficiency, and compactness of the features extracted from images for content-based image retrieval on mobile devices. In this case, the user decides the best three-way trade-off among effectiveness, efficiency, and compactness of visual features. We therefore addressed two research issues in order to investigate and propose effective solutions for image retrieval on mobile devices: 1) low-cost representation for mobile image search and 2) spatial visual feature extraction. First, we analyze the use of binary descriptors with mid-level representation and of global descriptors (color, texture, and shape) in the context of image retrieval on mobile devices, as well as image feature compression techniques.
We tested twenty mid-level representations of binary descriptors (five binary descriptors × four bag-of-words strategies: the BinBoost, BRIEF, BRISK, FREAK, and ORB descriptors, each combined with a bag of words using hard assignment with average pooling, hard assignment with maximum pooling, soft assignment with average pooling, or soft assignment with maximum pooling), ten color descriptors, five texture descriptors, and two shape descriptors. We also analyze the impact of dense sampling versus sparse sampling when computing descriptors for the bag-of-words strategies (dense sampling is the best option). The second research issue concerns extracting spatial information from images to improve the quality of image representation on mobile devices, which can be crucial for distinguishing types of objects and scenes. Traditional pooling methods usually discard the spatial configuration of visual words in the image. We propose two spatial bag-of-visual-words approaches, called BOBGrid (spatial Bag Of BICGrid) and BOBSlic (spatial Bag Of Slic), and compare them with our baseline, WSA (visual Word Spatial Arrangement), and with an improvement of the traditional bag of visual words called BOSSANova (Bag Of Statistical Sampling Analysis). The experiments indicate that the descriptors BIC (Border/Interior Pixel Classification, a color descriptor) and DEOBSM (bag of words using DEnse sampling, ORB descriptor, Soft assignment, and Maximum pooling) are the best options considering the trade-off among effectiveness, efficiency, and compactness of visual features. In the statistical analyses, BOBGrid and BOBSlic are better than our baseline WSA on the WANG dataset. BOBGrid and BOBSlic also show higher precision than BOSSANova on the WANG dataset.
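
To make the four bag-of-words strategies named in the abstract concrete, the following is a minimal Python sketch, not the dissertation's implementation. It assumes binary descriptors stored as integers and compared by Hamming distance, a tiny hand-written codebook, and a Gaussian kernel with a hypothetical bandwidth `sigma` for soft assignment. The function names `hamming`, `assign`, and `bag_of_words` are illustrative only.

```python
import math


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def assign(descriptor, codebook, soft, sigma=2.0):
    """Code one descriptor against the codebook (one weight per visual word).

    Hard assignment: 1.0 for the nearest visual word, 0.0 elsewhere.
    Soft assignment (assumed form): Gaussian weights on the Hamming
    distance, normalized so they sum to 1.
    """
    dists = [hamming(descriptor, word) for word in codebook]
    nearest = min(dists)
    if not soft:
        winner = dists.index(nearest)
        return [1.0 if i == winner else 0.0 for i in range(len(codebook))]
    # Shifting by the smallest squared distance leaves the normalized
    # weights unchanged and keeps the sum from underflowing to zero.
    weights = [
        math.exp(-(d * d - nearest * nearest) / (2.0 * sigma * sigma))
        for d in dists
    ]
    total = sum(weights)
    return [w / total for w in weights]


def bag_of_words(descriptors, codebook, soft, pooling):
    """Pool the per-descriptor codes into a single image-level vector."""
    codes = [assign(d, codebook, soft) for d in descriptors]
    columns = list(zip(*codes))  # one column of weights per visual word
    if pooling == "max":
        return [max(column) for column in columns]
    return [sum(column) / len(column) for column in columns]


# Toy data (hypothetical): three 8-bit visual words, three descriptors.
codebook = [0b00000000, 0b11111111, 0b00001111]
descriptors = [0b00000001, 0b11111110, 0b00000000]

for soft in (False, True):
    for pooling in ("average", "max"):
        vector = bag_of_words(descriptors, codebook, soft, pooling)
        label = ("soft" if soft else "hard") + " + " + pooling
        print(label, [round(v, 3) for v in vector])
```

Pairing each of the five binary descriptors with these four assignment and pooling combinations yields the twenty mid-level representations counted above. In practice the codebook would be learned from training descriptors rather than written by hand.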