Ordenamiento de imágenes recuperadas utilizando un enfoque de fusión de información multimodal

RICARDO CHAVEZ GARCIA

info:eu-repo/semantics/masterThesis

Record at:

http://inaoe.repositorioinstitucional.mx/jspui/handle/1009/496

https://repositorioslatinoamericanos.uchile.cl/handle/2250/7805714

Author

RICARDO CHAVEZ GARCIA

Institution

Instituto Nacional de Astrofísica, Óptica y Electrónica (México)

Abstract

Image retrieval has been an active research area in recent years. Several retrieval methods have been proposed based on visual, textual, or multimodal image descriptions and, although they obtain acceptable results, they still return the retrieved images in an inappropriate order. This ordering problem arises mainly because modeling the user's search intention, and accounting for the contextual information related to that search, is a subjective task. In this research work we propose a re-ranking method that improves the original order of a list of images retrieved by a base Image Retrieval System (IRS). The work is motivated by the hypothesis that including and combining all the information available in the list of retrieved images makes it possible to better identify both the relevant images and the user's search intention. The proposed method combines internal contextual information, obtained from the differences among the retrieved images as measured on their visual and textual features, with external information, obtained from the original order and from the similarity between the retrieved images and the query. The method can also include a relevance feedback approach to narrow the domain of the user's search intention. All of these features are combined using a Markov random field (MRF), which separates the relevant images from the irrelevant ones and yields a more appropriate order by placing the relevant images in the top positions. To evaluate the proposed method, several experiments were designed with the semi-structured collection IAPR-TC12, using a binary bag-of-words textual representation and a visual representation based on local SIFT (Scale Invariant Feature Transform) features. The experimental results showed that the proposed method improves on the base retrieval system both in the unimodal setting (using only visual or only textual features) and in the multimodal fusion setting, including in experiments with automatic or simulated feedback and under different parameter settings, which indicates that the method is robust.
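
The re-ranking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the energy function, the optimizer, or the parameter values used in the thesis, so everything below is an assumption made for illustration: the names `local_energy` and `rerank`, the weight `lam`, the way rank position and query similarity are mixed into `ext_score`, the choice to keep feedback images fixed as relevant, and the use of iterated conditional modes (ICM) as the optimizer. Each image is a binary variable (1 = relevant, 0 = irrelevant); an internal term penalizes disagreeing with similar images in the list, and an external term rewards images that ranked high originally and resemble the query.

```python
from typing import List, Sequence, Set


def local_energy(i: int, label: int, labels: Sequence[int],
                 pair_sim: Sequence[Sequence[float]],
                 ext_score: Sequence[float], lam: float) -> float:
    """Energy of giving image i the given label (hypothetical form)."""
    others = [j for j in range(len(labels)) if j != i]
    # Internal term: cost of disagreeing with similar images in the list.
    internal = sum(pair_sim[i][j] for j in others if labels[j] != label)
    internal /= max(len(others), 1)
    # External term: calling an image relevant is cheap when its external score is high.
    external = (1.0 - ext_score[i]) if label == 1 else ext_score[i]
    return lam * internal + (1.0 - lam) * external


def rerank(pair_sim: Sequence[Sequence[float]], query_sim: Sequence[float],
           seed_relevant: Set[int], lam: float = 0.5,
           max_iters: int = 50) -> List[int]:
    """Re-rank a retrieved list; index r is the image at original rank r."""
    n = len(query_sim)
    # Assumed external score: equal mix of original rank position and query similarity.
    ext_score = [0.5 * (1.0 - r / n) + 0.5 * query_sim[r] for r in range(n)]
    # Feedback (real or simulated) provides the initial relevant set.
    labels = [1 if r in seed_relevant else 0 for r in range(n)]
    for _ in range(max_iters):
        changed = False
        for i in range(n):
            if i in seed_relevant:
                continue  # assumption: feedback labels are trusted and kept fixed
            best = min((0, 1), key=lambda lab: local_energy(
                i, lab, labels, pair_sim, ext_score, lam))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break  # ICM reached a local minimum of the energy
    # Relevant images first; the original order is kept inside each group.
    relevant = [r for r in range(n) if labels[r] == 1]
    irrelevant = [r for r in range(n) if labels[r] == 0]
    return relevant + irrelevant


# Toy list of five images: ranks 0, 2, 4 share one topic, ranks 1, 3 another.
cluster = [0, 1, 0, 1, 0]
pair_sim = [[0.9 if cluster[i] == cluster[j] else 0.1 for j in range(5)]
            for i in range(5)]
query_sim = [0.9, 0.2, 0.8, 0.3, 0.7]
print(rerank(pair_sim, query_sim, seed_relevant={0}))  # [0, 2, 4, 1, 3]
```

In this toy list the image at rank 0 is marked relevant by feedback, and the images at ranks 2 and 4, which are similar to it, are moved ahead of those at ranks 1 and 3. Setting `lam` closer to 0 makes the result depend more on the original order and the query similarity, and closer to 1 makes it depend more on similarity among the retrieved images.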


Subjects

info:eu-repo/classification/Recuperación de imágenes/Image retrieval

info:eu-repo/classification/Procesos de Markov/Markov processes

info:eu-repo/classification/Relevancia de la relevancia/Relevance feedback

info:eu-repo/classification/Representación multimodal/Multimodal representation

info:eu-repo/classification/cti/1

info:eu-repo/classification/cti/12

info:eu-repo/classification/cti/1203
