Visual words dictionaries and fusion techniques for searching people through textual and visual attributes

Fabian, J; Pires, R; Rocha, A

Journal articles

Record in:

Pattern Recognition Letters. Elsevier Science BV, v. 39, p. 74-84, 2014.

0167-8655

1872-7344

WOS:000331854700009

10.1016/j.patrec.2013.09.011

http://www.repositorio.unicamp.br/jspui/handle/REPOSIP/76579

http://repositorio.unicamp.br/jspui/handle/REPOSIP/76579

http://repositorioslatinoamericanos.uchile.cl/handle/2250/1290747

Authors

Fabian, J

Pires, R

Rocha, A

Institution

Universidade Estadual de Campinas (Brazil)

Abstract

Using personal traits to search for people is paramount in several application areas and has attracted ever-growing attention from the scientific community in recent years. Practical applications in digital forensics and surveillance include locating a suspect or finding missing people in a public space. In this paper, we aim to assign describable visual attributes (e.g., white chubby male wearing glasses and with bangs) as labels to images to describe their appearance, and to perform visual searches without relying on image annotations during testing. To that end, we create mid-level representations for face images based on visual dictionaries that link visual properties in the images to describable attributes. In addition, we use machine learning techniques to combine different attributes and perform a query. First, we propose three methods for building the visual dictionaries. Method #1 uses a sparse-sampling scheme to obtain low-level features and a clustering algorithm to build the visual dictionaries. Method #2 uses dense sampling to obtain low-level features and random selection to build the visual dictionaries, while Method #3 uses dense sampling to obtain low-level features followed by a clustering algorithm to build the visual dictionaries. We then train two-class classifiers for the describable visual attributes of interest; each classifier assigns every image a decision score that is used to obtain its ranking. For more complex queries (two or more attributes), we use three state-of-the-art approaches for combining the rankings: (1) product of probabilities, (2) rank aggregation, and (3) rank position. To date, we have considered fifteen attribute classifiers and, consequently, their direct counterparts, theoretically allowing 2^15 = 32,768 different combined queries (the actual number is smaller since some attributes are contradictory or mutually exclusive). Nevertheless, the method is easily extensible to include new attributes. Experimental results show that Method #3 greatly improves retrieval precision for some attributes in comparison with other methods in the literature. Finally, for combined attributes, product of probabilities, rank aggregation, and rank position yield complementary results for rank fusion and final decision making, suggesting interesting possible combinations for future work. © 2013 Elsevier B.V. All rights reserved.
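
The abstract names three ways of combining per-attribute rankings but does not give their formulas. The following is a minimal sketch of one common reading of each, not the authors' implementation: product of probabilities multiplies the per-attribute scores, rank aggregation is taken here as a Borda-style sum of rank positions, and rank position is taken as a sum of reciprocal ranks. All function names, the attribute names, and the toy scores are hypothetical.

```python
from math import prod


def ranks_from_scores(scores):
    """Return the 1-based rank of each item (1 = highest score)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, item in enumerate(order, start=1):
        ranks[item] = position
    return ranks


def fuse_product(score_lists):
    """Assumed 'product of probabilities': multiply scores, higher is better."""
    fused = [prod(column) for column in zip(*score_lists)]
    return sorted(range(len(fused)), key=lambda i: -fused[i])


def fuse_rank_sum(score_lists):
    """Assumed 'rank aggregation': Borda-style sum of ranks, lower is better."""
    rank_lists = [ranks_from_scores(s) for s in score_lists]
    fused = [sum(column) for column in zip(*rank_lists)]
    return sorted(range(len(fused)), key=lambda i: fused[i])


def fuse_reciprocal_rank(score_lists):
    """Assumed 'rank position': sum of 1/rank, higher is better."""
    rank_lists = [ranks_from_scores(s) for s in score_lists]
    fused = [sum(1.0 / r for r in column) for column in zip(*rank_lists)]
    return sorted(range(len(fused)), key=lambda i: -fused[i])


# Hypothetical classifier scores for four images and two attributes.
male = [0.9, 0.2, 0.6, 0.4]
glasses = [0.3, 0.8, 0.7, 0.1]

by_product = fuse_product([male, glasses])
by_rank_sum = fuse_rank_sum([male, glasses])
by_reciprocal = fuse_reciprocal_rank([male, glasses])
```

Each function returns image indices ordered from best to worst match for the combined query. On these toy scores the three rules return different orderings, which is the kind of disagreement that makes combining them worth exploring. Ties are broken by image index because Python's sort is stable.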

Microsoft Research

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

FAPESP [2010/05647-4]

CNPq [304352/2012-8]