Combine-and-conquer: improving the diversity in similarity search through influence sampling

Santos, Lúcio Fernandes Dutra; Oliveira, Willian Dener de; Carvalho, Luiz Olmes; Ferreira, Mônica Ribeiro Porto; Traina, Agma Juci Machado; Traina Junior, Caetano

dc.creator	Santos, Lúcio Fernandes Dutra
dc.creator	Oliveira, Willian Dener de
dc.creator	Carvalho, Luiz Olmes
dc.creator	Ferreira, Mônica Ribeiro Porto
dc.creator	Traina, Agma Juci Machado
dc.creator	Traina Junior, Caetano
dc.date.accessioned	2015-06-30T13:56:18Z
dc.date.accessioned	2018-07-04T17:05:50Z
dc.date.available	2015-06-30T13:56:18Z
dc.date.available	2018-07-04T17:05:50Z
dc.date.created	2015-06-30T13:56:18Z
dc.date.issued	2015-04
dc.identifier	Symposium on Applied Computing, 30th, 2015, Salamanca.
dc.identifier	9781450331968
dc.identifier	http://www.producao.usp.br/handle/BDPI/49015
dc.identifier	http://dx.doi.org/10.1145/2695664.2695798
dc.identifier.uri	http://repositorioslatinoamericanos.uchile.cl/handle/2250/1644611
dc.description.abstract	Result diversification methods retrieve elements similar to a given object while also enforcing a certain degree of diversity among them, with the aim of improving answer relevance. Most of these methods are based on optimization, but finding the optimal solution is NP-hard. Diversity is injected into an otherwise all-too-similar result set in two phases: in the first, the search space is reduced to speed up the search for the optimal solution; in the second, a trade-off between diversity and similarity is obtained over the reduced space. The first phase is usually assumed to be carried out by a traditional nearest neighbor algorithm, but no previous investigation has evaluated the impact of the first phase on the second. In this paper, we devise alternative techniques to execute the first phase and evaluate how obtaining a better-quality set of elements in the first phase can improve diversity. Besides the traditional nearest neighbor-based pre-selection, we also consider naive random selection, cluster-based selection, and influence-based selection. Extensive experiments then evaluate a number of state-of-the-art diversity algorithms employed in the second phase, regarding both processing time and answer quality. The results show that although the more elaborate (and much more time-consuming) methods indeed provide the best answers, other alternatives offer a better trade-off between quality and performance. Moreover, the pre-selection techniques can reduce the total running time by up to two orders of magnitude.
dc.language	eng
dc.publisher	Association for Computing Machinery - ACM
dc.publisher	University of Salamanca
dc.publisher	Salamanca
dc.relation	Symposium on Applied Computing, 30th
dc.rights	Copyright ACM
dc.rights	closedAccess
dc.subject	Search result diversification
dc.subject	similarity search
dc.subject	sampling
dc.title	Combine-and-conquer: improving the diversity in similarity search through influence sampling
dc.type	Conference proceedings
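
The two-phase scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`preselect_knn`, `preselect_random`, `diversify_greedy`), the Euclidean distance, the weight `lam`, and the greedy MMR-style rule used for the second phase are all assumptions chosen for illustration, and the cluster-based and influence-based pre-selections named in the abstract are not shown.

```python
import math
import random


def preselect_knn(data, query, m):
    """Phase 1 (assumed variant): keep the m points nearest to the query."""
    return sorted(data, key=lambda p: math.dist(p, query))[:m]


def preselect_random(data, m, seed=0):
    """Phase 1 (assumed variant): keep m points drawn uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(data, m)


def diversify_greedy(candidates, query, k, lam=0.5):
    """Phase 2 (assumed stand-in): greedy MMR-style selection of k points.

    Each step picks the candidate that maximizes a weighted sum of
    similarity to the query (negative distance) and diversity (distance
    to the closest element already selected). lam=0 ignores diversity.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def score(p):
            similarity = -math.dist(p, query)
            diversity = min((math.dist(p, s) for s in selected), default=0.0)
            return (1 - lam) * similarity + lam * diversity

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, calling `preselect_knn(data, query, m)` and passing its result to `diversify_greedy(candidates, query, k)` restricts the costlier second phase to `m` candidates instead of the whole dataset, which is the kind of search-space reduction the abstract credits for the running-time savings.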

This item belongs to the following institution

Universidade de São Paulo (Brazil)