Article
Hypergeometric language model and zipf-like scoring function for web document similarity retrieval
Date
2010
Registered in:
0302-9743
D08I1015
WOS:000288886400032
Institution
Abstract
Retrieving Web documents similar to a given document differs in many respects from information retrieval based on queries written by regular search engine users. This work proposes a new method for Web document similarity retrieval based on generative language models and meta search engines. Probabilistic language models serve as a random query generator for the given document. The generated queries are submitted to a customizable set of Web search engines. Once all results have been gathered, they are evaluated with a proposed scoring function based on Zipf's law. The results show that the proposed query generation methodology and scoring procedure solve the problem with acceptable levels of precision.
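The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact language model or scoring formula, so the sketch assumes that (a) a query is formed by drawing terms from the document without replacement, which makes the drawn term counts hypergeometric, and (b) a result at rank r of a query's result list contributes 1 / r^s to that result's score, a Zipf-like weight. The function names, the exponent s, and the example result lists are all hypothetical, and the search engine calls are left out.

```python
import random
from collections import defaultdict


def generate_query(tokens, k, rng):
    """Draw up to k tokens without replacement and return the distinct terms.

    Assumption: sampling without replacement from the document's token
    multiset stands in for the hypergeometric language model.
    """
    drawn = rng.sample(tokens, min(k, len(tokens)))
    # Remove repeated terms while keeping the order in which they were drawn.
    return list(dict.fromkeys(drawn))


def zipf_score(ranked_lists, s=1.0):
    """Aggregate ranked result lists with an assumed Zipf-like weight 1 / rank**s."""
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / rank ** s
    return dict(scores)


if __name__ == "__main__":
    document = "web document similarity retrieval with language models and web search".split()
    rng = random.Random(0)
    queries = [generate_query(document, 3, rng) for _ in range(2)]
    # Hypothetical result lists, one per query; a real system would obtain
    # these by submitting each query to the configured search engines.
    result_lists = [["u1", "u2"], ["u2", "u1", "u3"]]
    ranking = sorted(zipf_score(result_lists).items(), key=lambda kv: -kv[1])
    print(queries)
    print(ranking)
```

Under these assumptions, a page that appears near the top of several result lists accumulates a higher score than one that appears once or only at low ranks, which is the intuition behind using a Zipf-based weight to merge results from many generated queries.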