Comparison of standard and Zipf-based document retrieval heuristics

Please use this identifier to cite or link to this item: http://dx.doi.org/10.18419/opus-2685

Author(s):	Hoffmann, Benjamin
Title:	Comparison of standard and Zipf-based document retrieval heuristics
Issue date:	2010
Document type:	Working paper
Series/Report no.:	Technischer Bericht / Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik;2010,6
URI:	http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-57811 http://elib.uni-stuttgart.de/handle/11682/2702 http://dx.doi.org/10.18419/opus-2685
Abstract:	Document retrieval is the task of retrieving, from a possibly huge collection of documents, those that are most similar to a given query document. In this paper, we present a new heuristic for inexact top-K retrieval. It is similar to the well-known index elimination heuristic and is based on Zipf's law, a statistical law observable in natural language texts. We compare the two heuristics with regard to retrieval performance and execution time. For this purpose, we use a text collection consisting of scientific articles from various computer science conferences and journals. It turns out that our new approach does not outperform index elimination. Interestingly, a combination of both heuristics yields the best results.
Appears in collections:	05 Fakultät Informatik, Elektrotechnik und Informationstechnik
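For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal sketch of the standard index elimination heuristic as it is usually described in information retrieval textbooks, assuming tf-idf weighting and cosine ranking. It is not the report's Zipf-based heuristic, whose details are not given on this page. Two rules prune the work: only postings of query terms whose idf exceeds a threshold are traversed, and only documents containing at least a minimum number of those terms are scored. The function names, the whitespace tokenizer, and the parameters `idf_threshold` and `min_terms` are hypothetical choices for illustration.

```python
import heapq
import math
from collections import Counter, defaultdict

# Illustrative sketch of textbook index elimination (hypothetical names/parameters);
# this is NOT the Zipf-based heuristic proposed in the report.


def tokenize(text):
    # Naive whitespace tokenizer; a real system would also stem and drop stop words.
    return text.lower().split()


def build_index(docs):
    """Return an inverted index (term -> {doc_id: tf}) and tf-idf document norms."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    n_docs = len(docs)
    squared = defaultdict(float)
    for term, postings in index.items():
        w = math.log(n_docs / len(postings))  # idf of the term
        for doc_id, tf in postings.items():
            squared[doc_id] += (tf * w) ** 2
    norms = [math.sqrt(squared[d]) for d in range(n_docs)]
    return index, norms


def top_k_index_elimination(query, n_docs, index, norms, k, idf_threshold, min_terms):
    """Inexact top-K retrieval using the two index elimination rules."""
    q_tf = Counter(tokenize(query))
    # Rule 1: keep only query terms that occur in the index with idf above the threshold.
    kept = {}
    for term in q_tf:
        if term in index:
            w = math.log(n_docs / len(index[term]))
            if w > idf_threshold:
                kept[term] = w
    # Accumulate tf-idf dot products and count how many kept terms each document contains.
    scores = defaultdict(float)
    matched = Counter()
    for term, w in kept.items():
        for doc_id, tf in index[term].items():
            scores[doc_id] += (q_tf[term] * w) * (tf * w)
            matched[doc_id] += 1
    # Rule 2: only rank documents that contain at least min_terms of the kept terms.
    candidates = [d for d in scores if matched[d] >= min_terms]
    # Cosine ranking; the query norm is identical for every document, so it is omitted.
    return heapq.nlargest(k, candidates, key=lambda d: scores[d] / norms[d] if norms[d] else 0.0)


docs = [
    "zipf law word frequency",
    "index elimination retrieval heuristic",
    "zipf law retrieval",
    "cooking pasta recipe",
]
index, norms = build_index(docs)
print(top_k_index_elimination("zipf law retrieval heuristic", len(docs), index, norms,
                              k=2, idf_threshold=0.5, min_terms=2))  # prints [2, 1]
```

Raising `idf_threshold` or `min_terms` shrinks the set of postings and candidates that must be scored, which is where the speedup comes from; the price is that some truly similar documents may be missed, which is why such top-K retrieval is called inexact.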

Files in this item:

File	Description	Size	Format
TR_2010_06.pdf		185.84 kB	Adobe PDF

All items in this repository are protected by copyright.

Universität Stuttgart