Comparison of standard and Zipf-based document retrieval heuristics

Hoffmann, Benjamin

Please use this identifier to cite or link to this resource: http://dx.doi.org/10.18419/opus-2685

Full metadata record

DC Element	Value	Language
dc.contributor.author	Hoffmann, Benjamin	de
dc.date.accessioned	2010-11-05	de
dc.date.accessioned	2016-03-31T07:58:59Z	-
dc.date.available	2010-11-05	de
dc.date.available	2016-03-31T07:58:59Z	-
dc.date.issued	2010	de
dc.identifier.other	381697169	de
dc.identifier.uri	http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-57811	de
dc.identifier.uri	http://elib.uni-stuttgart.de/handle/11682/2702	-
dc.identifier.uri	http://dx.doi.org/10.18419/opus-2685	-
dc.description.abstract	Document retrieval is the task of retrieving, from a possibly huge collection of documents, those that are most similar to a given query document. In this paper, we present a new heuristic for inexact top-K retrieval. It is similar to the well-known index elimination heuristic and is based on Zipf's law, a statistical law observable in natural-language texts. We compare the two heuristics with regard to retrieval performance and execution time. For this purpose, we use a text collection consisting of scientific articles from various computer science conferences and journals. It turns out that our new approach is not better than index elimination. Interestingly, a combination of both heuristics yields the best results.	en
dc.language.iso	en	de
dc.relation.ispartofseries	Technischer Bericht / Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik;2010,6	de
dc.rights	info:eu-repo/semantics/openAccess	de
dc.subject.classification	Information retrieval, Heuristic, Zipf's law	de
dc.subject.ddc	004	de
dc.title	Comparison of standard and Zipf-based document retrieval heuristics	en
dc.type	workingPaper	de
dc.date.updated	2011-09-05	de
ubs.fakultaet	Fakultät Informatik, Elektrotechnik und Informationstechnik	de
ubs.institut	Institut für Formale Methoden der Informatik	de
ubs.opusid	5781	de
ubs.publikation.typ	Working paper	de
ubs.schriftenreihe.name	Technischer Bericht / Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik	de
Appears in collections:	05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Files in this resource:

File	Description	Size	Format
TR_2010_06.pdf		185.84 kB	Adobe PDF

All resources in this repository are protected by copyright.

Universität Stuttgart

OPUS - Online Publikationen der Universität Stuttgart