Similarity search with set intersection as a distance measure

Hoffmann, Benjamin Sascha

dc.contributor.advisor	Diekert, Volker (Prof. Dr.)	de
dc.contributor.author	Hoffmann, Benjamin Sascha	de
dc.date.accessioned	2010-06-09	de
dc.date.accessioned	2016-03-31T07:58:56Z
dc.date.available	2010-06-09	de
dc.date.available	2016-03-31T07:58:56Z
dc.date.issued	2010	de
dc.date.updated	2015-06-02	de
dc.description.abstract	This thesis addresses a fundamental algorithmic problem: given a database of sets and a query set, find a database set whose intersection with the query set has maximal size. The database may be preprocessed so that queries can be answered efficiently. We solve the approximate version of this problem. We investigate two randomized input models derived from real inputs and present a deterministic algorithm for each. If the database and the query set follow one of these models, the corresponding algorithm determines, with high probability, a database set whose intersection with the query set need not be maximal but achieves a large proportion of the maximal size. Depending on the model, the query time is either quasi-linear in the sum of the database size and the number of distinct elements across all sets, or polylogarithmic in the database size. Both algorithms are thus significantly faster than a naive algorithm that intersects the query set with every database set.	en
dc.description.abstract	This thesis is concerned with an elementary problem in the theory of algorithms. Given a database of sets and a query set, the goal is to determine, as efficiently as possible, a database set whose intersection with the query set has maximal size. Preprocessing the database is allowed. We present solutions for the approximation variant of this problem. We study two input models derived from practice and present a deterministic algorithm for each model. If the database and the query set behave according to one of these models, the corresponding algorithm determines, with high probability, a database set whose intersection with the query set is not of maximal size but reaches a high proportion of the maximal size. Depending on the model, the query time is either quasi-linear in the sum of the database size and the number of distinct elements of all sets, or polylogarithmic in the database size. Both algorithms are therefore considerably faster than a naive algorithm that computes the size of the intersection between the query set and every single database set.	de
dc.identifier.other	324096585	de
dc.identifier.uri	http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-53244	de
dc.identifier.uri	http://elib.uni-stuttgart.de/handle/11682/2686
dc.identifier.uri	http://dx.doi.org/10.18419/opus-2669
dc.language.iso	en	de
dc.rights	info:eu-repo/semantics/openAccess	de
dc.subject.classification	Approximationsalgorithmus, Nächste-Nachbarn-Problem, Zipfsches Gesetz	de
dc.subject.ddc	004	de
dc.subject.other	Maximaler-Schnitt-Problem, Randomisierte Eingabemodelle	de
dc.subject.other	approximation algorithms, nearest neighbor search, maximal intersection problem, randomized input models, Zipf's law	en
dc.title	Similarity search with set intersection as a distance measure	en
dc.title.alternative	Ähnlichkeitssuche mit der Schnittmengengröße als Distanzmaß	de
dc.type	doctoralThesis	de
ubs.dateAccepted	2010-03-25	de
ubs.fakultaet	Fakultät Informatik, Elektrotechnik und Informationstechnik	de
ubs.institut	Institut für Formale Methoden der Informatik	de
ubs.opusid	5324	de
ubs.publikation.typ	Dissertation	de
ubs.thesis.grantor	Fakultät Informatik, Elektrotechnik und Informationstechnik	de
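
For orientation, the naive baseline mentioned in the abstract can be sketched as follows. This is only an illustration of the problem statement, not the thesis's algorithms, which are not reproduced in this record. The function names and the `ratio` parameter are assumptions introduced here for the example.

```python
from typing import Hashable, Iterable, Sequence, Tuple


def naive_max_intersection(
    database: Sequence[Iterable[Hashable]], query: Iterable[Hashable]
) -> Tuple[int, int]:
    """Return (index, overlap) of a database set with maximal intersection.

    Intersects the query with every database set, so each query touches
    the whole database. Returns (-1, -1) for an empty database.
    """
    query_set = set(query)
    best_index, best_overlap = -1, -1
    for index, candidate in enumerate(database):
        overlap = len(query_set & set(candidate))
        if overlap > best_overlap:  # ties keep the earliest set
            best_index, best_overlap = index, overlap
    return best_index, best_overlap


def is_approximate_answer(
    database: Sequence[Iterable[Hashable]],
    query: Iterable[Hashable],
    index: int,
    ratio: float,
) -> bool:
    """Check that database[index] reaches at least `ratio` of the maximal
    intersection size (`ratio` is a hypothetical quality parameter)."""
    _, best = naive_max_intersection(database, query)
    achieved = len(set(query) & set(database[index]))
    return achieved >= ratio * best
```

An approximate algorithm in the sense of the abstract would return an index for which a check like `is_approximate_answer` holds with high probability, without scanning every database set at query time.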

Files

Original bundle

Name: DissHoff.pdf
Size: 612.59 KB
Format: Adobe Portable Document Format

Collections

05 Fakultät Informatik, Elektrotechnik und Informationstechnik