Efficient exploratory clustering analyses in large-scale exploration processes

dc.contributor.author: Fritz, Manuel
dc.contributor.author: Behringer, Michael
dc.contributor.author: Tschechlov, Dennis
dc.contributor.author: Schwarz, Holger
dc.date.accessioned: 2023-04-14T12:48:12Z
dc.date.available: 2023-04-14T12:48:12Z
dc.date.issued: 2021 [de]
dc.date.updated: 2023-03-24T18:37:03Z
dc.description.abstract: Clustering is a fundamental primitive in manifold applications. In order to achieve valuable results in exploratory clustering analyses, parameters of the clustering algorithm have to be set appropriately, which is a tremendous pitfall. We observe multiple challenges for large-scale exploration processes. On the one hand, they require specific methods to efficiently explore large parameter search spaces. On the other hand, they often exhibit large runtimes, in particular when large datasets are analyzed using clustering algorithms with super-polynomial runtimes, which repeatedly need to be executed within exploratory clustering analyses. We address these challenges as follows: First, we present LOG-Means and show that it provides estimates for the number of clusters in sublinear time regarding the defined search space, i.e., provably requiring fewer executions of a clustering algorithm than existing methods. Second, we demonstrate how to exploit fundamental characteristics of exploratory clustering analyses in order to significantly accelerate the (repetitive) execution of clustering algorithms on large datasets. Third, we show how these challenges can be tackled at the same time. To the best of our knowledge, this is the first work which simultaneously addresses the above-mentioned challenges. In our comprehensive evaluation, we unveil that our proposed methods significantly outperform state-of-the-art methods, thus especially supporting novice analysts for exploratory clustering analyses in large-scale exploration processes. [en]
dc.description.sponsorship: Ministerium für Wissenschaft, Forschung und Kunst Baden-Württemberg [de]
dc.description.sponsorship: Bundesministerium für Bildung und Forschung [de]
dc.description.sponsorship: Projekt DEAL [de]
dc.identifier.issn: 1066-8888
dc.identifier.issn: 0949-877X
dc.identifier.other: 184349051X
dc.identifier.uri: http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-129642 [de]
dc.identifier.uri: http://elib.uni-stuttgart.de/handle/11682/12964
dc.identifier.uri: http://dx.doi.org/10.18419/opus-12945
dc.language.iso: en [de]
dc.relation.uri: doi:10.1007/s00778-021-00716-y [de]
dc.rights: info:eu-repo/semantics/openAccess [de]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [de]
dc.subject.ddc: 004 [de]
dc.title: Efficient exploratory clustering analyses in large-scale exploration processes [en]
dc.type: article [de]
ubs.fakultaet: Informatik, Elektrotechnik und Informationstechnik [de]
ubs.institut: Institut für Parallele und Verteilte Systeme [de]
ubs.publikation.seiten: 711-732 [de]
ubs.publikation.source: The VLDB journal 31 (2022), S. 711-732 [de]
ubs.publikation.typ: Zeitschriftenartikel [de]

Files

Original bundle

Name: s00778-021-00716-y.pdf
Size: 1.06 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.39 KB
Format: Item-specific license agreed upon to submission