Browsing by Author "Fritz, Manuel"


Open Access
Efficient exploratory clustering analyses in large-scale exploration processes
(2021) Fritz, Manuel; Behringer, Michael; Tschechlov, Dennis; Schwarz, Holger
Clustering is a fundamental primitive in manifold applications. To achieve valuable results in exploratory clustering analyses, the parameters of the clustering algorithm have to be set appropriately, which is a major pitfall. We observe multiple challenges for large-scale exploration processes. On the one hand, they require specific methods to efficiently explore large parameter search spaces. On the other hand, they often exhibit long runtimes, in particular when large datasets are analyzed using clustering algorithms with super-polynomial runtimes, which need to be executed repeatedly within exploratory clustering analyses. We address these challenges as follows: First, we present LOG-Means and show that it provides estimates for the number of clusters in time sublinear in the size of the defined search space, i.e., it provably requires fewer executions of a clustering algorithm than existing methods. Second, we demonstrate how to exploit fundamental characteristics of exploratory clustering analyses in order to significantly accelerate the (repetitive) execution of clustering algorithms on large datasets. Third, we show how these challenges can be tackled at the same time. To the best of our knowledge, this is the first work that addresses the above-mentioned challenges simultaneously. In our comprehensive evaluation, we show that our proposed methods significantly outperform state-of-the-art methods, thus especially supporting novice analysts in exploratory clustering analyses in large-scale exploration processes.
Open Access
Methods for enhanced exploratory clustering analyses
(2021) Fritz, Manuel; Schwarz, Holger (PD Dr. rer. nat. habil.)
Nowadays, there are several mature approaches for companies and organizations to collect, store and analyze voluminous data. A thorough data analysis in particular is crucial for gaining new insights from these data, resulting in detailed knowledge that can ultimately be exploited to achieve competitive advantages. This thesis focuses on unsupervised clustering analyses as an important problem in data analysis. Clustering is a fundamental primitive used in manifold application domains, such as computer vision, business purposes, biology, and many others. To achieve valuable clustering results, the parameters of the clustering algorithm have to be set appropriately, which is a major pitfall for previously unseen datasets. To this end, analysts typically perform an exploratory clustering analysis by repeatedly executing the clustering algorithm with varying parameter values until a valuable clustering result is achieved. However, each single execution of the clustering algorithm is time-consuming on large datasets, which makes the exploration process infeasible within a reasonable time frame. This thesis proposes novel methods to enhance the overall exploration process for valuable clustering results. In doing so, we focus on (i) technically inexperienced analysts, who require in-depth support to perform exploratory clustering analyses in the first place, as well as on (ii) novice analysts, who lack domain knowledge and are therefore uncertain about promising parameter values, i.e., they require more time to achieve valuable clustering results. Related work in this area focuses either on methods to automatically conduct an exploratory clustering analysis, or on methods to reduce the runtime of a clustering algorithm. Yet the interdependency between both aspects, which is fundamental to exploratory clustering analysis, is out of scope of related work.
The proposed methods in this work address this interdependency by investigating characteristics of the overall exploration process, such as large parameter search spaces as they might be defined by novice analysts or the repetitive execution of a clustering algorithm on large datasets. As a result, crucial additional benefits, like shorter runtimes for exploratory clustering analyses, are achieved, which significantly extend the current state of related work. Our comprehensive evaluations demonstrate the benefits for exploratory clustering analyses. We show that the proposed methods (i) enable technically inexperienced analysts to perform clustering analyses without detailed knowledge about internals of clustering algorithms, as well as (ii) support novice analysts by achieving runtime savings of up to several orders of magnitude, while gaining even more valuable clustering results in terms of internal characteristics. In conclusion, the novel methods proposed in this thesis provide crucial support for analysts with varying technical experience and domain knowledge. The overall exploration process is thereby enhanced, i.e., a valuable clustering result is achieved in a reasonable time frame, thus significantly outperforming existing approaches for exploratory clustering analyses.
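The repeated execution of a clustering algorithm over a parameter search space, as described in the abstract above, can be illustrated with a naive exhaustive baseline. The following is only a minimal sketch under stated assumptions and is not one of the methods proposed in the thesis: it assumes k-means as the clustering algorithm, the number of clusters k as the only parameter, and the largest relative drop in the sum of squared errors (SSE) as the selection heuristic. The function names `kmeans_sse` and `explore_k` and the toy data are hypothetical.

```python
from math import dist


def kmeans_sse(points, k, max_iter=100):
    """Run Lloyd's k-means with farthest-first initialisation; return the SSE."""
    # Deterministic initialisation (an assumption of this sketch): start at the
    # first point, then repeatedly add the point farthest from all chosen centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))

    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
        if updated == centers:
            break
        centers = updated

    return sum(min(dist(p, c) for c in centers) ** 2 for p in points)


def explore_k(points, k_min, k_max):
    """Exhaustively execute k-means once for every k in [k_min, k_max]."""
    sse = {k: kmeans_sse(points, k) for k in range(k_min, k_max + 1)}
    # Assumed heuristic: choose the k with the largest ratio SSE(k-1) / SSE(k).
    ratios = {k: sse[k - 1] / max(sse[k], 1e-12) for k in range(k_min + 1, k_max + 1)}
    return max(ratios, key=ratios.get), sse


# Toy data: three well-separated groups of four points each.
points = [(x + dx, y + dy)
          for x, y in [(0, 0), (10, 0), (0, 10)]
          for dx in (0, 1) for dy in (0, 1)]
best_k, sse = explore_k(points, 1, 6)
print(best_k, sse)
```

On this toy data the heuristic selects k = 3. The sketch executes the clustering algorithm once per value in the search space, so its cost grows linearly with the size of that space, and each execution scans the full dataset in every iteration. This is the kind of repetitive cost that the abstract identifies as the bottleneck on large datasets.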