OPUS - Online Publications of University Stuttgart

Open Access
A framework for optimizing spark configurations
(2022) Balbach, Daniel
The growing importance of data in modern life, industry, and society has created great interest in processing it, and data-driven approaches are now ubiquitous. The need to process ever larger amounts of data has led to the development of complex, distributed, and scalable processing frameworks. Apache Spark is one such framework. It offers a rich set of functionality, including classic SQL analytics, machine learning, graph processing, and more. However, this broad range of functionality can also cause problems. One of them is that, because the various Spark applications have different requirement characteristics, the standard configuration of a Spark cluster may not be well suited to a given application. A suboptimal configuration can lead to longer execution times or lower cluster throughput. Longer execution times in turn raise costs in environments where billing is directly coupled to execution time, such as the cloud. Beyond the financial aspect, a better-configured Spark application may also make better use of the provided resources and run faster, thus increasing throughput. This work addresses the problem by designing and implementing a framework for optimizing the Spark configuration of a given Spark application. The framework is then applied in a case study on two exemplary use cases, using a Spark cluster in a Databricks environment of a private cloud, to demonstrate its practicability. The results show that the framework can, in general, optimize Spark configurations while requiring only minimal effort from the user. However, outperforming the standard Spark configuration in the exemplary use cases proves challenging, especially because of the runtime variance observed in the cloud environment; distinguishing statistical variance from real improvement is difficult.
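The abstract does not describe the framework's internals. As a rough illustration of the problem it targets, the following Python sketch runs a random search over a few Spark properties and then re-measures the best candidate against a baseline, using Welch's t-statistic to judge whether the difference exceeds run-to-run noise. This is not the thesis's method: the random-search strategy, the candidate values, the baseline values, the hypothetical `run_job` runtime model (simulated with random noise instead of submitting a real job), and the t < -2 threshold are all assumptions for illustration. Only the property names `spark.sql.shuffle.partitions`, `spark.executor.memory`, and `spark.executor.cores` are real Spark settings.

```python
import math
import random
import statistics

# Real Spark property names; the candidate values are illustrative assumptions.
SEARCH_SPACE = {
    "spark.sql.shuffle.partitions": [50, 100, 200, 400],
    "spark.executor.memory": ["2g", "4g", "8g"],
    "spark.executor.cores": [2, 4],
}

# Hypothetical baseline standing in for the cluster's standard configuration.
BASELINE_CONFIG = {
    "spark.sql.shuffle.partitions": 200,
    "spark.executor.memory": "4g",
    "spark.executor.cores": 4,
}


def run_job(config, rng):
    """Hypothetical stand-in for submitting a Spark job and timing it (seconds).

    The runtime model is invented; the multiplicative Gaussian noise mimics
    run-to-run variance in a shared cloud environment.
    """
    partitions = config["spark.sql.shuffle.partitions"]
    memory_gb = int(config["spark.executor.memory"].rstrip("g"))
    cores = config["spark.executor.cores"]
    base = 100.0 + abs(partitions - 150) * 0.05 + 20.0 / memory_gb + 10.0 / cores
    return base * rng.gauss(1.0, 0.08)


def measure(config, rng, repeats):
    """Run the same configuration several times to capture its variance."""
    return [run_job(config, rng) for _ in range(repeats)]


def welch_t(a, b):
    """Welch's t-statistic for mean(a) - mean(b) with unequal variances."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def random_search(budget, rng, repeats=5):
    """Random search, then a fresh confirmation run of the best candidate."""
    best_cfg = BASELINE_CONFIG
    best_mean = statistics.mean(measure(BASELINE_CONFIG, rng, repeats))
    for _ in range(budget):
        cfg = {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}
        mean = statistics.mean(measure(cfg, rng, repeats))
        if mean < best_mean:
            best_cfg, best_mean = cfg, mean
    # Re-measure on fresh runs: the minimum over noisy candidates is optimistic.
    baseline_runs = measure(BASELINE_CONFIG, rng, repeats)
    best_runs = measure(best_cfg, rng, repeats)
    t = welch_t(best_runs, baseline_runs)
    # Heuristic threshold: t < -2 suggests a real speed-up rather than noise.
    return best_cfg, statistics.mean(baseline_runs), statistics.mean(best_runs), t < -2.0


rng = random.Random(42)
cfg, baseline_mean, best_mean, likely_real = random_search(budget=20, rng=rng)
print(cfg)
print(f"baseline {baseline_mean:.1f}s, best {best_mean:.1f}s, likely real: {likely_real}")
```

The confirmation step re-measures the winner on fresh runs because picking the minimum over many noisy candidates tends to overstate the gain. This is one concrete form of the difficulty the abstract describes in separating statistical variance from real improvement.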