13 Central University Facilities (Zentrale Universitätseinrichtungen)

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/14

Search Results

Now showing 1 - 6 of 6
  • Item (Open Access)
    Improving the MPI-IO performance of applications with genetic algorithm based auto-tuning
    (2021) Bagbaba, Ayse; Wang, Xuan
    Parallel I/O is an essential part of scientific applications running on high-performance computing systems. Understanding an application's parallel I/O behavior and identifying sources of performance bottlenecks require a multi-layer view of the I/O. Typical parallel I/O stack layers offer many tunable parameters through which the best possible I/O performance can be achieved. However, scientific users often have neither the time nor the experience to investigate the proper combination of these parameters for each application use-case. Auto-tuning can help users by automatically tuning I/O parameters at various layers transparently. A naive auto-tuning strategy, running an application with every possible combination of tunable parameters for all layers of the I/O stack to find the best settings, amounts to an exhaustive search through a huge parameter space. This strategy is infeasible because of the long execution times of trial runs. In this paper, we propose a genetic algorithm-based parallel I/O auto-tuning approach that can hide the complexity of the I/O stack from users and auto-tune a set of parameter values for an application on a given system to improve the I/O performance. In particular, our approach tests a set of parameters and then modifies the combination of these parameters for further testing based on the observed I/O performance. We have validated our model using two I/O benchmarks, namely IOR and MPI-Tile-IO. We achieved an increase in I/O bandwidth of up to 7.74× over the default parameters for IOR and 5.59× over the default parameters for MPI-Tile-IO.
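The abstract's core idea, evolving a population of I/O parameter combinations instead of exhaustively enumerating them, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the parameter names (`stripe_count`, `stripe_size_mb`, `cb_nodes`) and the fitness function are hypothetical stand-ins for a real benchmark run such as IOR.

```python
import random

# Hypothetical tunable parameters of the I/O stack (names illustrative).
PARAM_SPACE = {
    "stripe_count": [1, 2, 4, 8, 16],
    "stripe_size_mb": [1, 4, 16, 64],
    "cb_nodes": [1, 2, 4, 8],
}

def mock_bandwidth(cfg):
    """Stand-in for a real trial run (e.g. with IOR); returns a fabricated MB/s."""
    return (cfg["stripe_count"] * cfg["stripe_size_mb"]) / (1 + abs(cfg["cb_nodes"] - 4))

def random_config():
    return {k: random.choice(v) for k, v in PARAM_SPACE.items()}

def crossover(a, b):
    # Child inherits each parameter from one of the two parents.
    return {k: random.choice([a[k], b[k]]) for k in PARAM_SPACE}

def mutate(cfg, rate=0.2):
    # Occasionally replace a parameter with a fresh random value.
    return {k: (random.choice(v) if random.random() < rate else cfg[k])
            for k, v in PARAM_SPACE.items()}

def tune(generations=10, pop_size=8, seed=42):
    random.seed(seed)
    pop = [random_config() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=mock_bandwidth, reverse=True)
        parents = pop[: pop_size // 2]          # elitism: keep the fittest half
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=mock_bandwidth)

best = tune()
```

In the real setting each fitness evaluation is an actual application or benchmark run, so the genetic algorithm's value lies in needing far fewer trial runs than the exhaustive search the abstract rules out.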
  • Item (Open Access)
    Improving collective I/O performance with machine learning supported auto-tuning
    (2020) Bagbaba, Ayse
    Collective input/output (I/O) is an essential approach in high performance computing (HPC) applications. Achieving effective collective I/O is a nontrivial job due to the complex interdependencies between the layers of the I/O stack. These layers provide the best possible I/O performance through a number of tunable parameters. Unfortunately, the correct combination of parameters varies across applications and HPC platforms. As the configuration space grows larger, it becomes difficult for humans to track the interactions between the configuration options. Engineers rarely have the time or experience to explore good configuration parameters for each problem because of the long benchmarking phase. In most cases, the default settings are used, often leading to poor I/O efficiency. I/O profiling tools cannot identify the optimal settings without considerable effort spent analyzing the tracing results. In this case, an auto-tuning solution that optimizes collective I/O requests and provides system administrators or engineers with statistical information is strongly required. In this paper, a study of machine-learning-supported collective I/O auto-tuning, including the architecture and software stack, is performed. A random forest regression model is used to develop a performance predictor that can capture parallel I/O behavior as a function of application and file system characteristics. The modeling approach can provide insights into the metrics that impact I/O performance significantly.
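The predictor described above maps application and file system characteristics to an I/O performance estimate. A toy version of the idea, a random forest built from bootstrap-sampled depth-1 regression trees over fabricated training data, might look like this; the feature set (process count, transfer size) and the synthetic targets are assumptions for illustration only, not the paper's model.

```python
import random

def make_stump(data):
    """Fit a depth-1 regression tree on (features, target) pairs."""
    feat = random.randrange(len(data[0][0]))           # random feature split, as in a random forest
    thresh = random.choice([x[feat] for x, _ in data])
    left = [y for x, y in data if x[feat] <= thresh]
    right = [y for x, y in data if x[feat] > thresh]
    mean = lambda ys: sum(ys) / len(ys) if ys else 0.0
    lmean, rmean = mean(left), mean(right)
    return lambda x: lmean if x[feat] <= thresh else rmean

def fit_forest(data, n_trees=50, seed=0):
    """Bagging: each tree sees a bootstrap resample; prediction is the mean."""
    random.seed(seed)
    trees = []
    for _ in range(n_trees):
        boot = [random.choice(data) for _ in data]
        trees.append(make_stump(boot))
    return lambda x: sum(t(x) for t in trees) / len(trees)

# Synthetic training set: features = (num_processes, transfer_size_mb),
# target = fabricated I/O bandwidth. Purely illustrative.
train = [((p, s), 10.0 * s / (1 + p % 3)) for p in range(1, 9) for s in (1, 4, 16)]
predict = fit_forest(train)
```

A production predictor would instead use a library implementation (e.g. scikit-learn's `RandomForestRegressor`) trained on measured benchmark data, and its feature importances are what yield the abstract's "insights into the metrics that impact I/O performance."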
  • Item (Open Access)
    Lustre I/O performance investigations on Hazel Hen : experiments and heuristics
    (2021) Seiz, Marco; Offenhäuser, Philipp; Andersson, Stefan; Hötzer, Johannes; Hierl, Henrik; Nestler, Britta; Resch, Michael
    With ever-increasing computational power, larger computational domains are employed and thus the data output grows as well. Writing this data to disk can become a significant part of runtime if done serially. Even if the output is done in parallel, e.g., via MPI I/O, there are many user-space parameters for tuning the performance. This paper focuses on the available parameters for the Lustre file system and the Cray MPICH implementation of MPI I/O. Experiments on the Cray XC40 Hazel Hen using a Cray Sonexion 2000 Lustre file system were conducted. In the experiments, the core count, the block size and the striping configuration were varied. Based on these parameters, heuristics for striping configuration in terms of core count and block size were determined, yielding up to a 32-fold improvement in write rate compared to the default. This corresponds to 85 GB/s of the peak bandwidth of 202.5 GB/s. The heuristics are shown to be applicable to a small test program as well as a complex application.
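A striping heuristic of the kind the abstract describes takes the core count and block size and returns a Lustre striping configuration. The formula below is a hypothetical placeholder, not the paper's heuristic; only the `lfs setstripe` flags in the comment are real Lustre usage.

```python
def stripe_heuristic(core_count, block_size_mb, max_osts=48):
    """Hypothetical striping heuristic (NOT the paper's formula): scale the
    stripe count with the number of writers, capped by the available OSTs,
    and align the stripe size with the application's block size, capped at
    an arbitrary 64 MB."""
    stripe_count = min(max(1, core_count // 8), max_osts)
    stripe_size_mb = max(1, min(block_size_mb, 64))
    return stripe_count, stripe_size_mb

count, size = stripe_heuristic(256, 16)
# The resulting configuration would be applied to an output directory with:
#   lfs setstripe -c <count> -S <size>M <dir>
```

The point of such a heuristic is that users can pick a near-optimal striping layout from two quantities they already know, instead of benchmarking the full parameter space per run.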
  • Item (Open Access)
    Container orchestration on HPC systems through Kubernetes
    (2021) Zhou, Naweiluo; Georgiou, Yiannis; Pospieszny, Marcin; Zhong, Li; Zhou, Huan; Niethammer, Christoph; Pejak, Branislav; Marko, Oskar; Hoppe, Dennis
    Containerisation has demonstrated its efficiency for application deployment in cloud computing. Containers can encapsulate complex programs with their dependencies in isolated environments, making applications more portable; hence they are being adopted in High Performance Computing (HPC) clusters. Singularity, initially designed for HPC systems, has become the de facto standard container runtime on such clusters. Nevertheless, conventional HPC workload managers lack micro-service support and deeply integrated container management, as opposed to container orchestrators. We introduce a Torque-Operator, which serves as a bridge between an HPC workload manager (TORQUE) and a container orchestrator (Kubernetes). We propose a hybrid architecture that integrates HPC and Cloud clusters seamlessly, with little interference to HPC systems, in which container orchestration is performed on two levels.
  • Item (Open Access)
    Fourth-order paired-explicit Runge-Kutta methods
    (2025) Doehring, Daniel; Christmann, Lars; Schlottke-Lakemper, Michael; Gassner, Gregor; Torrilhon, Manuel
    In this paper, we extend the Paired-Explicit Runge-Kutta (P-ERK) schemes by Vermeire et al. (J Comput Phys 393:465-483, 2019) and Nasab and Vermeire (J Comput Phys 468:111470, 2022) to fourth order of consistency. Based on the order conditions for partitioned Runge-Kutta methods, we motivate a specific form of the Butcher arrays which leads to a family of fourth-order accurate methods. The employed form of the Butcher arrays results in a special structure of the stability polynomials, which needs to be adhered to for an efficient optimization of the domain of absolute stability. We demonstrate that the constructed fourth-order P-ERK methods satisfy linear stability, internal consistency, the designed order of convergence, and conservation of linear invariants. At the same time, these schemes can be coupled seamlessly into codes employing a method-of-lines approach, in particular without any modifications of the spatial discretization. We demonstrate speedups for single-threaded program executions, shared-memory parallelism (i.e., multi-threaded executions), and distributed-memory parallelism with MPI. We apply the multirate P-ERK schemes to inviscid and viscous problems with locally varying wave speeds, which may be induced by non-uniform grids or multiscale properties of the governing partial differential equation. Compared to state-of-the-art optimized standalone methods, the multirate P-ERK schemes allow significant reductions in right-hand-side evaluations and wall-clock time, ranging up to factors greater than four. A reproducibility repository is provided which enables the reader to examine all results presented in this work.
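For orientation, these are the classical order conditions a (non-partitioned) Runge-Kutta method with coefficients (A, b, c) must satisfy to reach fourth order; the partitioned P-ERK setting additionally imposes coupled analogues across the paired Butcher arrays, which is what motivates the specific array structure described above.

```latex
% Order conditions up to order four for an s-stage RK method (A, b, c):
\begin{align*}
  \sum_i b_i &= 1, &
  \sum_i b_i c_i &= \tfrac{1}{2}, &
  \sum_i b_i c_i^2 &= \tfrac{1}{3}, &
  \sum_{i,j} b_i a_{ij} c_j &= \tfrac{1}{6},\\
  \sum_i b_i c_i^3 &= \tfrac{1}{4}, &
  \sum_{i,j} b_i c_i a_{ij} c_j &= \tfrac{1}{8}, &
  \sum_{i,j} b_i a_{ij} c_j^2 &= \tfrac{1}{12}, &
  \sum_{i,j,k} b_i a_{ij} a_{jk} c_k &= \tfrac{1}{24}.
\end{align*}
```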
  • Item (Open Access)
    Governance of high-risk AI systems in healthcare and credit scoring
    (2025) Bartsch, Sebastian; Behn, Oliver; Benlian, Alexander; Brownsword, Roger; Bücker, Sebastian; Düwell, Marcus; Formánek, Nico; Jungtäubl, Marc; Leyer, Michael; Richter, Alexander; Schmidt, Jan-Hendrik; Will-Zocholl, Mascha