13 Zentrale Universitätseinrichtungen (Central University Facilities)

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/14

Now showing 1 - 10 of 11
  • Performance comparison of CFD microbenchmarks on diverse HPC architectures (Open Access)
    (2024) Galeazzo, Flavio C. C.; Garcia-Gasulla, Marta; Boella, Elisabetta; Pocurull, Josep; Lesnik, Sergey; Rusche, Henrik; Bnà, Simone; Cerminara, Matteo; Brogi, Federico; Marchetti, Filippo; Gregori, Daniele; Weiß, R. Gregor; Ruopp, Andreas
    OpenFOAM is a CFD software package widely used in both industry and academia. The exaFOAM project aims to enhance the HPC scalability of OpenFOAM, identifying its current bottlenecks and proposing ways to overcome them. Assessing the software components and profiling the code during development calls for lightweight but meaningful benchmarks, so microbenchmarks with a small memory footprint and short runtime were developed. The name microbenchmark does not mean that they are the smallest possible test cases: they are sized to fit in a single compute node, which usually has dozens of compute cores. The microbenchmarks cover a broad range of applications: incompressible and compressible flow, combustion, viscoelastic flow, and adjoint optimization. All benchmarks are part of the OpenFOAM HPC Technical Committee repository and are fully accessible. Performance has been benchmarked on HPC systems with Intel and AMD processors (x86_64 architecture) and Arm processors (aarch64 architecture). For the workloads in this study, the mean performance with the AMD CPU is 62% higher than with Arm and 42% higher than with Intel; the AMD processor thus appears particularly well suited, yielding an overall shorter time-to-solution.
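    A toy illustration of how such cross-architecture comparisons can be aggregated: the sketch below turns per-case wall-clock times into mean relative performance. The case names and timings are invented placeholders (chosen so the ratios land near the figures quoted above), not measurements from the paper.

    ```python
    # Hypothetical sketch: per-case runtimes -> mean relative performance.
    # All numbers are invented placeholders, not data from the paper.
    from statistics import geometric_mean

    timings = {  # wall-clock seconds per microbenchmark
        "amd":   {"caseA": 100.0, "caseB": 80.0},
        "intel": {"caseA": 140.0, "caseB": 118.0},
        "arm":   {"caseA": 160.0, "caseB": 132.0},
    }

    baseline = "amd"
    for system, cases in timings.items():
        # performance ~ 1/runtime, so baseline_time / system_time is the
        # system's performance relative to the baseline
        ratios = [timings[baseline][c] / t for c, t in cases.items()]
        print(f"{system}: {geometric_mean(ratios):.2f}x of {baseline} performance")
    # arm comes out near 0.62x, i.e. the baseline is roughly 62% faster
    ```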
  • Improving the MPI-IO performance of applications with genetic algorithm based auto-tuning (Open Access)
    (2021) Bagbaba, Ayse; Wang, Xuan
    Parallel I/O is an essential part of scientific applications running on high-performance computing systems. Understanding an application's parallel I/O behavior and identifying sources of performance bottlenecks require a multi-layer view of the I/O. Typical parallel I/O stack layers offer many tunable parameters that can be set to achieve the best possible I/O performance. However, scientific users often have neither the time nor the experience to investigate the proper combination of these parameters for each application use case. Auto-tuning can help by tuning the I/O parameters at the various layers transparently. A naive strategy, running an application with all possible combinations of tunable parameters for all layers of the I/O stack to find the best settings, amounts to an exhaustive search through a huge parameter space and is infeasible because of the long execution times of the trial runs. In this paper, we propose a genetic algorithm-based parallel I/O auto-tuning approach that hides the complexity of the I/O stack from users and auto-tunes a set of parameter values for an application on a given system to improve I/O performance. In particular, our approach tests a set of parameters and then modifies the combination of these parameters for further testing based on the observed I/O performance. We have validated our model using two I/O benchmarks, IOR and MPI-Tile-IO, achieving an increase in I/O bandwidth of up to 7.74× over the default parameters for IOR and 5.59× for MPI-Tile-IO.
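    The Python sketch below shows the general shape of such a genetic-algorithm search; it is not the authors' implementation. The tunables are typical MPI-IO/Lustre knobs chosen for illustration, and measure_bandwidth is a synthetic stand-in for a real benchmark run such as IOR.

    ```python
    # Sketch of GA-based I/O auto-tuning (illustrative, not the paper's code).
    import random

    SEARCH_SPACE = {                 # candidate values per tunable (examples)
        "cb_nodes":       [1, 2, 4, 8],
        "stripe_count":   [1, 4, 8, 16],
        "stripe_size_mb": [1, 4, 16, 64],
    }

    def measure_bandwidth(cfg):
        # Placeholder fitness: in practice, run IOR/MPI-Tile-IO with these
        # settings and return the measured bandwidth. Synthetic surrogate here.
        return cfg["cb_nodes"] * cfg["stripe_count"] * min(cfg["stripe_size_mb"], 16)

    def random_config():
        return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

    def crossover(a, b):             # take each gene from either parent
        return {k: random.choice([a[k], b[k]]) for k in SEARCH_SPACE}

    def mutate(cfg, rate=0.2):       # occasionally resample a gene
        return {k: random.choice(v) if random.random() < rate else cfg[k]
                for k, v in SEARCH_SPACE.items()}

    def tune(generations=10, pop_size=8, elite=2):
        population = [random_config() for _ in range(pop_size)]
        for _ in range(generations):
            ranked = sorted(population, key=measure_bandwidth, reverse=True)
            parents = ranked[:elite]              # keep the best configurations
            children = [mutate(crossover(*random.sample(parents, 2)))
                        for _ in range(pop_size - elite)]
            population = parents + children
        return max(population, key=measure_bandwidth)

    print(tune())
    ```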
  • Urban digital twins for smart cities and citizens: the case study of Herrenberg, Germany (Open Access)
    (2020) Dembski, Fabian; Wössner, Uwe; Letzgus, Mike; Ruddat, Michael; Yamu, Claudia
    Cities are complex systems connected to economic, ecological, and demographic conditions and change. They are also characterized by diverging perceptions and interests of citizens and stakeholders. Thus, in the arena of urban planning, we need approaches that can cope not only with urban complexity but also allow for participatory and collaborative processes that empower citizens, in order to create democratic cities. Connected to the field of smart cities and citizens, we present in this paper the prototype of an urban digital twin for the 30,000-inhabitant town of Herrenberg in Germany. Urban digital twins are sophisticated data models allowing for collaborative processes. The prototype presented here comprises (1) a 3D model of the built environment, (2) a street network model using the theory and method of space syntax, (3) an urban mobility simulation, (4) a wind flow simulation, and (5) a number of empirical quantitative and qualitative data collected through volunteered geographic information (VGI). In addition, the urban digital twin was implemented in a visualization platform for virtual reality and was presented to the general public during diverse public participatory processes, as well as in the framework of the “Morgenstadt Werkstatt” (Tomorrow’s Cities Workshop). The results of a survey indicated that this method and technology could significantly aid participatory and collaborative processes. A further understanding of how urban digital twins support urban planners, urban designers, and the general public as a tool for collaboration, communication, and decision support allows us to be more intentional when creating smart and sustainable cities with the help of digital twins. We conclude the paper with a discussion of the presented results and further research directions.
  • Soya yield prediction on a within-field scale using machine learning models trained on Sentinel-2 and soil data (Open Access)
    (2022) Pejak, Branislav; Lugonja, Predrag; Antić, Aleksandar; Panić, Marko; Pandžić, Miloš; Alexakis, Emmanouil; Mavrepis, Philip; Zhou, Naweiluo; Marko, Oskar; Crnojević, Vladimir
    Agriculture is the backbone and the main sector of industry for many countries in the world. Assessing crop yields is key to optimising on-field decisions and defining sustainable agricultural strategies. Remote sensing applications have greatly enhanced our ability to monitor and manage farming operations. The main objective of this research was to evaluate a machine learning system for within-field soya yield prediction trained on Sentinel-2 multispectral images and soil parameters. Multispectral images used in the study came from ESA’s Sentinel-2 satellites. A total of three cloud-free Sentinel-2 multispectral images per year from specific periods of vegetation were used to obtain the time series necessary for crop yield prediction. Yield monitor data were collected in three crop seasons (2018, 2019 and 2020) from a number of farms located in Upper Austria. The ground-truth database consisted of information about the location of the fields and crop yield monitor data on 411 ha of farmland. A novel method, namely Polygon-Pixel Interpolation, for optimally matching yield monitor data with satellite images is introduced. Several machine learning algorithms, such as Multiple Linear Regression, Support Vector Machine, eXtreme Gradient Boosting, Stochastic Gradient Descent and Random Forest, were compared for their performance in soya yield prediction. Among the tested machine learning algorithms, the Stochastic Gradient Descent regression model performed best, with a mean absolute error of 4.36 kg/pixel (0.436 t/ha) and a correlation coefficient of 0.83.
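    A minimal scikit-learn sketch of this kind of model comparison (not the authors' pipeline); the features and yields below are synthetic placeholders standing in for per-pixel Sentinel-2 band time series and soil parameters.

    ```python
    # Illustrative model comparison on synthetic per-pixel data.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import SGDRegressor
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import cross_val_predict
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 12))            # placeholder spectral + soil features
    y = 2.0 * X[:, 0] + rng.normal(size=500)  # placeholder per-pixel yield

    models = {
        "SGD": make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000)),
        "RF":  RandomForestRegressor(n_estimators=200, random_state=0),
    }
    for name, model in models.items():
        pred = cross_val_predict(model, X, y, cv=5)
        mae = mean_absolute_error(y, pred)
        r = np.corrcoef(y, pred)[0, 1]        # Pearson correlation coefficient
        print(f"{name}: MAE = {mae:.3f}, r = {r:.2f}")
    ```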
  • Improving collective I/O performance with machine learning supported auto-tuning (Open Access)
    (2020) Bagbaba, Ayse
    Collective input/output (I/O) is an essential approach in high-performance computing (HPC) applications. Achieving effective collective I/O is a nontrivial task due to the complex interdependencies between the layers of the I/O stack. These layers expose a number of tunable parameters through which the best possible I/O performance can be reached. Unfortunately, the correct combination of parameters depends on the application and the HPC platform. As the configuration space grows, it becomes difficult for humans to track the interactions between the configuration options, and engineers rarely have the time or experience to explore good configuration parameters for each problem given the long benchmarking phase. In most cases the default settings are used, often leading to poor I/O efficiency. I/O profiling tools cannot reveal the optimal settings without considerable effort spent analyzing the tracing results. An auto-tuning solution that optimizes collective I/O requests and provides system administrators or engineers with statistical information is therefore strongly needed. In this paper, a study of machine learning supported collective I/O auto-tuning, including the architecture and software stack, is performed. A random forest regression model is used to develop a performance predictor that can capture parallel I/O behavior as a function of application and file system characteristics. The modeling approach can provide insights into the metrics that significantly impact I/O performance.
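    A minimal sketch of such a random-forest performance predictor; the feature names are plausible application/file-system characteristics chosen for illustration, and the training data is synthetic (this is not the thesis code).

    ```python
    # Random-forest I/O performance predictor on synthetic data.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    features = ["num_procs", "cb_nodes", "stripe_count", "stripe_size", "block_size"]
    rng = np.random.default_rng(1)
    X = rng.integers(1, 64, size=(400, len(features))).astype(float)
    y = X[:, 1] * X[:, 2] + 0.1 * X[:, 4] + rng.normal(size=400)  # synthetic bandwidth

    model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

    # Rank the tunables by importance, mirroring the "insights into the
    # metrics that significantly impact I/O performance" described above.
    for name, imp in sorted(zip(features, model.feature_importances_),
                            key=lambda pair: -pair[1]):
        print(f"{name}: {imp:.2f}")
    ```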
  • Lustre I/O performance investigations on Hazel Hen: experiments and heuristics (Open Access)
    (2021) Seiz, Marco; Offenhäuser, Philipp; Andersson, Stefan; Hötzer, Johannes; Hierl, Henrik; Nestler, Britta; Resch, Michael
    With ever-increasing computational power, larger computational domains are employed and thus the data output grows as well. Writing this data to disk can become a significant part of runtime if done serially. Even if the output is done in parallel, e.g., via MPI I/O, there are many user-space parameters for tuning the performance. This paper focuses on the available parameters for the Lustre file system and the Cray MPICH implementation of MPI I/O. Experiments on the Cray XC40 Hazel Hen using a Cray Sonexion 2000 Lustre file system were conducted. In the experiments, the core count, the block size and the striping configuration were varied. Based on these parameters, heuristics for striping configuration in terms of core count and block size were determined, yielding up to a 32-fold improvement in write rate compared to the default. This corresponds to 85 GB/s of the peak bandwidth of 202.5 GB/s. The heuristics are shown to be applicable to a small test program as well as a complex application.
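    The paper's concrete heuristics are not reproduced here; the sketch below only shows the shape of such a rule (with invented thresholds) and how the resulting configuration would be applied with the standard Lustre lfs setstripe command.

    ```python
    # Hypothetical striping heuristic + applying it via `lfs setstripe`.
    import subprocess

    def stripe_settings(core_count: int, block_size_mb: int) -> tuple[int, str]:
        """Pick a stripe count/size from job parameters (invented rule)."""
        aggregate_mb = core_count * block_size_mb
        stripe_count = min(max(aggregate_mb // 1024, 1), 64)  # cap at 64 OSTs
        stripe_size = "4m" if block_size_mb >= 4 else "1m"
        return stripe_count, stripe_size

    count, size = stripe_settings(core_count=4096, block_size_mb=8)
    # New files created under output_dir/ inherit this striping.
    subprocess.run(["lfs", "setstripe", "-c", str(count), "-S", size, "output_dir"],
                   check=True)
    ```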
  • An innovative technological infrastructure for managing SARS-CoV-2 data across different cohorts in compliance with General Data Protection Regulation (Open Access)
    (2024) Dellacasa, Chiara; Ortali, Maurizio; Rossi, Elisa; Abu Attieh, Hammam; Osmo, Thomas; Puskaric, Miroslav; Rinaldi, Eugenia; Prasser, Fabian; Stellmach, Caroline; Cataudella, Salvatore; Agarwal, Bhaskar; Mata Naranjo, Juan; Scipione, Gabriella
    Background: The ORCHESTRA project, funded by the European Commission, aims to create a pan-European cohort built on existing and new large-scale population cohorts to help rapidly advance the knowledge related to the prevention of SARS-CoV-2 infection and the management of COVID-19 and its long-term sequelae. The integration and analysis of very heterogeneous health data pose the challenge of building an innovative technological infrastructure as the foundation of a dedicated data management framework that addresses regulatory requirements such as the General Data Protection Regulation (GDPR). Methods: The three participating European supercomputing centres (CINECA - Italy, CINES - France and HLRS - Germany) designed and deployed a dedicated infrastructure to fulfil the functional requirements for data management and to ensure the confidentiality/privacy, integrity, and security of sensitive biomedical data. Beyond the technological issues, many methodological aspects were considered: the Berlin Institute of Health (BIH), Charité provided its expertise in data protection, information security, and data harmonisation/standardisation. Results: The resulting infrastructure is based on a multi-layer approach that integrates several security measures to ensure data protection. A centralised Data Collection Platform has been established in the Italian National Hub, while for the use cases in which data sharing is not possible due to privacy restrictions, a distributed approach for Federated Analysis has been adopted. A Data Portal is available as a centralised point of access for non-sensitive data and results, in accordance with the findability, accessibility, interoperability, and reusability (FAIR) data principles. This technological infrastructure has been used to support significant data exchange between population cohorts and to publish important scientific results related to SARS-CoV-2. Conclusions: Considering the increasing demand for data usage in accordance with the requirements of the GDPR, the experience gained in the project and the infrastructure released for the ORCHESTRA project can serve as a model for managing future public health threats. Other projects can benefit from the results achieved by ORCHESTRA by building upon the available standardisation of variables, the design of the architecture, and the process used for GDPR compliance.
  • Container orchestration on HPC systems through Kubernetes (Open Access)
    (2021) Zhou, Naweiluo; Georgiou, Yiannis; Pospieszny, Marcin; Zhong, Li; Zhou, Huan; Niethammer, Christoph; Pejak, Branislav; Marko, Oskar; Hoppe, Dennis
    Containerisation has demonstrated its efficiency in application deployment in cloud computing. Containers can encapsulate complex programs with their dependencies in isolated environments, making applications more portable; hence they are being adopted in High Performance Computing (HPC) clusters. Singularity, initially designed for HPC systems, has become their de facto standard container runtime. Nevertheless, conventional HPC workload managers lack micro-service support and deeply integrated container management, as opposed to container orchestrators. We introduce a Torque-Operator, which serves as a bridge between the HPC workload manager TORQUE and the container orchestrator Kubernetes. We propose a hybrid architecture that integrates HPC and cloud clusters seamlessly, with little interference to HPC systems, in which container orchestration is performed on two levels.
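    For illustration, the sketch below submits a container job through the official Kubernetes Python client; the image name and namespace are placeholders, and the Torque-Operator's translation of such jobs into TORQUE batch jobs is not shown here.

    ```python
    # Submit a simple container job to Kubernetes (placeholder image/namespace).
    from kubernetes import client, config

    config.load_kube_config()  # use the local kubeconfig credentials

    container = client.V1Container(
        name="hpc-task",
        image="registry.example.org/solver:latest",   # placeholder image
        command=["mpirun", "-np", "4", "./solver"],
    )
    job = client.V1Job(
        metadata=client.V1ObjectMeta(name="demo-job"),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(containers=[container], restart_policy="Never")
            )
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
    ```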
  • Particle-resolved simulation of the pyrolysis process of a single plastic particle (Open Access)
    (2024) Zhang, Feichi; Tavakkol, Salar; Galeazzo, Flavio C. C.; Stapf, Dieter
    Particle-resolved simulations have been performed to study the pyrolysis process of a high-density polyethylene (HDPE) particle in an inert hot nitrogen flow. The simulations resolve the velocity and temperature boundary layers around the particle, as well as the gradients of temperature and concentration within the particle. The objective of this work is to gain an in-depth understanding of the effect of particle morphology, specifically the particle size and shape, on the interplay between heat transfer and pyrolysis progress, and to assess the applicable particle size when using the Lagrangian concept for simulating plastic pyrolysis. In all simulation cases, the pyrolysis reaction is initiated at the external surface of the particle, where the particle is heated the fastest. The reaction front then propagates inward toward the core of the particle until it is fully pyrolyzed. For particle diameters larger than 4 mm, distinct temperature gradients within the particle can be detected, leading to a temperature difference of more than 10 K between the core and the external surface of the plastic particle. In this case, the Lagrangian simulations yield a considerably slower conversion compared with the particle-resolved simulations. Moreover, the cylindrical particle in longitudinal flow has been found to pyrolyze more slowly than the spherical and shell-shaped particles, which is attributed to the enhanced heat transfer conditions for the latter shapes. The results reveal the importance of considering particle morphology when modeling plastic pyrolysis. In addition, the Lagrangian approach, which assumes particle homogeneity, is only applicable for particle diameters smaller than 2 mm when modeling plastic pyrolysis.
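    As a rough illustration of the inward-moving reaction front, the toy model below couples 1-D transient conduction in a sphere with a first-order Arrhenius pyrolysis reaction. All material parameters are invented, and this is not the authors' particle-resolved CFD setup.

    ```python
    # Toy model: conduction + first-order pyrolysis in a sphere (invented values).
    import numpy as np

    N, R = 50, 2e-3                    # radial cells, particle radius [m]
    r = np.linspace(0.0, R, N)
    dr = r[1] - r[0]
    alpha = 1e-7                       # thermal diffusivity [m^2/s]
    A, Ea, Rgas = 1e10, 1.5e5, 8.314   # Arrhenius factor [1/s], E_a [J/mol]
    T = np.full(N, 300.0)              # initial temperature [K]
    X = np.zeros(N)                    # conversion: 0 = virgin, 1 = pyrolyzed
    T_surf = 900.0                     # hot gas pins the surface temperature
    dt = 0.4 * dr**2 / alpha           # stable explicit time step

    for step in range(200_000):
        T[-1] = T_surf                               # surface boundary condition
        lap = np.zeros(N)
        lap[1:-1] = (T[2:] - 2.0 * T[1:-1] + T[:-2]) / dr**2
        lap[1:-1] += (2.0 / r[1:-1]) * (T[2:] - T[:-2]) / (2.0 * dr)  # spherical
        lap[0] = 6.0 * (T[1] - T[0]) / dr**2         # symmetry at the centre
        T += alpha * dt * lap
        k = A * np.exp(-Ea / (Rgas * T))             # first-order rate constant
        X += dt * k * (1.0 - X)                      # reaction front moves inward
        if X.min() > 0.99:                           # core fully pyrolyzed
            print(f"fully converted after {step * dt:.1f} s")
            break
    ```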
  • Coherent mesh representation for parallel I/O of unstructured polyhedral meshes (Open Access)
    (2024) Weiß, R. Gregor; Lesnik, Sergey; Galeazzo, Flavio C. C.; Ruopp, Andreas; Rusche, Henrik
    This paper presents a new mesh data layout for parallel I/O of linear unstructured polyhedral meshes. The new mesh representation infers coherence across entities of different topological dimensions, i.e., grid cells, faces, and points. The coherence due to the cell-to-face and face-to-point connectivities of the mesh is formulated as a tree data structure distributed across processors. The mesh distribution across processors creates consecutive and contiguous slices that yield an optimized data access pattern for parallel I/O. A file format using the coherent mesh representation, developed and tested with OpenFOAM, enables use of the software at unprecedented scales. Further benefits of the coherent and sliceable mesh representation arise from simplifications in partitioning and reduced pre- and post-processing overheads.
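    The toy sketch below illustrates the slicing idea on a made-up connectivity: when cells are distributed as contiguous ranges, the faces and points each rank touches form (mostly) contiguous runs as well, which is what enables large sequential reads and writes. It is not the paper's file format.

    ```python
    # Toy coherent layout: contiguous cell slices induce contiguous face/point runs.
    cell_to_faces = {0: [0, 1, 2], 1: [2, 3, 4], 2: [4, 5, 6], 3: [6, 7, 8]}
    face_to_points = {f: [f, f + 1] for f in range(9)}   # invented connectivity

    def slice_for_rank(rank: int, nranks: int, ncells: int):
        """Contiguous cell range for one rank, plus the faces/points it touches."""
        lo = rank * ncells // nranks
        hi = (rank + 1) * ncells // nranks
        faces = sorted({f for c in range(lo, hi) for f in cell_to_faces[c]})
        points = sorted({p for f in faces for p in face_to_points[f]})
        return (lo, hi), faces, points

    for rank in range(2):
        cells, faces, points = slice_for_rank(rank, 2, 4)
        # each rank's I/O touches one consecutive run per topological dimension
        print(f"rank {rank}: cells {cells}, faces {faces}, points {points}")
    ```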