Universität Stuttgart
Permanent URI for this community: https://elib.uni-stuttgart.de/handle/11682/1
82 results
Search Results
Item Open Access
Scalable traffic engineering heuristics for time-triggered communication in real-time networks (2026)
Geppert, Heiko; Rothermel, Kurt (Prof. Dr. rer. nat. Dr. h. c.)
Distributed safety-critical cyber-physical systems require real-time behavior. This means they must respond not just quickly, but in time, to new situations, considering both the task processing time and the network communication time. From a networking perspective, meticulous, time-driven traffic planning performed at the frame level is necessary to guarantee low end-to-end delay bounds and low latency. This involves carefully planning the transmission operations along each time-critical frame's network path, including precise timing, to limit or even eliminate interference from cross-traffic and ensure timely delivery. Since modern real-time systems can consist of hundreds or thousands of devices - for example, large manufacturing plants or continental-sized power grids - the traffic planning must be highly scalable. Although there are many traffic planning approaches in the literature, there is a lack of very fast heuristics that can handle very large stream sets and networks. This thesis investigates traffic planning heuristics and optimization techniques, focusing on different aspects of the traffic planning domain. The traffic planning contributions consist of novel methods for conflict-graph-based scheduling and new heuristics for very large instances of the traffic planning problem. The optimizations include multicast partitioning, which combines the benefits of multicast and unicast traffic plans, and load-balanced stream placement, which generates traffic plans that can accommodate additional streams joining the system later. We created prototype implementations and analyzed their performance in solving the traffic planning problem.
Our traffic plans yielded a higher accumulated network throughput or admitted more streams while maintaining computation times ranging from under a second to minutes, even for extremely large-scale problem instances. The traffic planning methods and optimization techniques presented in this thesis can be applied to modern real-time networking technologies, such as Time-Sensitive Networking and TTEthernet.

Item Open Access
From irregular parallelism to portable GPU kernels : enabling efficient task-based GPU programming with HPX, Kokkos and CPPuddle to accelerate stellar mergers (2026)
Daiß, Gregor; Pflüger, Dirk (Prof. Dr.)
Adaptive, tree-based structures are the foundation of many of the most efficient algorithms and applications. Yet, running such applications on supercomputers is challenging, as the application developers have to handle, for example, distributed tree traversals and load balancing. Distributed, asynchronous many-task runtime systems such as HPX aim to alleviate these challenges by embracing fine-grained tasks, enabling a fine interweaving of communication and computation through their task graph. However, the fine-grained approach embraced by HPX runs counter to implementing efficient GPU kernels, which leverage the large numbers of available parallel work items to scale to all compute units of a GPU and to hide latencies. This is compounded by the synchronization of GPU operations, which necessitates blocking entire CPU threads, nullifying any potential benefits of interleaving CPU tasks with the GPU operations through the HPX task graph. Hence, efficiently leveraging GPU-accelerated supercomputers with HPX applications can still be challenging, albeit for different reasons. In this work, we reconcile these differences and unlock the computational performance of GPUs for HPX applications. Our contributions can be used in any HPX application.
However, to benchmark them, we turn to a specific real-world HPX application, the astrophysics code Octo-Tiger. Octo-Tiger is used for large-scale simulations of stellar mergers and demonstrates both the pitfalls and the potential of leveraging large GPU-accelerated supercomputers for applications that rely on such a task-based approach. We refactored Octo-Tiger's CPU-only code, ported its computational hotspots to GPUs, and ultimately turned it into a GPU-accelerated application, greatly speeding up these hotspots with various new GPU compute kernels. These new GPU kernels achieve speedups of up to 283x compared to Octo-Tiger's previous implementation. However, they also demonstrate the pitfalls of simply replacing CPU tasks with GPU kernels, as Octo-Tiger's overall runtime initially increased almost eightfold when using them, despite these individual speedups within the hotspots. To solve the limiting problems, we employ various techniques. Most notably, we integrated asynchronous GPU operations into the HPX task graph using a polling approach, dynamic GPU kernel fusion based on specialized executors, and GPU memory pools that are optimized for HPX. With these solutions in place, we translate the individual speedups of the GPU kernels into tangible speedups of up to 8.5x for the entire application. Furthermore, we combine this approach with the performance portability library Kokkos to be able to target different GPU-accelerated supercomputers. In this work, we particularly demonstrate scalability on the supercomputers Perlmutter and Frontier, distributing Octo-Tiger across over a thousand GPU compute nodes.

Item Open Access
Data-integrated methods for performance improvement of massively parallel coupled simulations (2022)
Totounferoush, Amin; Schulte, Miriam (Prof. Dr.)
This thesis presents data-integrated methods to improve the computational performance of partitioned multi-physics simulations, particularly on highly parallel systems.
Partitioned methods allow using available single-physics solvers and well-validated numerical methods for multi-physics simulations by decomposing the domain into smaller sub-domains. Each sub-domain is solved by a separate solver, and an external library is incorporated to couple the solvers. This significantly reduces the software development cost and enhances flexibility, but it introduces new challenges that must be addressed carefully. These challenges include, but are not limited to, efficient data communication between sub-domains, data mapping between non-matching meshes, inter-solver load balancing, and equation coupling. In the current work, inter-solver communication is improved by introducing a two-level communication initialization scheme to the coupling library preCICE. The new method significantly speeds up the initialization and removes memory bottlenecks of the previous implementation. In addition, a data-driven inter-solver load balancing method is developed to efficiently distribute available computational resources between coupled single-physics solvers. This method employs both regressions and deep neural networks (DNNs) for modeling the performance of the solvers, and derives and solves an optimization problem to distribute the available CPU and GPU cores among solvers. To accelerate the equation coupling between strongly coupled solvers, a hybrid framework is developed that integrates DNNs and classical solvers. The DNN computes a solution estimate for each time step, which is used by classical solvers as a first guess to compute the final solution. To preserve the DNN's efficiency during the simulation, a dynamic re-training strategy is introduced that updates the DNN's weights on the fly. The cheap but accurate solution estimate by the DNN surrogate solver significantly reduces the number of subsequent classical iterations necessary for solution convergence.
Finally, a highly scalable simulation environment is introduced for fluid-structure interaction problems. The environment consists of highly parallel numerical solvers and an efficient and scalable coupling library. This framework is able to efficiently exploit both CPU-only and hybrid CPU-GPU machines. Numerical performance investigations using a complex test case demonstrate a very high parallel efficiency on a large number of CPUs and a significant speed-up due to the GPU acceleration.

Item Open Access
Surrogate modeling with scientific machine learning (2025)
Leiteritz, Raphael; Pflüger, Dirk (Prof. Dr.)

Item Open Access
Supporting multi-tenancy in Relational Database Management Systems for OLTP-style software as a service applications (2015)
Schiller, Oliver; Mitschang, Bernhard (Prof. Dr.-Ing. habil.)
The consolidation of multiple tenants onto a single relational database management system (RDBMS) instance, commonly referred to as multi-tenancy, has turned out to be beneficial, since it helps improve the provider's profit margin and allows lowering service fees, which in turn attracts more tenants. So far, existing solutions create the required multi-tenancy support on top of a traditional RDBMS implementation, i.e., they implement data isolation between tenants, per-tenant customization and further tenant-centric data management features in application logic. This is complex, error-prone and often reimplements functionality the RDBMS already offers. Moreover, this approach disables some optimization opportunities in the RDBMS and is a conceptual misstep with respect to Separation of Concerns. For these reasons, an RDBMS that provides support for the development and operation of a multi-tenant software as a service (SaaS) offering is compelling. In this thesis, we contribute to a multi-tenant RDBMS for OLTP-style SaaS applications by extending a traditional disk-oriented RDBMS architecture with multi-tenancy support.
For this purpose, we primarily extend an RDBMS by introducing tenants as first-class database objects and establishing tenant contexts to isolate tenants logically. Using these extensions, we address tenant-aware schema management, for which we present a schema inheritance concept that is tailored to the needs of multi-tenant SaaS applications. Thereafter, we evaluate different storage concepts to store a tenant’s tuples with respect to their scalability. Next, we contribute an architecture of a multi-tenant RDBMS cluster for OLTP-style SaaS applications. In doing so, we focus on a partitioning solution that is aligned to tenants and yields independently manageable pieces. To balance load in the proposed cluster architecture, we present a live database migration approach whose design favors low migration overhead and minimal interruption of service.

Item Open Access
Ansätze für flexible und fehlertolerante modellgetriebene IoT-Anwendungen in dynamischen Umgebungen [Approaches for flexible and fault-tolerant model-driven IoT applications in dynamic environments] (2024)
Del Gaudio, Daniel; Mitschang, Bernhard (Prof. Dr.-Ing. habil.)

Item Open Access
Effiziente Gestaltung und Anwendung von attributbasierter Zugriffskontrolle für RESTful Services [Efficient design and application of attribute-based access control for RESTful services] (2019)
Hüffmeyer, Marc; Mitschang, Bernhard (Prof. Dr.)

Item Open Access
Efficient code offloading techniques for mobile applications (2017)
Berg, Florian; Rothermel, Kurt (Prof. Dr. rer. nat. Dr. h. c.)
Since Apple released its first smart phone in 2007, smart phones in general have rapidly grown in popularity. A smart phone typically has, among other things, a touchscreen display as its user interface, a mobile communication interface for accessing the Internet, and a System-on-a-Chip that integrates required components such as a central processing unit. This pervasive computing platform draws its power from a battery, and end users run different kinds of applications on it, such as a calendar application or a high-end mobile game.
Applications differ in how they use the local resources of a battery-operated smart phone; heavy utilization of local resources, such as running a resource-demanding application, drains the limited energy in a few hours. Despite the constant increase in the memory, communication, and processing capabilities of smart phones since the release in 2007, applications are also getting more and more sophisticated and demanding. As a result, the energy consumed on a smart phone was, still is, and will be its main limiting factor. To keep the limited energy from being exhausted quickly, researchers propose code offloading for (resource-constrained) mobile devices like smart phones. Code offloading strives to increase the energy efficiency and execution speed of applications by utilizing a server instance in the infrastructure. To this end, a code offloading approach dynamically executes resource-intensive parts of an application on powerful remote servers in the infrastructure on behalf of a (resource-constrained) mobile device. During the remote execution of a resource-intensive application part on a remote server, a mobile device only waits in idle mode until it receives the result of the application part executed remotely. Instead of executing an application part on its local resources, a (resource-constrained) mobile device benefits from the more powerful resources of a remote server by sending the information required for a remote execution, waiting in idle mode, and receiving the result of the remote execution. The process of offloading code from a (resource-constrained) mobile device to a powerful remote server in the infrastructure, however, faces different problems. For instance, code offloading introduces some overhead for additional computation and communication on a mobile device.
Moreover, spontaneous disconnections during a remote execution can cause a higher energy consumption and execution time than a local execution on a mobile device without code offloading. To this end, this dissertation addresses the whole process of offloading code from a mobile device not only to one but also to multiple remote resources, comprising the following steps:
1) First, code offloading has to identify feasible parts of an application for a remote execution, where the distributed execution of the identified application part is more beneficial than its local execution. A feasible part for a remote execution typically has the following properties: a small amount of information that must be transmitted before a remote execution, a resource-intensive computation that does not access local sensors, and a small amount of information that must be transmitted after a remote execution. In the area of identification of application parts for a remote execution, this dissertation presents an approach based on code annotations from application developers that automatically transforms a monolithic execution on a mobile device to a distributed execution on multiple heterogeneous resources. In contrast to related approaches in the literature, the annotation-based approach requires the fewest interventions from application developers and end users, keeping the overhead introduced on a mobile device low.
2) For an application part identified for a remote execution, code offloading has to determine its execution side, executing the application part either on the local resources of a mobile device or on the remote resource in the infrastructure. In the area of determining the execution side for an application part, this dissertation presents the offloading problem, where a mobile device decides whether to execute an application part locally or remotely. Furthermore, this dissertation also presents an approach called "code bubbling" that shifts the decision making into the infrastructure.
In contrast to related approaches in the literature, the decision-based approach on a mobile device and the bubbling-based approach minimize the execution time, energy consumption, and monetary cost for an application.
3) To determine the execution side for an application part identified for a remote execution, code offloading has to obtain different parameters from the application, participating resources, and utilized links. In the area of obtaining the information required from an application, this dissertation presents a bit-flipping approach that dynamically flips a bit whenever application-related information is modified. Furthermore, this dissertation also presents an offload-aware Application Programming Interface (API) that encapsulates the application-related information required for code offloading. In contrast to related approaches in the literature, the bit-flipping approach and the offload-aware API provide an efficient gathering of information at run-time, keeping the overhead introduced on a mobile device low.
4) Besides the information from an application, code offloading has to obtain further information from participating resources and utilized links. In the area of obtaining the information required from participating resources and utilized links, this dissertation presents the approach of code bubbling, already mentioned above. In contrast to related approaches in the literature, the bubbling-based approach makes the offload decision at the place where the related information occurs, keeping the overhead introduced on a mobile device, participating resources, and utilized links low.
5) In case of a remote execution of an application part, code offloading has to send the information required for a remote execution to the remote resource that subsequently executes the application part on behalf of the mobile device.
In the area of sending the required information and executing an application part remotely, this dissertation presents code offloading with a cache on the remote side. The cache on the remote side serves as a collective storage of results for already executed application parts, avoiding a repeated execution of previously run application parts. In contrast to related approaches in the literature, the caching-aware approach increases the efficiency of code offloading, keeping the energy consumption, execution time, and monetary cost low.
6) While a remote resource executes an application part, code offloading has to handle the occurrence of failures like a failure of the remote resource or a disconnection. In the area of handling the occurrence of failures, this dissertation presents a preemptable offloading of code with safe-points. The preemptable offloading of code with safe-points enables an interruption of an offloading process and a corresponding continuation of a remote execution on a mobile device, without abandoning the complete result calculated remotely so far. Based on a preemptable offloading of code with safe-points, this dissertation further presents a predictive offloading of code with safe-points that minimizes the overhead introduced by safe-pointing and maximizes the efficiency of a deadline-aware offloading. In contrast to related approaches in the literature, the preemptable approach with safe-pointing increases the robustness of code offloading in case of failures. Furthermore, the predictive approach for safe-pointing ensures a minimal responsiveness and a maximal efficiency of applications despite failures.
7) At the end of a remote execution of an application part, code offloading has to gather on the remote resource the required information after the execution and send this information to the mobile device.
In the area of gathering the required information, a remote resource utilizes the same approaches as a mobile device, already mentioned above (cf. the bit-flipping approach and the offload-aware API).
8) Last, code offloading has to receive on the mobile device the information from a remote resource, install the information on the mobile device, and continue the execution of the application on the mobile device. In the area of installing the information and continuing the execution locally, a mobile device utilizes the approaches already mentioned above (cf. the bit-flipping approach and the offload-aware API).

Item Open Access
Scalable biophysical simulations of the neuromuscular system (2021)
Maier, Benjamin; Schulte, Miriam (Prof. Dr.)
The human neuromuscular system, consisting of skeletal muscles and neural circuits, is a complex system that is not yet fully understood. Surface electromyography (EMG) can be used to study muscle behavior from the outside. Computer simulations with detailed biophysical models provide a non-invasive tool to interpret EMG signals and gain new insights into the system. The numerical solution of such multi-scale models imposes high computational workloads, which restricts their application to short simulation time spans or coarse resolutions. We tackled this challenge by providing scalable software employing instruction-level and task-level parallelism, suitable numerical methods and efficient data handling. We implemented a comprehensive, state-of-the-art, multi-scale multi-physics model framework that can simulate surface EMG signals and muscle contraction as a result of neuromuscular stimulation. This work describes the model framework and its numerical discretization, develops new algorithms for mesh generation and parallelization, covers the use and implementation of our software OpenDiHu, and evaluates its computational performance in numerous use cases.
We obtain a speedup of several hundred compared to a baseline solver from the literature and demonstrate that our distributed-memory parallelization and the use of High Performance Computing resources enable us to simulate muscular surface EMG of the biceps brachii muscle with realistic muscle fiber counts of several hundred thousand. We find that certain model effects are only visible with such high resolution. In conclusion, our software contributes to more realistic simulations of the neuromuscular system and provides a tool for applied researchers to complement in vivo experiments with in silico studies. It can serve as a building block to set up comprehensive models for more organs in the musculoskeletal system.

Item Open Access
B-splines on sparse grids for uncertainty quantification (2021)
Rehme, Michael F.; Pflüger, Dirk (Prof. Dr.)