05 Fakultät Informatik, Elektrotechnik und Informationstechnik
Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6
Search Results
Item (Open Access): Implementing a Cholesky decomposition using SYCL (2025)
Bloch, Michal
PLSSVM, an LS-SVM implementation, currently uses only the Conjugate Gradient algorithm for solving a system of linear equations. However, for ill-conditioned matrices, this algorithm gets into trouble, as the converged solution drifts away from the actual solution due to rounding errors. Therefore, this thesis implements a different solver, the Cholesky Decomposition, in SYCL. We will implement multiple variations of the Cholesky Decomposition algorithm, including a blocked version, and utilize many different features of SYCL. The focus will primarily be on the fastest implementations. In the end, the fastest implementation will be integrated into PLSSVM alongside a Forward and Backward Substitution implementation for solving the system of linear equations. We will conclude with a runtime comparison between the implementations, a comparison of our best Cholesky Decomposition with the Conjugate Gradient algorithm on a dataset, and a short discussion of numerical errors.

Item (Open Access): Analysis of different preconditioners for kernel matrices based on the PLSSVM library using SYCL (2025)
Horstmann, Jonas
PLSSVM is a library that enables the efficient training and execution of Support Vector Machines, which can be used to classify data. It does so by utilizing various high performance computing frameworks to construct and solve a system of linear equations. The conjugate gradient (CG) algorithm is used to iteratively solve this linear system. Large datasets with many features, which result in ill-conditioned kernel matrices, have a negative impact on the convergence of the CG method.
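The link between conditioning and CG convergence can be illustrated with a minimal sketch. It is not taken from PLSSVM: the helper names `cg_iterations` and `spectrum`, the diagonal test matrices, and the tolerance are assumptions chosen for illustration only. The sketch runs plain, unpreconditioned CG on two symmetric positive definite systems of the same size that differ only in their condition number.

```python
import math

def cg_iterations(eigenvalues, tol=1e-8, max_iter=10_000):
    """Run plain CG on the diagonal SPD system diag(eigenvalues) x = b with b = 1.

    Returns the number of iterations until ||r|| <= tol * ||b||.
    (Hypothetical helper for illustration; not part of PLSSVM.)
    """
    n = len(eigenvalues)
    b = [1.0] * n
    x = [0.0] * n
    r = b[:]  # residual r = b - A x for the initial guess x = 0
    p = r[:]  # first search direction
    rs_old = sum(ri * ri for ri in r)
    b_norm = math.sqrt(rs_old)
    for k in range(1, max_iter + 1):
        # Matrix-vector product with a diagonal matrix
        Ap = [lam * pi for lam, pi in zip(eigenvalues, p)]
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if math.sqrt(rs_new) <= tol * b_norm:
            return k
        # New search direction, conjugate to the previous ones
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return max_iter

def spectrum(n, kappa):
    """n eigenvalues spread evenly over [1, kappa], so the condition number is kappa."""
    return [1.0 + (kappa - 1.0) * i / (n - 1) for i in range(n)]

well = cg_iterations(spectrum(200, 10.0))  # condition number 10
ill = cg_iterations(spectrum(200, 1e4))    # condition number 10^4
print(f"well-conditioned: {well} iterations, ill-conditioned: {ill} iterations")
```

In this toy setting the ill-conditioned system needs considerably more iterations than the well-conditioned one, which is the convergence problem described here for kernel matrices.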
To remedy this problem, the goal of this thesis is to analyze different preconditioners in the context of the preconditioned conjugate gradient algorithm in order to reduce the condition number of the linear system, leading to better convergence and higher stability with regard to different hyperparameter sets. To achieve this goal, three different preconditioners were implemented with SYCL and tested, showing that the use of a preconditioner can indeed improve the mentioned aspects, resulting in fewer iterations to converge (a reduction of up to 78%) and enabling the use of hyperparameter combinations that were not possible before.

Item (Open Access): Development of a software framework for the generation of data sets with PFLOTRAN (2025)
Hausch, Max
This thesis presents the development of a software framework, VampireMan, designed to automate the generation of diverse and reproducible data sets for training surrogate machine learning models in groundwater flow simulations with heat pumps. Groundwater flow and heat transport simulations are important tools for applications like geothermal energy systems, requiring extensive high-quality data sets for accurate predictive modeling. Surrogate machine learning models have emerged as efficient alternatives to computationally expensive numerical simulations, enabling rapid predictions of subsurface temperature fields. The success of these models relies on the availability of diverse and reliable data sets, encompassing variations in physical and operational parameters. However, manual and semi-automated data set creation approaches are limited in scalability and prone to errors. VampireMan addresses this challenge by automating the entire data generation workflow: systematically varying simulation parameters, generating simulation input files, running simulations with PFLOTRAN, and visualizing outputs.
The framework adheres to Research Software Engineering (RSE) and FAIR4RS (Findable, Accessible, Interoperable, and Reusable for Research Software) principles, ensuring reproducibility, scalability, and extensibility. Key features include reproducible data set generation using different parameter variation modes (fixed, constant, and spatial), modular pipeline stages, and integration with PFLOTRAN. VampireMan's effectiveness is demonstrated through preconfigured examples that showcase parameter variations and simulation workflows. By enabling efficient and reproducible data set generation, VampireMan can help advance machine learning applications in environmental engineering, facilitating resource-efficient and real-time decision-making for subsurface energy systems.

Item (Open Access): Design and implementation of a data plane interface for a software bridge for Time-Sensitive Networking (TSN) (2025)
Fruck, Philipp
Time-Sensitive Networking (TSN) is a cornerstone technology for the Industrial Internet of Things (IIoT), enabling deterministic communication over standard Ethernet. Most existing TSN solutions rely heavily on hardware-based implementations, which constrains their suitability for dynamic testing environments. Especially in virtualized edge cloud systems, where the network extends onto the host executing virtual switches to connect containers and virtual machines, the need for a software-based TSN solution emerges. While software-based TSN data plane implementations like the Linux TAPRIO QDISC exist, such implementations are commonly configured using CLI commands. This results in a lack of standardization between the control plane and software-based data plane implementations. This work presents a software implementation of a data plane interface utilizing NETCONF that enables a standard-compliant exchange between the control plane and the data plane. Our approach complements existing hardware implementations using existing TSN standards.
As a software-based implementation, it is particularly suitable for research, prototyping, and validation phases. During this work, the data plane interface is implemented, and the validation of standardized data models is tested to ensure the correctness and standard conformance of the interface. In typical TSN environments, we want to be able to deploy a network schedule in a synchronized, atomic manner across all network devices. This enables us to adapt the schedule to certain applications while preventing network disruptions. To address this challenge, we introduce a novel protocol for coordinated, time-synchronized updates of network schedules across multiple network devices. This mechanism allows seamless transitions between TSN schedules without disrupting active communication, thereby maintaining real-time guarantees during schedule reconfiguration. The proposed solutions significantly enhance the flexibility and reliability of TSN systems in software-centric industrial applications. This work also ensures the correctness of both the software implementation and the designed protocol. We provide manual and programmatic testing methods that ensure a proper implementation of the software prototype and evaluate its performance using benchmarks. The designed protocol is argued to be correct by modelling it as a state machine.

Item (Open Access): Proximity-based service discovery for distributed digital twin systems (2025)
Rothermel, Kurt; Herzog, Otthein; Wu, Zhiqiang Siegfried
Over the past decade, there has been a significant increase in interest in digital twin (DT) technology in a variety of domains. While research on DTs of single assets was initially prevalent, there has been a notable shift towards distributed systems of DTs, which connect to each other to collaborate. Typically, collaboration is enabled by DTs providing services that can be consumed by other DTs. In service-oriented systems, a service is typically identified by type information.
However, this is not sufficient in distributed DT systems, where DTs associated with different physical entities may provide the same type of service. Consequently, selecting the appropriate service depends not only on the service type, but also on the associated physical entity. However, requiring DTs to know the mapping of services to their physical environment is not feasible for large dynamic systems. This paper presents a novel proximity-based service discovery method that allows DTs to select services based on service type and their proximity to other objects. That is, service specifications are fully abstracted from the mapping of services to physical objects, relieving DTs from maintaining information about this mapping. Furthermore, service discovery is robust to changes in the physical environment and service population. The proposed service discovery method has been implemented on top of a spatial DBMS. We argue that this implementation is optimal in terms of network utilization and latency, and perform comprehensive evaluations to show the performance of discovery queries as a function of their complexity.

Item (Open Access): Hierarchical collectives for HPX (2025)
Zeil, Lukas
High-Performance Computing (HPC) is the practice of using many concurrently running computational units to solve complex and resource-intensive computational tasks faster than a single computer could. These tasks include the Fourier transform, computational fluid dynamics, and many more. HPC uses HPC clusters, collections of many compute nodes, to independently calculate different parts of these tasks and then gather the results. For these tasks, this improves performance immensely. A critical component of many HPC frameworks is the Message Passing Interface (MPI), which provides standardized methods for inter-process communication across distributed systems.
The first communication mechanism of the MPI standard is point-to-point communication, which allows individual processes to send data to each other. On a higher level, and at the center of our thesis, are the collective communication operations, which transmit data within a group of processes. The MPI standard and its communication models are implemented by several libraries. Our thesis focuses on OpenMPI, a widely used open-source implementation of MPI that continuously optimizes performance and scalability, making it the de facto choice for collective communication operations in large-scale operations. A library with a focus on parallelism and concurrency is the STE||AR Group’s HPX for C++. It seeks to address limitations of MPI and other traditional parallelization methods by leveraging techniques such as message-driven computation and constraint-based synchronization. While OpenMPI has introduced numerous optimizations to improve scalability, HPX’s current collective operations still rely on a single root node for message distribution, leading to performance bottlenecks. This thesis proposes a hierarchical communicator for HPX that distributes communication workload across multiple processes, reducing overhead and improving scalability. We created a benchmark to compare our hierarchical communicator to the naive HPX communicator and various OpenMPI algorithms. Our results show a proportional performance increase across all tested collective operations on HPX, with the hierarchical communicator achieving a speedup of up to 32 for reduce and broadcast operations and 16 for gather and scatter on 256 processes.
Compared to OpenMPI’s state-of-the-art implementations for gather, scatter, and broadcast, our approach shows a speedup of 1.2 and 1.5 for 128 and 256 processes, respectively.

Item (Open Access): Multi-combining : exploring the batching-parallelism trade-off (2025)
Bihlmaier, Dominik
This thesis investigates the efficient synchronization of concurrent operations on AVL trees in multicore environments. While AVL trees offer logarithmic-time guarantees, concurrent access requires synchronization, which often leads to contention and scalability bottlenecks. Delegation-based synchronization methods, such as Flat Combining, address contention by batching operations but introduce a new bottleneck due to their reliance on a single combiner thread. To overcome this limitation, this thesis proposes a Multi Combining approach that alleviates the bottleneck by partitioning operation batches across multiple combiners. The design leverages the structural properties of AVL trees by partitioning the workload based on the root node, dedicating separate combiners for operations on the left and right subtrees. To minimize the synchronization overhead that is typically required for frequent root updates, the system utilizes a relaxed AVL tree design, which defers rebalancing until a specific threshold is reached. Extensive evaluation shows that Multi Combining consistently outperforms both traditional locking and Flat Combining approaches. Insertion- and removal-heavy workloads achieve up to double the throughput compared to competing methods, while read-heavy workloads benefit from parallel execution of lookups.
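The root-based partitioning of operation batches can be sketched as follows. This is a minimal illustration, not the data structures used in the thesis: the `Op` record, the `partition_batch` helper, and the separate handling of operations that hit the root key itself are all assumptions made for this example.

```python
from collections import namedtuple

# Hypothetical operation record: kind is "insert", "remove", or "lookup".
Op = namedtuple("Op", ["kind", "key"])

def partition_batch(batch, root_key):
    """Split a batch of pending operations by comparing each key with the root key.

    Returns (left, right, at_root): operations for the left subtree, for the
    right subtree, and those that touch the root key itself. The two subtree
    batches address disjoint key ranges, so two combiners could apply them
    independently of each other.
    """
    left, right, at_root = [], [], []
    for op in batch:
        if op.key < root_key:
            left.append(op)
        elif op.key > root_key:
            right.append(op)
        else:
            at_root.append(op)
    return left, right, at_root

batch = [
    Op("insert", 3),
    Op("lookup", 42),
    Op("remove", 17),
    Op("insert", 20),
    Op("lookup", 8),
]
left, right, at_root = partition_batch(batch, root_key=20)
```

The split only stays valid while the root key is unchanged, which is one way to read the motivation for deferring rebalancing in a relaxed AVL tree.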
Overall, the results demonstrate that Multi Combining effectively balances batching and parallelism, providing a scalable synchronization strategy for concurrent balanced trees.

Item (Open Access): Wearable-based study on the influence of music genre preferences on heart rate (2025)
Merkle, Simon Andreas
The relationship between music preferences and physiological responses is an emerging field of study with implications for wellness and personalized health. This thesis examines how individual preferences for specific music genres influence heart rate (HR) changes during music listening. Methods: A wearable-based study (n=20) was conducted using a custom mobile application, VitaBeat, that integrates smartwatch HR monitoring with Spotify data collection. The study hypothesized that participants would show smaller increases, or even decreases, in HR when listening to music genres they prefer and are accustomed to, compared to genres they do not prefer. Results: Measurements showed no statistically significant effect of genre preference on HR changes. However, the measurements showed a significant difference between liked and disliked songs. In addition, listening to music in general was associated with a significant decrease in HR. Conclusion: While genre preference did not significantly influence HR, personal enjoyment of songs and music in general seemed to play a role. These findings should be interpreted in light of the study's limited sample size. The study also demonstrates the potential of wearable technology for accessible, remote physiological data collection.

Item (Open Access): User-centred development of the VitaBeat app for linking smartwatch sensor data with Spotify songs (2025)
Tunc, Benjamin
The adoption of wearable technologies, particularly smartwatches, is rapidly increasing, offering new opportunities for collecting and analyzing health data. Music streaming has also become integral to daily life, creating potential for linking music with physiological signals like heart rate.
The VitaBeat app was developed as part of a student research project to provide a tool for analyzing the relationship between music and heart rate. It connects music to heart rate measurements by collecting data from smartwatch and smartphone sensors and Spotify playback via the Spotify SDK; this data is transmitted to a central back-end for analysis. A requirements analysis was conducted using an online survey to gather insights from potential users. The survey aimed to identify user needs and preferences for the app’s features and usability. Based on the survey findings, several new features were implemented, including heart rate diagrams for individual songs, heart rate overviews, and gamified micro-breaks to enhance engagement. The user interface was also redesigned for better usability and clearer data presentation. This thesis outlines the methodology and results of the survey, the new features developed from those insights, and recommendations for future improvements.

Item (Open Access): Analysis of the performance portability of the SYCL joint_matrix extension (2025)
Heinle, Fabian
The growing vendor diversity across the fastest supercomputers in the world introduces significant complexity: it is not trivial to ensure that software performs efficiently across multiple hardware platforms. With the introduction of Tensor Cores, Nvidia added another layer of complexity to be considered. AMD and Intel now offer their own solutions for speeding up matrix multiplications. Often programmed in low-level and vendor-specific code, these specialized units make achieving portable high-performance code even more challenging. The performance improvements by a factor of two, and even a factor of eight for half precision, compared to native cores are nevertheless a good reason to investigate the performance portability of this matrix hardware further. In the course of this thesis, the performance portability of the oneAPI joint_matrix extension is evaluated.
This extension provides a vendor-independent API to program matrix hardware in a portable and efficient manner. To this end, two benchmarks are implemented using SYCL. First, a matrix multiplication utilizing the joint_matrix extension is tested with multiple mixed-precision combinations. Second, the performance of joint_matrix for a problem that does not map directly onto this specialized hardware is evaluated. For this purpose, we extend a classical gravitational N-body simulation to use the joint_matrix extension for the pairwise distance calculation between bodies. The results for the matrix multiplication benchmark assess the impact of different optimization levels when employing the joint_matrix extension. Furthermore, the effect on performance of allocating memory with malloc_shared or malloc_device is measured. We found that optimizing the kernel results in the best performance for the joint_matrix extension, achieving ≈ 50% of the cuBLAS performance on Nvidia GPUs for architectural and application efficiency. For all other vendors, no performance portability can be achieved, either due to a malfunctioning oneAPI joint_matrix implementation for AMD, or due to the limited availability of mixed-precision data types for Intel. Deploying the joint_matrix extension to compute the distances between bodies within the N-body simulation showed that this usage can speed up the computation for medium-sized simulation scenarios. For larger simulation scenarios, the naive approach delivers better results due to the computational overhead incurred when using the joint_matrix extension for this type of computation.
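The abstract does not state how the pairwise distance calculation is mapped onto matrix hardware. One common formulation, shown here purely as an assumption, rewrites the squared distance as ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i · x_j, so that all dot products come from a single matrix product X X^T. The sketch below uses plain Python lists instead of SYCL, and the function names are hypothetical.

```python
def matmul(a, b):
    """Dense matrix product of two row-major nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

def pairwise_sq_dist_gemm(points):
    """Squared distances between all bodies via one matrix product.

    The Gram matrix G = X X^T holds all dot products x_i . x_j, and its
    diagonal holds the squared norms, so d_ij^2 = G_ii + G_jj - 2 G_ij.
    The matrix product is the part that matrix hardware could accelerate.
    """
    transposed = [list(col) for col in zip(*points)]
    gram = matmul(points, transposed)
    n = len(points)
    return [
        [gram[i][i] + gram[j][j] - 2.0 * gram[i][j] for j in range(n)]
        for i in range(n)
    ]

def pairwise_sq_dist_naive(points):
    """Reference implementation: direct coordinate differences per pair."""
    return [
        [sum((p - q) ** 2 for p, q in zip(a, b)) for b in points]
        for a in points
    ]

bodies = [[0.0, 0.0, 0.0], [1.0, 2.0, 2.0], [3.0, 0.0, 4.0]]
dist_gemm = pairwise_sq_dist_gemm(bodies)
dist_naive = pairwise_sq_dist_naive(bodies)
```

With three coordinates per body, the inner dimension of the product is far smaller than typical matrix-unit tile sizes, and the subtraction can lose accuracy in reduced precision. Both are plausible sources of the overhead mentioned above, though the abstract does not name them.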