05 Faculty of Computer Science, Electrical Engineering and Information Technology (Fakultät Informatik, Elektrotechnik und Informationstechnik)

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6

  • Implementing a Cholesky decomposition using SYCL
    Open Access
    (2025) Bloch, Michal
    PLSSVM, an LS-SVM implementation, currently uses only the Conjugate Gradient (CG) algorithm to solve the system of linear equations. For ill-conditioned matrices, however, CG runs into trouble: rounding errors cause the converged solution to drift away from the actual solution. This thesis therefore implements an alternative solver, the Cholesky decomposition, in SYCL. We implement multiple variants of the Cholesky decomposition algorithm, including a blocked version, and use many different SYCL features, focusing primarily on the fastest implementations. Finally, the fastest implementation is integrated into PLSSVM together with forward and backward substitution for solving the system of linear equations. We conclude with a runtime comparison of the implementations, a comparison of our best Cholesky decomposition with the Conjugate Gradient method on a dataset, and a brief discussion of numerical errors.
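    To illustrate the solver pipeline the abstract describes (factor the system matrix as A = L L^T, then solve L y = b by forward substitution and L^T x = y by backward substitution), here is a minimal, unblocked pure-Python sketch. It is not the thesis's SYCL code and not part of PLSSVM; all function names are hypothetical, and a blocked GPU version would reorganize the same arithmetic into tiles.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L @ L^T (unblocked Cholesky-Banachiewicz)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not symmetric positive definite")
                l[i][j] = math.sqrt(d)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def forward_substitution(l, b):
    """Solve L y = b for lower-triangular L."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    return y


def backward_substitution(l, y):
    """Solve L^T x = y, reading the entries of L^T directly from L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def cholesky_solve(a, b):
    """Solve a x = b for symmetric positive definite a."""
    l = cholesky(a)
    return backward_substitution(l, forward_substitution(l, b))
```

    Unlike CG, this is a direct method: once L is computed, each additional right-hand side costs only two triangular solves.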
  • Design and implementation of a NUMA-aware cooperative scheduler
    Open Access
    (2025) Krieg, Jonas
    Efficient task scheduling is essential for maximizing central processing unit (CPU) utilization in parallel applications. A widely adopted strategy is work-stealing, in which idle threads dynamically steal tasks from busy ones. However, modern multi-core systems are increasingly based on Non-Uniform Memory Access (NUMA) architectures, in which memory access latency depends on the physical proximity of memory to processor cores. In such systems, traditional work-stealing algorithms, which primarily optimize for either memory locality or load distribution, can degrade performance through load imbalance or remote memory accesses. Despite this critical need for both NUMA-awareness and dynamic load balancing, existing scheduling approaches rarely address these requirements simultaneously. Most current work-stealing schedulers prioritize either memory locality or load distribution and fail to capture the complex interactions between memory access patterns and workload balancing, which limits their effectiveness for real-world, NUMA-based workloads. The goal of this thesis is to design and implement a scheduling system that explicitly considers the effects of Non-Uniform Memory Access and aims to optimize task execution performance in multi-threaded environments. This is done by extending existing scheduling concepts to integrate multiple critical parameters, specifically NUMA-awareness and system load balance, into the scheduling and work-stealing process. We design and implement a novel hybrid scheduling approach, the NUMA-Load Aware Hybrid Scheduler (NLHScheduler), which combines the strengths of two state-of-the-art work-stealing strategies. Specifically, the NLHScheduler integrates NUMA-locality awareness with dynamic system workload balancing in every scheduling decision.
Initial experiments revealed that these two criteria can sometimes conflict, leading to suboptimal scheduling decisions. To address this, the NLHScheduler prioritizes NUMA-locality, as previous analyses have shown that memory locality has a greater impact on performance than load balancing alone. Additionally, we made the initial task assignment mechanism both NUMA-aware and load-sensitive, further improving scheduling efficiency. To evaluate the proposed schedulers, we developed a custom benchmark framework. With it, we tested various workload scenarios, including balanced and imbalanced task distributions, as well as scaling behavior under varying queue lengths, numbers of thief coroutines, and system concurrency levels. The evaluation compared median execution times, system throughput, and successful steal operations across the schedulers. All work-stealing strategies significantly outperformed a baseline round-robin scheduler, improving performance by an average of 38.88%. While the individual stealing strategies showed similar results, the NLHScheduler achieved the greatest gains, most notably improving execution-time stability by 40.59%. This work highlights the potential of combining NUMA-awareness and dynamic workload balancing in task scheduling and lays a foundation for further research into adaptive, performance-oriented scheduling techniques for NUMA architectures.
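    The abstract states that the NLHScheduler puts NUMA-locality ahead of load balance when the two conflict. A minimal sketch of such a victim-selection rule, assuming a hypothetical `Worker` record with a NUMA node id and a queue length (real schedulers read these from concurrent deques and the hardware topology, which this sketch omits), could look like this:

```python
from dataclasses import dataclass


@dataclass
class Worker:
    wid: int        # worker (thread) id
    numa_node: int  # NUMA node the worker is pinned to
    queue_len: int  # number of tasks currently waiting in its deque


def choose_victim(thief, workers):
    """Pick a steal victim: prefer the thief's own NUMA node, then the longest queue.

    Returns None if no other worker has work to steal.
    """
    candidates = [w for w in workers if w.wid != thief.wid and w.queue_len > 0]
    if not candidates:
        return None
    local = [w for w in candidates if w.numa_node == thief.numa_node]
    # Locality first: only fall back to remote nodes if no local victim has work.
    pool = local if local else candidates
    # Load second: among the allowed victims, steal from the most loaded one.
    return max(pool, key=lambda w: w.queue_len)
```

    A local victim with only two queued tasks wins over a remote one with ten; the remote worker is chosen only once no local queue has work left.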
  • Re-identification attacks to validate the privacy provided by anonymization
    Open Access
    (2025) Below, Richard
    When sensitive microdata about individuals is published, choosing a secure anonymization method is vital. Comparing the effectiveness of different anonymization methods is challenging because of their structural differences. Many re-identification attacks exist that attempt to reverse these methods and identify individuals. However, prior work typically evaluates privacy risks in isolation, focusing on a single anonymization technique at a time. This work proposes comparing different anonymization methods by simulating re-identification attacks on them. First, an ontology that models the landscape of attacks and anonymization methods is created. Additionally, all attacks available in the literature are retrieved and analyzed with regard to which anonymization methods are susceptible to them. As specified in the ontology, the anonymization methods, attacks, and their relationships are organized into a structured knowledge graph. Finally, a framework is created that makes our contributions easily accessible. The framework provides access to the knowledge graph via an interactive visualization. In addition, it implements attacks that can be simulated on custom data anonymized by different methods. The success of these simulated attacks can serve as a state-of-the-art approximation of the actual re-identification risk and helps bridge the comparability gap between structurally different anonymization methods.
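    One of the simplest attacks in this landscape is a linkage attack: an adversary joins released, generalized records with auxiliary data on shared quasi-identifiers and declares a re-identification whenever exactly one person matches. The sketch below assumes hypothetical quasi-identifiers (an age range and a ZIP-code prefix) and fictional records; it is not the thesis's framework, and the fraction of uniquely matched rows is only one possible success measure.

```python
def matches(person, row):
    """True if an auxiliary record is consistent with a generalized released row."""
    lo, hi = row["age"]
    return lo <= person["age"] <= hi and person["zip"].startswith(row["zip_prefix"])


def linkage_attack(released, auxiliary):
    """Return {released row index: name} for rows matched by exactly one person."""
    reidentified = {}
    for i, row in enumerate(released):
        candidates = [p["name"] for p in auxiliary if matches(p, row)]
        if len(candidates) == 1:  # a unique match links the row to an identity
            reidentified[i] = candidates[0]
    return reidentified


def reidentification_rate(released, auxiliary):
    """Fraction of released rows that the attack links to a single person."""
    return len(linkage_attack(released, auxiliary)) / len(released)


# Fictional example data: generalized release and an auxiliary table with names.
released = [
    {"age": (30, 39), "zip_prefix": "705", "diagnosis": "flu"},
    {"age": (40, 49), "zip_prefix": "701", "diagnosis": "asthma"},
]
auxiliary = [
    {"name": "A", "age": 34, "zip": "70569"},
    {"name": "B", "age": 37, "zip": "70563"},
    {"name": "C", "age": 45, "zip": "70173"},
]
```

    Here the first row stays hidden among two candidates, while the second is uniquely linked to person C, so half of the released rows are re-identified.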
  • Greedy-kernel algorithms for data mapping in multiphysics simulations
    Open Access
    (2025) Tucciarone, Fabio
    Data mapping in multiphysics simulation coupling describes the transfer of data between possibly nonconforming meshes. Choosing a numerical approximation method for data mapping is always a trade-off between accuracy and performance. Lower-accuracy methods include, for example, first-order nearest-neighbour mapping, whereas higher accuracy can often be achieved with computationally expensive radial basis function interpolation. We extend the multiphysics coupling library preCICE with a greedy approach to radial basis function interpolation. We implement and evaluate the P-greedy and f-greedy methods, which reduce the size of a radial basis function interpolant by selecting vertices greedily. The greedy selection terminates when a user-defined tolerance for a greedy criterion is reached. We compare this method to nearest-neighbour mapping, as well as to a global-direct solution and a partition-of-unity approach to radial basis function mapping. We find that the greedy selection process is computationally expensive for small error tolerances. At the same time, once the interpolant has been constructed in an offline stage, we often see an improvement in mapping time compared to a global-direct solution. For high error tolerances, nearest-neighbour mapping is typically the cheaper option. The partition-of-unity method can achieve comparatively small errors at lower runtimes.
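    The f-greedy idea can be sketched in one dimension: repeatedly add the data point with the largest current residual as a new RBF center, re-solve the small interpolation system, and stop once the maximum residual drops below the tolerance. This pure-Python sketch assumes a Gaussian kernel with a hypothetical shape parameter `eps` and re-solves the system from scratch in every step; it is not preCICE code, and P-greedy would instead select points by the power function, independently of the data values.

```python
import math


def gauss(r, eps=10.0):
    """Gaussian radial basis function with (hypothetical) shape parameter eps."""
    return math.exp(-(eps * r) ** 2)


def solve(a, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def f_greedy(xs, fs, tol):
    """Greedily select RBF centers until max |residual| <= tol (or all points used).

    Returns (center indices, coefficients, final max residual).
    """
    centers, coeffs = [], []
    residual = list(fs)
    while max(abs(r) for r in residual) > tol and len(centers) < len(xs):
        # f-greedy criterion: the point where the current interpolant is worst.
        i = max(range(len(xs)), key=lambda j: abs(residual[j]))
        centers.append(i)
        kmat = [[gauss(abs(xs[a] - xs[b])) for b in centers] for a in centers]
        coeffs = solve(kmat, [fs[a] for a in centers])
        residual = [
            fs[j] - sum(c * gauss(abs(xs[j] - xs[a])) for c, a in zip(coeffs, centers))
            for j in range(len(xs))
        ]
    return centers, coeffs, max(abs(r) for r in residual)
```

    Keeping eps times the point spacing around 0.5 or larger keeps the Gaussian kernel matrix reasonably conditioned; much flatter kernels quickly make the dense solve unreliable.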
  • Effect of data preparation in the context of fair classification
    Open Access
    (2025) Barts, Valer
    This thesis investigates the critical role of data preparation in shaping the predictive performance and fairness of binary classification models. Given that the quality and composition of training data significantly influence model behaviour, especially concerning embedded biases, ensuring that training data is both accurate and fair is essential for the development of trustworthy machine learning systems. To address this, we extend an existing data processing pipeline, substantially broadening its data preparation stage with the integration of sixteen additional methods across five distinct components. This expansion allows for a more comprehensive evaluation of the interplay between data preparation, predictive accuracy, and algorithmic fairness. Our empirical study employs a diverse set of classifiers and evaluation metrics, including several newly developed scores specifically designed to capture the nuanced effects of data preparation on model outcomes. The analysis spans both real-world and synthetic datasets, providing a robust foundation for our findings. Key insights include the observation that simply increasing the number of data preparation components does not necessarily improve model performance. Instead, optimal results often depend on carefully chosen methods and execution orders, with some components displaying strong positional dependencies. Additionally, our results reaffirm the well-documented trade-off between fairness and accuracy, yet also demonstrate that it is possible to identify configurations where both can be improved simultaneously. These findings not only deepen our understanding of data preparation in the context of fair classification but also offer concrete, empirically grounded recommendations for practitioners. Our work lays the foundation for more informed pipeline design, providing a flexible, modular framework that can be readily extended to accommodate emerging data preparation techniques and new evaluation metrics.
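    The abstract does not name its fairness metrics, and the newly developed scores are not reproduced here. As a reference point for the fairness/accuracy trade-off, the sketch below computes accuracy alongside statistical parity difference, one widely used group-fairness metric, for binary predictions and a binary protected attribute:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def statistical_parity_difference(y_pred, privileged):
    """P(y_hat = 1 | unprivileged) - P(y_hat = 1 | privileged); 0 means parity.

    privileged[i] is True if sample i belongs to the privileged group.
    """
    priv = [p for p, g in zip(y_pred, privileged) if g]
    unpriv = [p for p, g in zip(y_pred, privileged) if not g]
    return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)
```

    A negative value means the unprivileged group receives positive predictions less often; a preparation step can move accuracy and this difference in opposite directions, which is the trade-off the abstract refers to.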
  • Lossfunction for physics-informed machine learning in groundwater flow
    Open Access
    (2025) Aheimer, Björn
    Physics-Informed Machine Learning (PIML) methods for solving complex nonlinear partial differential equations (PDEs) have recently gained popularity in the simulation sciences. Unlike purely data-driven approaches, PIML integrates prior knowledge about the underlying physical system, expressed through the governing PDEs, into the learning process. This is done via a loss function that includes the PDE residual, thereby penalizing model outputs that do not satisfy the PDE. Hence, PIML methods are particularly appealing for scenarios with limited data availability, where extensive measurements or numerical simulations are prohibitively expensive. However, purely physics-informed models often suffer from various pathologies that render purely physics-informed learning ineffective. One major challenge is the complex loss landscape introduced by the PDE residual. This work examines how simplifying the governing PDE can enhance training performance or enable physics-informed learning in cases where it would otherwise be infeasible. Building on the approach proposed by Piller for extending predictions of heat plumes generated by Groundwater Heat Pumps (GWHP), this study verifies the effectiveness of such PDE simplifications. Concretely, this work finds a moderate monotonic correlation (Spearman: 0.59) between the simplified PDE residual and the data loss, indicating that the simplified PDEs preserve enough of the physics to be useful while making training more tractable. To this end, Physics-Informed Neural Networks for Heat Plume Extension (HPE-PINN) and Physics-Informed Neural Operators for Heat Plume Extension (HPE-PINO) are developed and compared against Piller's model, which uses Singular Value Decomposition (SVD) to reduce the dimensionality of the solution space. Throughout this process, several common PIML pathologies are encountered.
A suite of techniques to mitigate their negative effects on training is presented, implemented, and validated to improve training stability and model performance. Comprehensive ablation studies highlight the effectiveness of normalization and hard enforcement of initial and boundary conditions in enhancing convergence, as well as the importance of Fourier feature embeddings to reduce spectral bias.
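    The core idea of a physics-informed loss (data misfit plus a penalty on the PDE residual) can be shown on a toy problem. This sketch assumes the 1D Poisson equation -u'' = f instead of the groundwater heat-transport equations, central finite differences instead of automatic differentiation, and a hypothetical weight `lam`; it illustrates the loss structure only, not HPE-PINN or HPE-PINO.

```python
import math


def composite_loss(u, xs, f, data, lam=1.0):
    """Data loss + lam * PDE-residual loss for -u'' = f on a uniform grid.

    u:    candidate solution values at the grid points xs
    f:    right-hand side as a Python function of x
    data: list of (grid index, measured value) pairs
    The second derivative at interior points uses central finite differences.
    """
    h = xs[1] - xs[0]
    residuals = [
        -(u[i - 1] - 2.0 * u[i] + u[i + 1]) / h**2 - f(xs[i])
        for i in range(1, len(xs) - 1)
    ]
    pde_loss = sum(r * r for r in residuals) / len(residuals)
    data_loss = sum((u[i] - v) ** 2 for i, v in data) / len(data)
    return data_loss + lam * pde_loss
```

    An exact solution such as u = sin(πx) with f = π² sin(πx) drives the residual term toward zero (up to discretization error), while a candidate that violates the PDE is penalized even where no measurements exist.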
  • Flexible mesh-particle coupling with preCICE
    Open Access
    (2025) Walloner, Robin
    Coupling particle-based and mesh-based solvers presents significant challenges in multiphysics simulations. This thesis explores flexible mesh-particle coupling using the preCICE coupling library. We investigate newly introduced features in preCICE version 3.2.0, specifically just-in-time data mapping and dynamic remeshing, as well as the not-yet-released coarse-graining mapping, to assess whether they enable efficient and accurate data exchange between Lagrangian particle solvers and Eulerian continuum solvers. First, we develop a one-way coupled particle tracing prototype to demonstrate key aspects of particle-based coupling with preCICE and to evaluate the achievable accuracy. Most notably, we implement a two-way coupled unresolved Computational Fluid Dynamics - Discrete Element Method (CFD-DEM) simulation using the open-source solvers OpenFOAM and LIGGGHTS. This serves as a case study that highlights the capabilities and limitations of preCICE’s current features around mesh-particle coupling. We apply our particle tracing prototype to vortex and channel flow scenarios, and our CFD-DEM coupling to single particle sedimentation and fluidized bed simulations. We find that, while just-in-time mapping works well for coupling particle-based solvers, dynamic remeshing is not currently applicable in this context. In addition, coarse-graining shows strong potential as a flexible and robust approach for mapping particles to continuum fields. Going forward, this thesis reveals opportunities for future development within the preCICE ecosystem and offers guidance for further improvements to mesh-particle coupling.
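    The details of preCICE’s unreleased coarse-graining mapping are not given here. As a generic illustration of kernel-based particle-to-grid mapping, the sketch below deposits particle masses onto a 1D grid with a linear hat kernel (cloud-in-cell); coarse-graining schemes often use smoother, wider kernels such as Gaussians, and all names here are hypothetical rather than preCICE API.

```python
def deposit(positions, masses, x0, dx, n_nodes):
    """Cloud-in-cell deposition of particle masses onto n_nodes grid nodes.

    Node j sits at x0 + j * dx. Each particle splits its mass between the two
    nearest nodes with weights that sum to 1, so total mass is conserved for
    particles inside [x0, x0 + (n_nodes - 1) * dx]; others are not handled.
    """
    field = [0.0] * n_nodes
    for x, m in zip(positions, masses):
        s = (x - x0) / dx              # position in units of grid spacing
        j = min(int(s), n_nodes - 2)   # left node, clamped so j + 1 is valid
        w = s - j                      # weight of the right node
        field[j] += m * (1.0 - w)
        field[j + 1] += m * w
    return field
```

    Because the weights of each particle sum to one, the deposited field conserves total mass, which is the property a particle-to-continuum mapping most needs for quantities like solid volume fraction.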
  • Hierarchical collectives for HPX
    Open Access
    (2025) Zeil, Lukas
    High-Performance Computing (HPC) uses many concurrently running computational units to solve complex, resource-intensive computational tasks faster than a single computer could. Such tasks include the Fourier transform, computational fluid dynamics, and many more. HPC relies on HPC clusters, collections of many compute nodes, which compute different parts of these tasks independently and then gather the results, improving performance immensely for such workloads. A critical component of many HPC frameworks is the Message Passing Interface (MPI), which provides standardized methods for inter-process communication across distributed systems. The first communication mechanism defined by the MPI standard is point-to-point communication, which allows individual processes to send data to each other. At a higher level, and at the center of our thesis, are collective communication operations, which transmit data within a group of processes. The MPI standard and its communication models are implemented by several libraries. Our thesis focuses on OpenMPI, a widely used open-source implementation of MPI whose continuous optimization of performance and scalability makes it the de facto choice for collective communication in large-scale applications. HPX, developed by the STE||AR Group, is a C++ library focused on parallelism and concurrency. It seeks to address limitations of MPI and other traditional parallelization methods by leveraging techniques such as message-driven computation and constraint-based synchronization. While OpenMPI has introduced numerous optimizations to improve scalability, HPX’s current collective operations still rely on a single root node for message distribution, which leads to performance bottlenecks. This thesis proposes a hierarchical communicator for HPX that distributes the communication workload across multiple processes, reducing overhead and improving scalability.
We created a benchmark to compare our hierarchical communicator with the naive HPX communicator and various OpenMPI algorithms. Our results show a proportional performance increase across all tested collective operations on HPX, with the hierarchical communicator achieving speedups of up to 32 for reduce and broadcast and up to 16 for gather and scatter on 256 processes. Compared to OpenMPI’s state-of-the-art implementations of gather, scatter, and broadcast, our approach achieves speedups of 1.2 and 1.5 for 128 and 256 processes, respectively.
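    The abstract does not specify the communicator’s topology. One common way to take the distribution work off a single root is a binomial tree, in which every rank that already holds the data forwards it in the next round. The sketch below only computes such a broadcast schedule (ranks and rounds) under a hypothetical function name; it is not HPX or OpenMPI code.

```python
def binomial_broadcast_schedule(p, root=0):
    """Return a list of rounds; each round is a list of (sender, receiver) ranks.

    In each round, every rank that already holds the data forwards it to the
    rank `have` positions further away (relative to the root), so all p ranks
    are reached in ceil(log2(p)) rounds instead of p - 1 sequential root sends.
    """
    rounds = []
    have = 1  # ranks with relative ids 0 .. have-1 already hold the data
    while have < p:
        step = []
        for rel in range(have):
            dst = rel + have
            if dst < p:
                # Map relative ids back to real ranks by rotating around the root.
                step.append(((rel + root) % p, (dst + root) % p))
        rounds.append(step)
        have *= 2
    return rounds
```

    For 256 processes this yields 8 rounds, and the root itself sends only 8 messages instead of 255, which is the kind of load redistribution a hierarchical communicator aims for.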
  • Multi-combining : exploring the batching-parallelism trade-off
    Open Access
    (2025) Bihlmaier, Dominik
    This thesis investigates the efficient synchronization of concurrent operations on AVL trees in multicore environments. While AVL trees offer logarithmic-time guarantees, concurrent access requires synchronization, which often leads to contention and scalability bottlenecks. Delegation-based synchronization methods, such as Flat Combining, address contention by batching operations but introduce a new bottleneck due to their reliance on a single combiner thread. To overcome this limitation, this thesis proposes a Multi Combining approach that alleviates the bottleneck by partitioning operation batches across multiple combiners. The design leverages the structural properties of AVL trees by partitioning the workload based on the root node, dedicating separate combiners to operations on the left and right subtrees. To minimize the synchronization overhead that frequent root updates would otherwise require, the system uses a relaxed AVL tree design, which defers rebalancing until a specific threshold is reached. Extensive evaluation shows that Multi Combining consistently outperforms both traditional locking and Flat Combining approaches. Insertion- and removal-heavy workloads achieve up to twice the throughput of competing methods, while read-heavy workloads benefit from parallel execution of lookups. Overall, the results demonstrate that Multi Combining effectively balances batching and parallelism, providing a scalable synchronization strategy for concurrent balanced trees.
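    The partitioning idea can be sketched with two combiners that each apply a whole batch to their own subtree. In this sketch, plain Python sets stand in for the left and right AVL subtrees, the rebalancing threshold is omitted, and operations on the root key are returned unprocessed for a separate (hypothetical) root path; because of CPython’s global interpreter lock, the two threads show the structure, not real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_batch(subtree, ops):
    """One combiner applies its whole batch sequentially, without per-op locking."""
    results = []
    for op, key in ops:
        if op == "insert":
            results.append(key not in subtree)  # True if the key was new
            subtree.add(key)
        elif op == "remove":
            results.append(key in subtree)      # True if the key was present
            subtree.discard(key)
        else:  # "contains"
            results.append(key in subtree)
    return results


def multi_combine(root_key, left, right, batch):
    """Split a batch by the root key and run one combiner per subtree in parallel."""
    left_ops = [(op, k) for op, k in batch if k < root_key]
    right_ops = [(op, k) for op, k in batch if k > root_key]
    # Operations on the root key itself would need a separate path (hypothetical
    # here, e.g. a root-level lock); this sketch returns them unprocessed.
    root_ops = [(op, k) for op, k in batch if k == root_key]
    with ThreadPoolExecutor(max_workers=2) as pool:
        left_future = pool.submit(apply_batch, left, left_ops)
        right_future = pool.submit(apply_batch, right, right_ops)
        return left_future.result(), right_future.result(), root_ops
```

    Since the two subtrees share no nodes, the combiners never touch the same data, so neither needs to synchronize with the other while applying its batch.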
  • Implementation and evaluation of a gPTP time receiver for the NXP i.MX RT1062 microcontroller and the Zephyr operating system
    Open Access
    (2025) Riep, Patricia
    Many applications in industry, robotics, and automotive engineering have a growing need for real-time-capable systems, and therefore also for real-time-capable networks. For developing such networks, the TSN Task Group of IEEE 802.1 provides a set of standards for communication paths, scheduling/traffic shaping, and time synchronization. This thesis addresses time synchronization using gPTP and designs and implements a PI clock servo. In doing so, the existing implementation in the Zephyr real-time operating system is extended with a normalization component. In addition, an improved approach for calculating the transmission latency is presented. This implementation is experimentally compared with the previous one and with the implementation in LinuxPTP, examining synchronization accuracy and settling times. The hardware used is a Teensy 4.1 microcontroller board (running Zephyr) and a Raspberry Pi Compute Module 4. In addition, the compatibility of the Zephyr implementation with various TSN switches is investigated.
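    The servo loop described above can be illustrated with a discrete PI controller: each sync interval, the measured offset to the master feeds a proportional and an integral term that set the slave's frequency correction. The sketch assumes a 1 s sync interval, offsets in ns, corrections in ppb, and hypothetical gains kp = 0.7 and ki = 0.3; it is neither the Zephyr nor the LinuxPTP implementation, and it omits the normalization component and the delay measurement.

```python
def simulate_pi_servo(initial_offset_ns, drift_ppb, kp=0.7, ki=0.3, steps=60):
    """Simulate a discrete PI clock servo with a 1 s sync interval.

    Each step, the slave measures its offset to the master (ns) and the servo
    sets a frequency correction in ppb (= ns of correction per 1 s interval)
    from the proportional and integral terms. Returns the offset history.
    """
    offset = float(initial_offset_ns)
    integral = 0.0
    history = [offset]
    for _ in range(steps):
        integral += offset
        correction_ppb = -(kp * offset + ki * integral)
        # Over one 1 s interval, the offset changes by the uncorrected drift
        # plus the applied frequency correction (both in ns per second).
        offset += drift_ppb + correction_ppb
        history.append(offset)
    return history
```

    The integral term is what cancels a constant frequency drift: in steady state, ki times the accumulated offset equals the drift, so the offset settles at zero instead of at a constant bias.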