05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6

Search Results

Now showing 1 - 10 of 102
  • Item (Open Access)
    Scalable computer network emulation using node virtualization and resource monitoring
    (2011) Maier, Steffen Dirk; Rothermel, Kurt (Prof. Dr. rer. nat. Dr. h. c.)
    Ongoing development of computer network technology requires new communication protocols on all layers of the protocol stack to adapt to and to exploit technology specifics. The performance of new protocol implementations has to be evaluated before deployment. Computer network emulation enables the execution of real, unmodified protocol implementations within a configurable synthetic environment. Since network properties are reproduced synthetically, emulation supports reproducible measurement results for wired and wireless networks. Meaningful evaluation scenarios typically involve a large number of communicating nodes. Reproducing the network properties of the medium access control layer can be accomplished efficiently on inexpensive commodity off-the-shelf computers and allows network protocols, transport protocols, and applications to be evaluated. However, meaningful emulation scenario sizes often require more nodes than there are affordable computers. To scale the number of nodes in an emulation scenario beyond the available computers, we discuss approaches to virtualization and operating system partitioning. Focusing on the latter, we argue for virtual protocol stacks, which provide an extremely lightweight node virtualization enabling the execution of multiple instances of the software to be evaluated on each physical computer. To connect virtual nodes on the same and on different computers, we design and implement a highly efficient software communication switch. A centralized emulation control component distributes dynamic network property updates, which result, for instance, from node mobility. To handle the large number of nodes and the resulting increase in updates, we propose a hierarchical control scheme in which the central component delegates updates to sub-components distributed over the computers of an emulation system. Extensive evaluations show the scalability of our virtualized network emulation system. Virtual nodes executed on the same computer share its limited resources. Hosting too many virtual nodes on the same computer may lead to resource contention. This can cause unrealistic measurement results and is thus undesirable. Discussing different approaches to handling resource contention, we argue for detection and recovery. We define quality criteria that allow the detection of resource contention. In order to observe those quality criteria during emulation experiments, we propose a very lightweight monitoring approach. Our monitoring is based on instrumenting an operating system kernel and observing basic resource scheduling events. This enables the detection of even peak resource usage within a split second. Thorough evaluations demonstrate the effectiveness of the quality criteria and the monitoring as well as the negligible overhead of our monitoring approach.
  • Item (Open Access)
    Scalable traffic engineering heuristics for time-triggered communication in real-time networks
    (2026) Geppert, Heiko; Rothermel, Kurt (Prof. Dr. rer. nat. Dr. h. c.)
    Distributed safety-critical cyber-physical systems require real-time behavior. This means they must respond not just quickly, but in time, to new situations, considering both the task processing time and the network communication time. From a networking perspective, meticulous, time-driven traffic planning performed at the frame level is necessary to guarantee low end-to-end delay bounds and low latency. This involves carefully planning the transmission operations along each time-critical frame's network path, including their precise timing, to limit or even eliminate interference from cross-traffic and ensure timely delivery. Since modern real-time systems can consist of hundreds or thousands of devices (for example, large manufacturing plants or continental-sized power grids), the traffic planning must be highly scalable. Although there are many traffic planning approaches in the literature, there is a lack of fast heuristics that can handle very large stream sets and networks. This thesis investigates traffic planning heuristics and optimization techniques, focusing on different aspects of the traffic planning domain. The traffic planning part consists of novel methods for conflict-graph-based scheduling and new heuristics for very large instances of the traffic planning problem. The optimizations include multicast partitioning, which combines the benefits of multicast and unicast traffic plans, and load-balanced stream placement, which generates traffic plans that can accommodate additional streams joining the system later. We created prototype implementations and analyzed their performance in solving the traffic planning problem. Our traffic plans yielded a higher accumulated network throughput or admitted more streams while maintaining computation times ranging from under a second to minutes, even for extremely large-scale problem instances. The traffic planning methods and optimization techniques presented in this thesis can be applied to modern real-time networking technologies, such as Time-Sensitive Networking and TTEthernet.
  • Item (Open Access)
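    Optimierung datenintensiver Workflows: Konzepte und Realisierung eines heuristischen, regelbasierten Optimierers [Optimization of data-intensive workflows: concepts and realization of a heuristic, rule-based optimizer]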
    (2011) Vrhovnik, Marko; Mitschang, Bernhard (Prof. Dr.-Ing. habil.)
    To simplify the modeling of data-intensive workflows that process large amounts of relational data, leading vendors of workflow and database management systems have extended workflow description languages such as BPEL with SQL functionality. As a result, data processing operations such as SQL statements or calls to user-defined procedures no longer have to be encapsulated in web services but can be defined directly at the workflow level. This opens up a new opportunity for query optimization that complements existing optimization approaches in database systems: suboptimally modeled data processing operations in a workflow description can be transformed using restructuring rules so that a workflow or database management system can execute them considerably more efficiently. This doctoral thesis presents concepts for realizing a heuristic, rule-based optimizer for data-intensive workflows. The optimizer applies a rule base, following a well-defined control strategy, to an internal representation of data-intensive workflows, the so-called Process Graph Model (PGM), in order to optimize the data processing of a data-intensive workflow. PGM allows restructuring rules to be defined and applied efficiently and independently of any particular language, and thus supports the optimization of data processing operations that may be defined in different description languages. The rule base contains restructuring rules that build on existing and new optimization strategies. In particular, the restructuring rules exploit knowledge about dependencies in a workflow description to optimize the data processing operations embedded in it while preserving the original execution semantics of the data-intensive workflow. The control strategy determines which restructuring rules are applied in which order to which parts of a workflow description, both to exploit the optimization potential of a data-intensive workflow comprehensively and to ensure the correctness of the rule applications. The detailed description of the Process Graph Model, the rule base, and the control strategy is the focus of this thesis. In addition, a prototype implementation of the optimization approach is presented, which underlines its practical applicability. Finally, the effectiveness of the individual restructuring rules is examined using various measurement scenarios. These show that applying the restructuring rules can yield performance improvements of several orders of magnitude.
  • Item (Open Access)
    From irregular parallelism to portable GPU kernels : enabling efficient task-based GPU programming with HPX, Kokkos and CPPuddle to accelerate stellar mergers
    (2026) Daiß, Gregor; Pflüger, Dirk (Prof. Dr.)
    Adaptive, tree-based structures are the foundation of many of the most efficient algorithms and applications. Yet, running such applications on supercomputers is challenging, as the application developers have to handle, for example, distributed tree traversals and load balancing. Distributed, asynchronous many-task runtime systems such as HPX aim to alleviate these challenges by embracing fine-grained tasks, enabling a fine interweaving of communication and computation through their task-graph. However, the fine-grained approach embraced by HPX runs counter to implementing efficient GPU kernels, which leverage the large numbers of available parallel work items to scale to all compute units of a GPU and to hide latencies. This is compounded by the synchronization of GPU operations, which necessitates blocking entire CPU threads, nullifying any potential benefits of interleaving CPU tasks with the GPU operations through the HPX task-graph. Hence, efficiently leveraging GPU-accelerated supercomputers with HPX applications can still be challenging, albeit for different reasons. In this work, we reconcile these differences and unlock the computational performance of GPUs for HPX applications. Our contributions can be used in any HPX application. However, to benchmark them, we turn to a specific real-world HPX application, the astrophysics code Octo-Tiger. Octo-Tiger is used for large-scale simulations of stellar mergers and demonstrates both the pitfalls and the potential of leveraging large GPU-accelerated supercomputers for applications that rely on such a task-based approach. We refactored Octo-Tiger's CPU-only code, ported its computational hotspots to GPUs, and ultimately turned it into a GPU-accelerated application, greatly speeding up these hotspots with various new GPU compute kernels. These new GPU kernels achieve speedups of up to 283x compared to Octo-Tiger's previous implementation. However, they also demonstrate the pitfalls of simply replacing CPU tasks with GPU kernels, as Octo-Tiger's overall runtime initially increased almost eightfold when using them, despite these individual speedups within the hotspots. To solve the limiting problems, we employ various techniques. Most notably, we integrated asynchronous GPU operations into the HPX task-graph using a polling approach, dynamic GPU kernel fusion based on specialized executors, and GPU memory pools that are optimized for HPX. With these solutions in place, we translate the individual speedups of the GPU kernels into tangible speedups for the entire application of up to 8.5x. Furthermore, we combine this approach with the performance portability library Kokkos to be able to target different GPU-accelerated supercomputers. In this work, we particularly demonstrate scalability on the supercomputers Perlmutter and Frontier, distributing Octo-Tiger across over a thousand GPU compute nodes.
  • Item (Open Access)
    Data-integrated methods for performance improvement of massively parallel coupled simulations
    (2022) Totounferoush, Amin; Schulte, Miriam (Prof. Dr.)
    This thesis presents data-integrated methods to improve the computational performance of partitioned multi-physics simulations, particularly on highly parallel systems. Partitioned methods allow using available single-physics solvers and well-validated numerical methods for multi-physics simulations by decomposing the domain into smaller sub-domains. Each sub-domain is solved by a separate solver, and an external library is incorporated to couple the solvers. This significantly reduces the software development cost and enhances flexibility, but it introduces new challenges that must be addressed carefully. These challenges include, but are not limited to, efficient data communication between sub-domains, data mapping between non-matching meshes, inter-solver load balancing, and equation coupling. In the current work, inter-solver communication is improved by introducing a two-level communication initialization scheme to the coupling library preCICE. The new method significantly speeds up the initialization and removes memory bottlenecks of the previous implementation. In addition, a data-driven inter-solver load balancing method is developed to efficiently distribute available computational resources between coupled single-physics solvers. This method employs both regressions and deep neural networks (DNNs) for modeling the performance of the solvers, and derives and solves an optimization problem to distribute the available CPU and GPU cores among the solvers. To accelerate the equation coupling between strongly coupled solvers, a hybrid framework is developed that integrates DNNs and classical solvers. The DNN computes a solution estimate for each time step, which the classical solvers use as a first guess to compute the final solution. To preserve the DNN's efficiency during the simulation, a dynamic re-training strategy is introduced that updates the DNN's weights on the fly. The cheap but accurate solution estimate from the DNN surrogate solver significantly reduces the number of subsequent classical iterations necessary for solution convergence. Finally, a highly scalable simulation environment is introduced for fluid-structure interaction problems. The environment consists of highly parallel numerical solvers and an efficient and scalable coupling library. This framework is able to efficiently exploit both CPU-only and hybrid CPU-GPU machines. Numerical performance investigations using a complex test case demonstrate a very high parallel efficiency on a large number of CPUs and a significant speed-up due to the GPU acceleration.
  • Item (Open Access)
    Surrogate modeling with scientific machine learning
    (2025) Leiteritz, Raphael; Pflüger, Dirk (Prof. Dr.)
  • Item (Open Access)
    Flexible processing of streamed context data in a distributed environment
    (2014) Cipriani, Nazario; Mitschang, Bernhard (Prof. Dr.-Ing. habil.)
    Nowadays, stream-based data processing occurs in many context-aware application scenarios, such as in context-aware facility management applications or in location-aware visualization applications. In order to process stream-based data in an application-independent manner, Data Stream Processing Systems (DSPSs) emerged. They typically translate a declarative query to an operator graph, place the operators on stream processing nodes, and execute the operators to process the streamed data. Context-aware stream processing applications often have different requirements although they rely on the same processing principle, i.e. data stream processing. These requirements exist because context-aware stream processing applications differ in functional and operational behavior as well as in their processing requirements. These facts are challenging on their own. As a key enabler for the efficient processing of streamed data, the DSPS must be able to integrate this specific functionality seamlessly. Since processing of data streams usually is subject to temporal aspects, i.e. it is time critical, custom functionality should be integrated seamlessly in the processing task of a DSPS to prevent the formation of isolated solutions and to support exploitation of synergies. Depending on the domain of interest, data processing often depends on highly domain-specific functionalities, e.g. for the application of a location-aware visualization pipeline displaying a three-dimensional map of its surroundings. The application runs on a mobile device and consists of many interconnected operations that form a network of operators called stream processing graph (SP graph). First, the friends’ locations must be collected and connected to their public profile. However, to enable the application to run smoothly, the presence of a Graphics Processing Unit (GPU) is mandatory for some parts of data processing. To solve that challenge, we have developed concepts for a flexible DSPS that allows the integration of specific functionality to enable a seamless integration of applications into the DSPS. To this end, an architecture is proposed. A DSPS based on this architecture can be extended by integrating additional operators responsible for data processing and services realizing additional interaction patterns with context-aware applications. However, this specific functionality is often subject to deployment and run time constraints. Therefore, an SP graph model has been developed which reflects these constraints by allowing the graph to be annotated with constraints, e.g. to restrict the execution of operators to certain processing nodes or to specify that an operator requires a GPU. The data involved in the processing steps is often subject to restrictions with respect to the way it is accessed and processed. Users participating in the process might not want to expose their current location to potentially unknown parties, restricting e.g. data access to known ones only. Therefore, in addition to the flexible integration of specialized operators, security aspects must also be considered, limiting the access to data as well as the granularity at which data is made available. We have developed a security framework that defines three different types of security policies: Access Control (AC) policies controlling data access, Process Control (PC) policies influencing how data is processed, and Granularity Control (GC) policies defining the Level of Detail (LOD) at which the data is made available. The security policies are interpreted as constraints, which are supported by augmenting the SP graph with the relevant security policies. The operator placement in a DSPS is very important, as it deeply influences SP graph execution. Every stream-based application requires a different placement of SP graphs according to its specific objectives, e.g. bandwidth should not fall below 500 MBit/s and is more important than latency. This fact constrains operator placement. As objectives might conflict with each other, operator placement is subject to trade-offs. Knowing the bandwidth requirements of a certain application, an application developer can clearly identify the specific Quality of Service (QoS) requirements for the correct distribution of the SP graph. These requirements are a good indicator for the DSPS to decide how to distribute the SP graph to meet the application requirements. Two applications within the same DSPS might have different requirements. For example, if interactivity is an issue, a stream-based game application might first and foremost need latency to be minimized in order to be fast and responsive. We have developed a multi-target operator placement (M-TOP) algorithm which allows the DSPS to find a suitable deployment, i.e. a distribution of the operators in an SP graph which satisfies a set of predefined QoS requirements. In doing so, the M-TOP approach considers operator-specific deployment constraints as well as QoS targets.
  • Item (Open Access)
    Supporting multi-tenancy in Relational Database Management Systems for OLTP-style software as a service applications
    (2015) Schiller, Oliver; Mitschang, Bernhard (Prof. Dr.-Ing. habil.)
    The consolidation of multiple tenants onto a single relational database management system (RDBMS) instance, commonly referred to as multi-tenancy, has turned out to be beneficial, since it helps improve the provider's profit margin and allows service fees to be lowered, which in turn attracts more tenants. So far, existing solutions create the required multi-tenancy support on top of a traditional RDBMS implementation, i.e., they implement data isolation between tenants, per-tenant customization, and further tenant-centric data management features in application logic. This is complex, error-prone, and often reimplements functionality the RDBMS already offers. Moreover, this approach rules out some optimization opportunities in the RDBMS and represents a conceptual misstep with Separation of Concerns in mind. For these reasons, an RDBMS that provides support for the development and operation of a multi-tenant software as a service (SaaS) offering is compelling. In this thesis, we contribute to a multi-tenant RDBMS for OLTP-style SaaS applications by extending a traditional disk-oriented RDBMS architecture with multi-tenancy support. For this purpose, we primarily extend an RDBMS by introducing tenants as first-class database objects and establishing tenant contexts to isolate tenants logically. Using these extensions, we address tenant-aware schema management, for which we present a schema inheritance concept that is tailored to the needs of multi-tenant SaaS applications. Thereafter, we evaluate different storage concepts for storing a tenant’s tuples with respect to their scalability. Next, we contribute an architecture of a multi-tenant RDBMS cluster for OLTP-style SaaS applications. Here, we focus on a partitioning solution which is aligned to tenants and yields independently manageable pieces. To balance load in the proposed cluster architecture, we present a live database migration approach whose design favors low migration overhead and provides minimal interruption of service.
  • Item (Open Access)
    Ansätze für flexible und fehlertolerante modellgetriebene IoT-Anwendungen in dynamischen Umgebungen [Approaches for flexible and fault-tolerant model-driven IoT applications in dynamic environments]
    (2024) Del Gaudio, Daniel; Mitschang, Bernhard (Prof. Dr.-Ing. habil.)
  • Item (Open Access)
    Effiziente Gestaltung und Anwendung von attributbasierter Zugriffskontrolle für RESTful Services [Efficient design and application of attribute-based access control for RESTful services]
    (2019) Hüffmeyer, Marc; Mitschang, Bernhard (Prof. Dr.)