Browsing by Author "Resch, Michael (Prof. Dr.-Ing.)"

Open Access
Analyse und Optimierung der Softwareschichten von wissenschaftlichen Anwendungen für Metacomputing
(2008) Keller, Rainer; Resch, Michael (Prof. Dr.-Ing.)
For parallel applications, the Message Passing Interface (MPI) is the programming paradigm of choice on distributed-memory high-performance computers. The concept of metacomputing, in turn, allows a wide variety of computing resources to be coupled using PACX-MPI. This is of interest for several reasons: problem sizes are to be solved that could not be run on a single system; coupled simulations are computed whose parts are to run on particular computer architectures; or systems with particular capabilities, such as visualization resources, have to be connected with parallel computing resources. This coupling is a barrier for the distributed applications, because communication with non-local processes is far slower than communication over the machine-internal network. This thesis develops solutions at the software levels, starting from the network layer, through improvements within the middleware used, up to optimizations within the application layer. At the lowest software layer, a general network communication library based on the User Datagram Protocol (UDP) is developed for the PACX-MPI middleware. This makes it possible to circumvent limitations of the Transmission Control Protocol (TCP), above all on networks with high latency and high bandwidth, so-called long fat pipes. The library implemented here is written portably and is efficient thanks to its use of threads. The protocol achieves good bandwidth in the local area network (LAN) as well as in the wide area network (WAN). For illustration, the protocol is tested on a connection between computers in Stuttgart and Canberra, Australia.
Within the middleware, the optimization of the collective communication routines is addressed, and the improvement is demonstrated for the function PACX_Alltoall using the IMB benchmark on a metacomputer. For the analysis of communication properties, the extension of a tracing library for PACX-MPI is described, as well as the implementation of a generic interface for measuring communication characteristics at the MPI layer. In addition, a general MPI test suite is presented that helped to find bugs both in PACX-MPI and in the Open MPI implementation. At the top software layer, optimization options for metacomputing applications are shown. As an example, the communication pattern of an application from bioinformatics is analysed. Furthermore, the implementation of caching and prefetching of frequently communicated data with spatial and temporal locality is presented. Only the caching and prefetching methodology makes it possible to run the application on a metacomputer, and it is exemplary for a class of algorithms with a similar communication pattern.
Open Access
Die Berechnung von Wiedereintrittsphänomenen auf hierarchischen Supercomputern mit einem effizienten parallelen Multiblockverfahren
(2007) Bönisch, Thomas; Resch, Michael (Prof. Dr.-Ing.)
As in many engineering disciplines, computer simulations have become an important part of research in space science as well. Simulation also plays an important role in system design during the development of spacecraft. It is particularly important here because experiments and measurements involve enormous effort and extreme cost. However, simulating spacecraft, here specifically the re-entry of orbiters, is considerably more demanding than simulating a "normal" aircraft, because chemical reactions occur in addition to the aerodynamic effects and must be accounted for in the simulation. For the computation of such re-entry flows, the URANUS program package was developed at the Institute of Space Systems (Institut für Raumfahrtsysteme) of the University of Stuttgart. This program had the weakness, however, that until now only so-called C-meshes could be used, with which more complex space glider configurations can be meshed only with great difficulty, if at all. Since the computation of re-entry flows also places considerable demands on computing power and memory, the new program should additionally be usable on the most modern supercomputer platforms without loss of performance. To remedy these weaknesses, the program was extended to use so-called multiblock meshes. This, however, requires a complete rework of the existing simulation program, both in the data structures used and in the program flow. To this end, the properties of multiblock meshes were examined in detail and the necessary program changes were specified from them.
To integrate multiblock meshes, a new data structure was developed that, as far as possible, already anticipates and allows for future extensions and improvements, for example towards multigrid methods. Large parts of the existing program structure were reorganized, and above all the boundary treatment was extended so that boundary conditions can be applied on all sides of a block regardless of its position in the mesh. This required a general formulation of the boundary conditions. To run on as broad a range of modern supercomputer platforms as possible, the multiblock program was designed from the outset for massively parallel systems. An existing optimization for vector systems was also carried forward. Because the blocks in multiblock meshes differ in size, a complex load-balancing strategy became necessary. For this purpose, algorithms were developed that can split blocks as required. Which block is assigned to which process for computation is decided by partitioning algorithms that were already available as a tool and were integrated into the program through an interface. That interface was designed so that new algorithms can easily be integrated at any time. A further part of the thesis presents technologies that were developed for URANUS so that the flow simulation program can be used efficiently in a metacomputing environment. With the new parallel multiblock URANUS method, important simulation results for the re-entry of modern space gliders have already been obtained that would not have been possible without this tool.
Open Access
Efficient solution of sparse linear systems arising in engineering applications on vector hardware
(2010) Tiyyagura, Sunil Reddy; Resch, Michael (Prof. Dr.-Ing.)
Block-based Linear Iterative Solver (BLIS) is a scalable software library for solving large sparse linear systems, especially those arising from engineering applications. BLIS was developed during this work specifically to address the performance issues of Krylov iterative methods on vector systems. After several failed attempts to port general public domain linear solvers to the NEC SX-8, it became clear that the developers of most solver libraries do not focus on performance issues related to vector systems. This is also true for other software projects, because clusters of scalar processors were the dominant high performance computing installations in the past few decades. With the advent of vector processing units on most commodity scalar processors, vectorization is again becoming an important software design consideration. In order to understand the strengths and weaknesses of various hardware architectures, benchmarking studies have been done in this work. These studies show that vector systems are better balanced than most scalar systems with respect to many aspects that determine the sustained performance of many real world applications. The two main performance problems with the public domain solvers are addressed in this work. The first problem, short vector length, is solved by introducing a vector-specific sparse storage format. The second and more important problem, high memory latency, is addressed by blocking the sparse matrix. Most engineering problems have multiple unknowns (degrees of freedom) per mesh point to be solved. Typically, public domain solvers do not block all the unknowns to be solved at each mesh point. Instead, they assemble and solve each unknown separately, which requires a huge amount of memory traffic. The approach adopted in this work reduces the load on the memory subsystem by blocking all the unknowns at each mesh point and then solving the resulting blocked global system of equations.
This is a natural approach for engineering simulations and results in performance improvement on scalar systems due to cache blocking and on vector systems due to reduced memory traffic. Preconditioning is one of the areas in linear solvers that is still actively researched. A preconditioned system of equations has better spectral properties, and hence the solution methods converge faster than with the original system. The key consideration is to keep the time needed for the additional work of preparing the preconditioner as low as possible while at the same time improving the condition number of the resulting system as much as possible. Block based splitting methods and scaling are more effective preconditioners than their point based counterparts and at the same time are also efficient. Block based incomplete factorization implemented in BLIS is also more efficient than the corresponding point based method. Robust scalable preconditioners such as the algebraic multigrid method are also available in BLIS. The performance measurements of three application codes running on the NEC SX-8 and using BLIS to solve the linear systems are presented. Lastly, memory bandwidth limitations of new hardware architectures such as multi-core systems and the STI CELL Broadband Engine are studied. The efficiency and scaling of BLIS are tested on the multi-core systems. Also, the performance of the blocked sparse matrix vector product kernel is studied on the STI CELL processor.
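The idea of blocking all unknowns at a mesh point can be illustrated with a blocked sparse matrix-vector product. The following is a minimal sketch in plain Python, assuming a generic block-sparse-row layout; the function name `bsr_matvec` and the arrays `row_ptr`, `col_idx` and `blocks` are hypothetical, and this is not the vector-specific storage format that BLIS itself uses.

```python
def bsr_matvec(block_size, row_ptr, col_idx, blocks, x):
    """Multiply a block-sparse-row matrix by a dense vector x.

    block_size : number of unknowns per mesh point (b)
    row_ptr    : blocks row_ptr[i] .. row_ptr[i + 1] - 1 belong to block row i
    col_idx    : block column index of each stored block
    blocks     : dense b x b blocks, each a list of rows
    """
    b = block_size
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * b)
    for i in range(n_block_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            blk = blocks[k]
            # One column index is read per b*b matrix values,
            # instead of one index per value as in a point-based format.
            for r in range(b):
                acc = 0.0
                for c in range(b):
                    acc += blk[r][c] * x[j * b + c]
                y[i * b + r] += acc
    return y


# Two mesh points with two unknowns each: a 4 x 4 matrix stored as three 2 x 2 blocks.
row_ptr = [0, 2, 3]
col_idx = [0, 1, 1]
blocks = [
    [[4.0, 1.0], [1.0, 3.0]],  # block (0, 0)
    [[1.0, 0.0], [0.0, 1.0]],  # block (0, 1)
    [[2.0, 0.0], [0.0, 2.0]],  # block (1, 1)
]
x = [1.0, 2.0, 3.0, 4.0]
y = bsr_matvec(2, row_ptr, col_idx, blocks, x)
```

In this layout the index arrays shrink by a factor of roughly b*b compared with storing every scalar entry separately, which is one way the reduced memory traffic described in the abstract can come about.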
Open Access
Increased flexibility and dynamics in distributed applications and processes through resource decoupling
(2014) Kipp, Alexander; Resch, Michael (Prof. Dr.-Ing.)
The continuously increasing complexity of products and services requires more and more specialised expertise as well as support from specialised IT tools and services. However, these services require expert knowledge as well, particularly in order to apply and use them in an efficient and optimal way. To this end, this thesis introduces a new virtualisation approach that allows both the transparent integration of services in abstract process description languages and the role-based integration of human experts in these processes. The concept developed in this thesis has been realised by:
- Enhancing the concept of web services with a service virtualisation layer, allowing for the transparent usage, adaptation and orchestration of services
- Enhancing the developed concept towards a “Dynamic Session Management” environment, enabling the transparent and role-based integration of human experts following the SOA paradigm
- Developing a collaboration schema that allows synchronous collaboration sessions between human experts to be set up and steered. This enhancement also considers the respective user context and provides the most suitable IT-based tooling support.
The developed concept has been applied to scientific and economic application fields with a respective reference realisation.
Open Access
Integrated management framework for dynamic virtual organisations
(2008) Wesner, Stefan; Resch, Michael (Prof. Dr.-Ing.)
This thesis describes a Service Level Agreement (SLA) based model for dynamic virtual organisations and a corresponding management framework that enables service providers to fulfil such SLAs. The proposed framework is realised as a hierarchical model, starting from low-level management close to the hardware and the network primitives necessary to realise the services, up to the business relationship management layer. The concept is instantiated for the scenario of a High Performance Computing service provider.
Open Access
Interactive parallel post-processing of simulation results on unstructured grids
(2014) Niebling, Florian; Resch, Michael (Prof. Dr.-Ing.)
Numerical simulations and the assessment of their results are constantly gaining importance in product design and optimization workflows in many different fields of engineering. The availability of massively parallel manycore computing resources enables simulations to be executed with accuracies that pose very high requirements on the methods for interactive post-processing of the simulation results. Traditional post-processing of such large-scale simulation datasets on single workstations is often no longer possible due to limited resources such as main memory, the low number of compute cores and the available network bandwidth to the simulation cluster. In this work, concepts and solutions are presented that enable interactive post-processing of large-scale datasets generated by fluid dynamic simulations on unstructured grids through the use of parallel manycore environments. A software architecture for parallel post-processing and visualization, as well as specific optimizations of frequently used post-processing methods, are introduced that enable the interactive use of parallel manycore resources. The implementation of the methods and algorithms is based on existing manycore devices in the form of programmable graphics hardware, which is no longer usable solely for computer graphics applications but is becoming increasingly interesting for general purpose computing. It will be shown that methods for the visualization of fluid simulation data, such as the interactive computation of cut-surfaces or particle traces, are made possible even for large-scale unstructured data. Additionally, an algorithm for the dense texture-based visualization of flow fields will be introduced that combines the presented methods for the extraction of cut-surfaces, isosurfaces and particle tracing. This algorithm for line integral convolution enables the interactive post-processing of flow fields on partitioned and distributed unstructured grids.
The methods introduced in this thesis are evaluated using several large-scale simulation datasets from different fields of engineering in scientific and industrial applications.
Open Access
Management von verteilten ingenieurwissenschaftlichen Anwendungen in heterogenen Grid-Umgebungen
(2007) Lindner, Peggy; Resch, Michael (Prof. Dr.-Ing.)
Grid technologies are one approach to distributing applications across several computers in order to run simulations of scientific problems that place high demands on computing resources. While this kind of application was mostly still used for demonstration purposes in recent years, grid technology is now more and more a tool in daily use. The heterogeneity of the existing grid software environments is the biggest problem users have to deal with if they want to run parallel, distributed applications efficiently in the grid. This thesis presents the concept and implementation of a Grid Configuration Manager (GCM), which is intended to hide the complexity of grid environments and the associated problems from the user. The most important goal of the GCM is to simplify the management of grid environments for end users and developers. To this end, the steps required to run distributed, parallel applications were abstracted. In addition, a concept for integrating different grid software solutions was developed and implemented. The GCM currently supports Globus, UNICORE and ssh-based environments. The GCM is meant to help users mainly during three phases of running an application: defining a grid configuration, launching a grid application, and monitoring it. The GCM also offers special support for distributed applications developed on the basis of the communication library PACX-MPI: the required configuration files are generated automatically and kept consistent on the participating machines. A mechanism for selecting computing resources based on performance prediction was integrated into the Grid Configuration Manager.
Starting from a preselection of machines specified by the user, the GCM can use an automatic estimate of an application's performance data to predict which environment is the most efficient for running the application. The program Dimemas is used for performance prediction. Dimemas can predict the runtime behaviour of an application from tracing data and parameters describing the hardware. The Grid Configuration Manager was tested and used in various scenarios. These showed that using the GCM significantly simplifies the handling of distributed applications and makes it easier to determine the execution environment.
Open Access
MPI-semantic memory checking tools for parallel applications
(2013) Fan, Shiqing; Resch, Michael (Prof. Dr.-Ing.)
The Message Passing Interface (MPI) is a language-independent application interface that provides a standard for communication among the processes of programs running on parallel computers, clusters or heterogeneous networks. However, writing correct and portable MPI applications is difficult: parameters may be used inconsistently or incorrectly, and the subtle semantic differences between MPI calls can trip up even expert programmers. MPI implementations typically perform only minimal sanity checks in order to achieve the highest possible performance. Although many interactive debuggers have been developed or extended to handle the concurrent processes of MPI applications, there are still numerous classes of bugs that are hard or even impossible to find with a conventional debugger. In many cases a memory conflict or error, for example an overlapping access or a segmentation fault, does not provide enough useful information for the programmer to solve the problem. This is even worse for MPI applications, because the MPI standard allows memory to be used flexibly, frequently and in parallel, which makes memory problems harder to observe in the traditional way. Currently, no available debugger specifically targets MPI-semantic memory errors, i.e. detects memory problems or potential errors with respect to the standard. For this specific purpose, memory checking tools have been implemented in this dissertation, and the corresponding frameworks in Open MPI for parallel applications based on MPI semantics have been developed, using different existing memory debugging tool interfaces. Developers are able to detect hard-to-find bugs such as memory violations, buffer overruns and inconsistent parameters. The memory checking tool provides detailed, comprehensible error messages that will be most helpful for MPI developers.
Furthermore, the memory checking frameworks may also help improve the performance of MPI based parallel applications by detecting whether the communicated data is used or not. The new memory checking tools may also be used in other projects or debuggers to perform different memory checks. The memory checking tools do not only apply to MPI parallel applications, but may also be used in other kinds of applications that require memory checking. The technology allows programmers to handle and implement their own memory checking functionality in a flexible way, which means they may define what information they want to know about the memory and how the memory in the application should be checked and reported. The world of high performance computing is Linux-dominated and open source based. However, Microsoft is also playing an increasingly important role in this domain, establishing its foothold with Windows HPC Server 2008 R2. In this work, the advantages and disadvantages of these two HPC operating systems are discussed. To improve programmability and portability, we introduce a version of Open MPI for Windows with several newly developed key components. Correspondingly, an implementation of the memory checking tool on Windows is also introduced. This dissertation has five main chapters: after an introduction to the state of the art, the development of Open MPI for the Windows platform is described, including the work on InfiniBand network support. Chapter four presents the methods explored and the opportunities for error analysis of memory accesses. It also describes the two tools implemented for this work, based on Intel PIN and on Valgrind, as well as their integration into the Open MPI library. In chapter five, the methods are benchmarked (NetPIPE, IMB and NPB) and evaluated using real applications (a heat conduction application and the MD package Gromacs).
It is shown that the instrumentation generated by the tool has no significant overhead (NetPIPE with 1.2% to 2.5% for the latency) and accordingly no impact on application benchmarks such as NPB or Gromacs. When the application is run under the memory access analysis tools, the execution time naturally increases, by up to 30x; with the presented MemPin the slowdown is only half as large. The methods prove successful in the sense that unnecessarily communicated data can be found in the heat conduction application and in Gromacs; in the first case, this reduces the communication time of the application by 12%.
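One class of error mentioned in this abstract, overlapping access to buffers that are still in use by pending communication, can be sketched as a simple interval check. The following is a minimal illustration in plain Python, not the Open MPI framework or either of the tools described in the thesis; the function `overlapping_buffers`, the operation names and the addresses are all hypothetical.

```python
def overlapping_buffers(pending):
    """Return pairs of pending operations whose byte ranges overlap.

    pending: list of (name, start_address, length_in_bytes) tuples,
             one per communication buffer that is still in flight.
    """
    ordered = sorted(pending, key=lambda op: op[1])
    conflicts = []
    for a in range(len(ordered)):
        name_a, start_a, len_a = ordered[a]
        end_a = start_a + len_a  # first byte past the end of buffer a
        for b in range(a + 1, len(ordered)):
            name_b, start_b, _ = ordered[b]
            if start_b >= end_a:
                break  # sorted by start address: no later buffer can overlap a
            conflicts.append((name_a, name_b))
    return conflicts


# Hypothetical in-flight operations: the second buffer lies inside the first.
pending = [
    ("irecv_0", 1000, 64),
    ("isend_1", 1032, 16),
    ("irecv_2", 2000, 32),
]
conflicts = overlapping_buffers(pending)
```

A real checker would additionally have to distinguish reads from writes and track when each operation completes; the sketch only shows the range comparison itself.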
Open Access
Service level agreements for job submission and scheduling in high performance computing
(2014) Kübert, Roland; Resch, Michael (Prof. Dr.-Ing.)
This thesis introduces the concept of long-term service level agreements for the offering of quality of service in high performance computing. Feasibility of the approach is demonstrated by a proof of concept implementation. A simulation tool developed in the scope of this thesis is subsequently used to investigate sensible parameters for quality of service classes in the high performance computing domain.
Open Access
Über die Lösung der Navier-Stokes-Gleichungen mit Hilfe der Moore-Penrose-Inversen des Laplace-Operators im Vektorraum der Polynomkoeffizienten
(2024) Große-Wöhrmann, Bärbel; Resch, Michael (Prof. Dr.-Ing.)
The well-known standard numerical methods for solving partial differential equations are based on a spatial discretization of the computational domain. Their performance and scalability on modern massively parallel high-performance computers depend on the availability of efficient numerical methods for solving linear systems of equations. In view of fundamental challenges, developing new solution approaches appears worthwhile. In this thesis I present a polynomial approach to solving partial differential equations that does not rely on a spatial discretization and that, with the help of the Moore-Penrose inverse of the Laplace operator, makes it possible to decouple the Navier-Stokes equations. The degree of the polynomials is not limited in principle, so that a high spatial resolution can be achieved.
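To make the notion of a Moore-Penrose inverse of the Laplace operator on polynomial coefficients concrete, the following sketch builds the Laplacian as a matrix acting on the coefficients of bivariate polynomials and applies its pseudo-inverse to a right-hand side. It is only an illustration under stated assumptions: the monomial basis x^i y^j, the truncation at degree n per variable, and the helper name `laplacian_matrix` are choices made here, not the formulation used in the thesis.

```python
import numpy as np


def laplacian_matrix(n):
    """Laplacian acting on coefficients of polynomials sum c[i, j] x^i y^j, 0 <= i, j <= n.

    The coefficient of x^i y^j is stored at flat index i * (n + 1) + j.
    """
    m = n + 1
    L = np.zeros((m * m, m * m))
    for i in range(m):
        for j in range(m):
            col = i * m + j
            if i >= 2:  # d^2/dx^2 of x^i y^j = i (i - 1) x^(i - 2) y^j
                L[(i - 2) * m + j, col] += i * (i - 1)
            if j >= 2:  # d^2/dy^2 of x^i y^j = j (j - 1) x^i y^(j - 2)
                L[i * m + (j - 2), col] += j * (j - 1)
    return L


n = 3
L = laplacian_matrix(n)
L_pinv = np.linalg.pinv(L)  # Moore-Penrose inverse of the coefficient-space Laplacian

# Right-hand side f(x, y) = 8 y, which equals the Laplacian of x^2 y + y^3.
f = np.zeros((n + 1) ** 2)
f[0 * (n + 1) + 1] = 8.0

# Minimum-norm coefficient vector u with Laplacian(u) = f.
u = L_pinv @ f
residual = L @ u - f
```

Because the Laplacian annihilates harmonic polynomials, the matrix is singular and the solution of Laplacian(u) = f is not unique; the pseudo-inverse selects the coefficient vector of smallest Euclidean norm, which need not coincide with x^2 y + y^3.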