Universität Stuttgart
Permanent URI for this community: https://elib.uni-stuttgart.de/handle/11682/1
Search Results
Item Open Access
Process migration in a parallel environment
(Stuttgart : Höchstleistungsrechenzentrum, Universität Stuttgart, 2016) Reber, Adrian; Resch, Michael (Prof. Dr.-Ing. Dr. h.c. Dr. h.c. Prof. E.h.)
To satisfy the ever-increasing demand for computational resources, high performance computing systems are becoming larger and larger. Unfortunately, the tools supporting system management tasks are only slowly adapting to the growing number of components in computational clusters. Virtualization provides concepts that make system management tasks easier to implement by giving system administrators more flexibility. With the help of virtual machine migration, the point in time for certain system management tasks, such as hardware or software upgrades, no longer depends on the usage of the physical hardware. The flexibility to migrate a running virtual machine without significant interruption to the provided service makes it possible to perform system management tasks at the optimal point in time. In most high performance computing systems, however, virtualization is still not implemented. The reason for avoiding virtualization in high performance computing is that accessing the CPU and I/O devices still incurs an overhead. This overhead is continually decreasing, and different kinds of virtualization techniques, such as para-virtualization and container-based virtualization, minimize it further. With the CPU being one of the primary resources in high performance computing, this work proposes to migrate processes instead of virtual machines, thus avoiding any overhead. Process migration can be seen either as an extension of pre-emptive multitasking across system boundaries or as a special form of checkpointing and restarting. In the scope of this work, process migration is based on checkpointing and restarting, as it is already an established technique in the field of fault tolerance.
From the existing checkpointing and restarting implementations, the one best suited to process migration was selected. One of the important requirements of the checkpointing and restarting implementation is transparency. Providing transparent process migration is important to enable the migration of any process without prerequisites such as re-compilation or running in a specially prepared environment. With process migration based on checkpointing and restarting, the next step towards providing process migration in a high performance computing environment is to support the migration of parallel processes. Using MPI is a common method of parallelizing applications, and therefore process migration has to be integrated with an MPI implementation. The previously selected checkpointing and restarting implementation was integrated into an MPI implementation, thus enabling the migration of parallel processes. With the help of different test cases, the implemented process migration was analyzed, especially with regard to the time required to migrate a process and the advantages of optimizations that reduce the process’ downtime during migration.

Item Open Access
Energieeffizienz von Prozessoren in High Performance Computinganwendungen der Ingenieurwissenschaften [Energy efficiency of processors in high performance computing applications in the engineering sciences]
(Stuttgart : Höchstleistungsrechenzentrum, Universität Stuttgart, 2018) Khabi, Dmitry; Resch, Michael M. (Prof. Dr.-Ing. Dr. h.c. Dr. h.c. Prof. E.h.)
This thesis centers on the question of energy efficiency in high performance computing (HPC), with a focus on the relationship between the electrical power drawn by processors and their computational performance. Chapter 1, the introduction, explains the motivation and the state of the art in power measurement and energy efficiency for HPC and its components.
Chapters 2 and 3 discuss in detail a measurement technique developed at the Höchstleistungsrechenzentrum Stuttgart (HLRS), which is used for the power measurements in the test cluster. The measurement procedure for the different hardware components is presented, along with the dependencies between their power supply, measurement accuracy, and measurement frequency. In Chapter 4, I describe how the power consumption of a processor relates to its configuration and to the algorithms executed on it. The focus is on the relationships between CPU frequency, degree of parallelization, computational performance, and electrical power. To compare efficiency across processors and algorithms, I use a method based on an analytical approximation of the processors' computational performance and electrical power. The chapter also shows that the coefficients of this approximation, which provide several indications of software and hardware properties, can serve as the basis for developing an extended model. As shown later, existing models of computational performance and electrical power only partly account for the different frequency domains of the hardware components. Chapter 5 explains an extension of the existing computational performance model that could partly explain the corresponding new properties of the CPU architecture. The insights gained are intended to help develop a model that describes both computational performance and electrical power. In Chapter 6, I describe the problem of the energy efficiency of a high performance computer.
Among other things, the methods developed in this thesis are evaluated on an HPC platform.

Item Open Access
Increased flexibility and dynamics in distributed applications and processes through resource decoupling
(2014) Kipp, Alexander; Resch, Michael (Prof. Dr.-Ing.)
The continuously increasing complexity of products and services requires more and more specialised expertise as well as support from specialised IT tools and services. However, these services require expert knowledge as well, particularly in order to apply them efficiently and optimally. To this end, this thesis introduces a new virtualisation approach that allows both the transparent integration of services in abstract process description languages and the role-based integration of human experts in these processes. The developed concept of this thesis has been realised by:
- Enhancing the concept of web services with a service virtualisation layer, allowing for the transparent usage, adaptation and orchestration of services
- Enhancing the developed concept towards a “Dynamic Session Management” environment, enabling the transparent and role-based integration of human experts following the SOA paradigm
- Developing a collaboration schema, allowing for setting up and steering synchronous collaboration sessions between human experts. This enhancement also considers the respective user context and provides the most suitable IT-based tooling support.
The developed concept has been applied to scientific and economic application fields with a respective reference realisation.

Item Open Access
A light weighted semi-automatically I/O-tuning solution for engineering applications
(Stuttgart : Höchstleistungsrechenzentrum, Universität Stuttgart, 2017) Wang, Xuan; Resch, Michael M. (Prof. Dr.-Ing. Dr. h.c. Dr. h.c. Prof.
E.h.)
Today’s engineering applications running on high performance computing (HPC) platforms generate more and more diverse data simultaneously and require large storage systems as well as extremely high data transfer rates to store their data. To achieve high data transfer rates (I/O performance), computer scientists together with HPC manufacturers have developed many innovative solutions. However, transferring knowledge of these solutions to engineers and scientists has become one of the largest barriers. Since engineers and scientists are experts in their own professional areas, they might not be able to tune their applications to the optimal level. Sometimes they might even degrade the I/O performance by mistake. The basic training courses provided by computing centers like HLRS seem insufficient to transfer the required know-how. In order to overcome this barrier, I have developed a semi-automatic I/O-tuning solution (SAIO) for engineering applications. SAIO, a lightweight and intelligent framework, is designed to be compatible with as many engineering applications as possible, scalable with large engineering applications, usable for engineers and scientists with little knowledge of parallel I/O, and portable across multiple HPC platforms. Building on the MPI-IO library allows SAIO to be compatible with MPI-IO based high-level I/O libraries, such as parallel HDF5 and parallel NetCDF, as well as proprietary and open source software, such as Ansys Fluent and the WRF Model. In addition, SAIO follows the current MPI standard, which makes it scalable and portable across many HPC platforms. SAIO, which is implemented as a dynamic library and loaded dynamically, does not require recompiling or changing the application's source code. By simply adding several export directives to their job submission scripts, engineers and scientists will be able to run their jobs more efficiently.
Furthermore, an automated SAIO training utility keeps the optimal configurations up to date without any manual effort from the user.

Item Open Access
Optimizing I/O performance with machine learning supported auto-tuning
(Stuttgart : Höchstleistungsrechenzentrum, Universität Stuttgart, 2023) Bağbaba, Ayşe; Resch, Michael M. (Prof. Dr.-Ing. Dr. h.c. Dr. h.c. Prof. E.h.)
Data access is a considerable challenge because of the scalability limitation of I/O. In addition, some applications spend most of their total execution time in I/O. This causes a massive slowdown and a waste of useful computing resources. Unfortunately, there is no one-size-fits-all solution to the I/O problems, so I/O becomes a limiting factor for such applications. Parallel I/O is an essential technique for scientific applications running on high-performance computing systems. Typically, parallel I/O stacks offer many parameters that need to be tuned to achieve the best possible I/O performance. Unfortunately, there is no default best configuration of these parameters; in practice, the best settings differ not only between systems but often also from one application use case to another. However, scientific users might not have the time or the experience to explore the parameter space sensibly and choose a proper configuration for each application use case. I present a line of solutions to this problem, including a machine-learning-supported auto-tuning system that uses performance modelling to optimize I/O performance. I demonstrate the value of these solutions across applications and at scale.

Item Open Access
Model-centric task debugging at scale
(Stuttgart : Höchstleistungsrechenzentrum, Universität Stuttgart, 2017) Nachtmann, Mathias; Resch, Michael (Prof. Dr.-Ing. Dr. h.c. Dr. h.c. Prof. E.h.)
Chapter 1, Introduction, presents state-of-the-art debugging techniques in high-performance computing.
The lack of information from the programming model, from which these traditional debugging tools suffer, motivated the model-centric debugging approach. Chapter 2, Technical Background: Parallel Programming Models & Tools, exemplifies the programming models used in the scope of my work. The differences between those models are illustrated, and examples are given for the most popular programming models in HPC. The chapter also describes Temanejo, the toolchain's front-end, which supports application developers in their work. In the following chapter (Chapter 4), Design: Events & Requests in Ayudame, the theory of "task" and "dependency" representation is stated. The chapter includes the design of the different information types, which are later used for the communication between a programming model and the model-centric debugging approach. In Chapter 5, Design: Communication Back-end Ayudame, the design of the back-end tool infrastructure is described in detail. This also includes the problems that occurred during the design process and their specific solutions. The concept of a multi-process environment and the simultaneous use of different programming models is also part of this chapter. The following chapter (Chapter 6), Instrumentation of Runtime Systems, briefly describes the information exchange between a programming model and the model-centric debugging approach. The different ways of monitoring and controlling an application through its programming model are illustrated. In Chapter 7, Case Study: Performance Debugging, the model-centric debugging approach is used to optimise an application. All necessary optimisation steps are described in detail with the help of mock-ups. Additionally, a description of the different optimised versions is included in this chapter. The evaluation, done on different hardware architectures, is presented and discussed.
This includes not only the behaviour of the versions on different platforms but also architecture-specific issues.

Item Open Access
Simulationsgestützte Absicherung von Fahrerassistenzsystemen [Simulation-based validation of driver assistance systems]
(2018) Feilhauer, Marius; Resch, Michael M. (Prof. Dr.-Ing. Dr. h.c. Dr. h.c. Prof. E.h.)

Item Open Access
Service level agreements for job submission and scheduling in high performance computing
(2014) Kübert, Roland; Resch, Michael (Prof. Dr.-Ing.)
This thesis introduces the concept of long-term service level agreements for the offering of quality of service in high performance computing. Feasibility of the approach is demonstrated by a proof-of-concept implementation. A simulation tool developed in the scope of this thesis is subsequently used to investigate sensible parameters for quality of service classes in the high performance computing domain.

Item Open Access
Communication methods for hierarchical global address models in HPC
(Stuttgart : Höchstleistungsrechenzentrum, Universität Stuttgart, 2016) Zhou, Huan; Resch, Michael (Prof. Dr.-Ing. Dr. h.c. Dr. h.c. Prof. E.h.)

Item Open Access
Ein gebrauchstaugliches Augmented Reality-System für geometrische Analysen in der Produktentstehung [A usable augmented reality system for geometric analyses in product development]
(2019) Bliese, Björn; Resch, Michael M. (Prof. Dr.-Ing.)
Augmented reality describes the extension of human perception of reality with additional virtual information. In product development, the technology can be used for the joint geometric analysis of physical and virtual models. It therefore promises great potential for a closer integration of physical and virtual development activities. Nevertheless, augmented reality systems are hardly used productively in product development today. A closer look at the known applications confirms the potential of augmented reality for geometric analyses with respect to the quality, cost, and time of product development.
The literature describes various system solutions for carrying out augmented reality analyses, which differ mainly in the components used for pose determination. Considered individually, the augmented reality system components appear to be largely technically mature, but the systems rarely offer the user a seamless end-to-end process. Operation is usually very complex and requires a great deal of specialised expertise. To make augmented reality broadly usable for geometric analyses in product development, usability is placed at the center of the system development that follows. Based on a detailed description of the context of use, the basic requirements for a usable augmented reality system are formulated. The candidate system components are evaluated and selected against these requirements, and the associated processes are designed from the user's perspective and structured for the highest possible usability. With this knowledge, a user guidance concept is developed that leads the user through the entire analysis process. Finally, the usability of the resulting augmented reality system is evaluated in a user study.