Efficient application of accelerator cards for the coupling library preCICE

Schrader, Timo Pierre

Files

Masterarbeit_Schrader_signed.pdf (6.72 MB)

Date

2023

Authors

Schrader, Timo Pierre

Abstract

The use of accelerator cards, mainly graphics processing units (GPUs), in scientific and industrial research has been rising for years because of their high data-parallel computational throughput. Common fields of application include machine learning, computational physics, and cryptography. This thesis investigates the efficient application of GPUs in the multi-physics coupling library preCICE. We look at data mapping methods, which are used to map values between two vertex clouds. More specifically, the focus lies on radial basis function (RBF) interpolation, which acts on scattered data points. Solving an RBF interpolation problem requires solving systems of linear equations that are mostly large and ill-conditioned. Solving these systems takes considerable computational effort, which increases the runtime of preCICE substantially. We approach this problem by leveraging the high computing power of GPUs. To integrate GPU support into preCICE, we use the Ginkgo linear algebra library, which supports multiple data-parallel backends, including Nvidia CUDA, AMD HIP, and OpenMP. Ginkgo provides solvers and preconditioners for linear systems of equations, such as conjugate gradient (CG) and GMRES. Using Ginkgo, we implement an assembly routine for RBF matrices that is 100 to 1,000 times faster than the variants already existing in preCICE. We discuss GPU-specific optimization approaches and the resulting efficiency of our implementation. The result is a nearly optimal assembly kernel that uses most of the 64-bit compute units on the GPU. Next, we evaluate CG and GMRES, combined with Jacobi and Cholesky preconditioners, on GPUs. The iterative approach works well on sparse system matrices, which result from RBF kernels with local support, and is very competitive with using a very high number of CPU cores.
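
To make the assembly and iterative-solve steps more concrete, here is a minimal serial CPU sketch in C++. It is not the thesis's Ginkgo implementation and uses none of Ginkgo's API. It assumes a Wendland C2 kernel with support radius `r` as an illustrative locally supported RBF, omits any polynomial augmentation, and uses hypothetical helper names (`assemble_rbf_matrix`, `jacobi_pcg`, `evaluate`). Each matrix entry depends on only one pair of vertices, which is why assembly parallelizes naturally over GPU threads; entries beyond the support radius are exactly zero, which is where sparse system matrices come from.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// A vertex of the input or output cloud (illustrative type).
struct Point { double x, y, z; };

// Wendland C2 kernel (assumed choice): phi(d) = (1 - d/r)^4 (4 d/r + 1) for d < r, else 0.
double wendland_c2(double d, double r) {
  const double q = d / r;
  if (q >= 1.0) return 0.0;  // outside the local support: exact zero
  const double t = 1.0 - q;
  return t * t * t * t * (4.0 * q + 1.0);
}

double distance(const Point& a, const Point& b) {
  const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Hypothetical helper: dense row-major n x n matrix A_ij = phi(||x_i - x_j||).
// Every entry is independent, so each (i, j) could map to one GPU thread.
std::vector<double> assemble_rbf_matrix(const std::vector<Point>& pts, double r) {
  const std::size_t n = pts.size();
  std::vector<double> A(n * n);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      A[i * n + j] = wendland_c2(distance(pts[i], pts[j]), r);
  return A;
}

// Hypothetical helper: Jacobi-preconditioned CG (M = diag(A)) for the SPD system A w = f.
std::vector<double> jacobi_pcg(const std::vector<double>& A, const std::vector<double>& f,
                               double tol = 1e-12, int max_iter = 1000) {
  const std::size_t n = f.size();
  assert(A.size() == n * n);
  auto matvec = [&](const std::vector<double>& v) {
    std::vector<double> y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < n; ++j) y[i] += A[i * n + j] * v[j];
    return y;
  };
  auto dot = [n](const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
  };
  std::vector<double> w(n, 0.0), r = f, z(n), p(n);
  for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i * n + i];  // apply M^{-1}
  p = z;
  double rz = dot(r, z);
  const double f_norm = std::sqrt(dot(f, f));
  // Stop once ||r|| <= tol * ||f||.
  for (int k = 0; k < max_iter && std::sqrt(dot(r, r)) > tol * f_norm; ++k) {
    const std::vector<double> Ap = matvec(p);
    const double alpha = rz / dot(p, Ap);
    for (std::size_t i = 0; i < n; ++i) { w[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
    for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i * n + i];
    const double rz_new = dot(r, z);
    const double beta = rz_new / rz;
    rz = rz_new;
    for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
  }
  return w;
}

// Evaluate the interpolant s(y) = sum_j w_j phi(||y - x_j||) at an output vertex y.
double evaluate(const std::vector<Point>& pts, const std::vector<double>& w,
                const Point& y, double r) {
  double s = 0.0;
  for (std::size_t j = 0; j < pts.size(); ++j) s += w[j] * wendland_c2(distance(y, pts[j]), r);
  return s;
}
```

Shrinking `r` zeroes more entries. A production code would keep only the nonzeros in a sparse format rather than the dense array used here for brevity.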
To provide an efficient way of solving dense systems on GPUs as well, we implement a QR decomposition using the Nvidia cuSolver library. Our experiments show that, for larger interpolation problems, the CUDA QR decomposition on dense system matrices outperforms every other variant, including iterative GPU solvers, multi-core CPU solvers, and single-core solvers, by at least a factor of five. Finally, we investigate a matrix-free RBF solution approach that can solve problems whose system matrices would exceed GPU memory in matrix-based methods. In summary, preCICE can benefit greatly from the efficient application of GPUs in RBF data mapping because large interpolation problems can be solved much faster, enabling preCICE users to run their coupled simulations in less time.
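
The direct approach for dense systems can be illustrated with a plain Householder QR solve: factor A = QR, then solve R x = Q^T b by back substitution. This is a serial CPU sketch of the same kind of factorization, not the cuSolver code path used in the thesis. The function name `qr_solve` and the row-major storage are assumptions for illustration, and the matrix is assumed to be nonsingular.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: solve the dense n x n system A x = b with Householder QR.
// A (row-major) and b are taken by value; the local copies are overwritten
// with R and Q^T b, respectively. Assumes A is nonsingular.
std::vector<double> qr_solve(std::vector<double> A, std::vector<double> b) {
  const std::size_t n = b.size();
  assert(A.size() == n * n);
  for (std::size_t k = 0; k < n; ++k) {
    // Householder vector v that maps column k (rows k..n-1) onto alpha * e_1.
    double norm = 0.0;
    for (std::size_t i = k; i < n; ++i) norm += A[i * n + k] * A[i * n + k];
    norm = std::sqrt(norm);
    if (norm == 0.0) continue;
    const double alpha = A[k * n + k] > 0 ? -norm : norm;  // sign chosen to avoid cancellation
    std::vector<double> v(n - k);
    for (std::size_t i = k; i < n; ++i) v[i - k] = A[i * n + k];
    v[0] -= alpha;
    double vnorm2 = 0.0;
    for (double vi : v) vnorm2 += vi * vi;
    if (vnorm2 == 0.0) continue;
    // Apply H = I - 2 v v^T / (v^T v) to the trailing columns of A ...
    for (std::size_t j = k; j < n; ++j) {
      double s = 0.0;
      for (std::size_t i = k; i < n; ++i) s += v[i - k] * A[i * n + j];
      s = 2.0 * s / vnorm2;
      for (std::size_t i = k; i < n; ++i) A[i * n + j] -= s * v[i - k];
    }
    // ... and to the right-hand side, accumulating Q^T b.
    double s = 0.0;
    for (std::size_t i = k; i < n; ++i) s += v[i - k] * b[i];
    s = 2.0 * s / vnorm2;
    for (std::size_t i = k; i < n; ++i) b[i] -= s * v[i - k];
  }
  // Back substitution with the upper-triangular R now stored in A.
  std::vector<double> x(n);
  for (std::size_t i = n; i-- > 0;) {
    double s = b[i];
    for (std::size_t j = i + 1; j < n; ++j) s -= A[i * n + j] * x[j];
    x[i] = s / A[i * n + i];
  }
  return x;
}
```

For an RBF system, `A` would be the assembled kernel matrix and `b` the data on the input mesh. Unlike CG, the cost of the factorization does not depend on the condition number, which is one reason a direct solver is attractive for dense, ill-conditioned systems.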

Nowadays, graphics processing units (GPUs) have become indispensable in industrial and academic research because of their exceptional computing power. Compute-intensive disciplines such as machine learning, computer simulation, and cryptography benefit from the very high data parallelism that GPUs offer. This thesis investigates how GPUs can be used efficiently in the preCICE software. The preCICE library enables coupled multi-physics simulations by coupling several numerical solvers for different aspects of a simulation, which allows it to synchronize their behavior and support data exchange between them. Our focus is on data mapping in preCICE, which is required whenever computed values must be exchanged between two different meshes. Specifically, we focus on interpolation with radial basis functions (RBF). This method requires solving a linear system of equations that is usually very large and ill-conditioned. Since this property has a large impact on the runtime of preCICE, we investigate whether RBF interpolation can be accelerated by GPUs. We use the linear algebra library Ginkgo to realize GPU support in preCICE. Ginkgo provides implementations of various solvers and preconditioners that run on Nvidia and AMD GPUs and can also be parallelized with OpenMP. The GPU matrix assembly, which we implement with the help of Ginkgo, is 100 to 1,000 times faster than the previous variants in preCICE. Furthermore, we evaluate how the conjugate gradient (CG) method and the GMRES method, including Jacobi and Cholesky preconditioners, can accelerate data mapping on GPUs.
Our experiments show that these methods work particularly well on sparse systems and can keep up with highly parallelized CPU implementations. In addition, we implement a QR decomposition using the cuSolver library. On dense matrices, this implementation is faster than the other implementations by at least a factor of five. Finally, we evaluate a matrix-free implementation that makes it possible to solve particularly large interpolation problems on the GPU that would otherwise not fit into the available GPU memory. The conclusion of this thesis is that preCICE can benefit considerably from GPUs in data mapping with respect to runtime. In particular, large interpolation problems can be solved much faster on GPUs than on CPUs.
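
The matrix-free idea can be sketched by never storing the n x n system matrix: every product A v recomputes the kernel entries on the fly, so memory grows linearly with the number of vertices instead of quadratically, at the price of recomputing each entry in every iteration. The sketch below pairs this product with an unpreconditioned CG iteration. The Gaussian kernel, the shape parameter `eps`, and the names `rbf_apply` and `matrix_free_cg` are illustrative assumptions, not the thesis implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };

// Gaussian kernel (assumed choice): phi = exp(-(eps * d)^2), positive definite for distinct points.
double gaussian(const Point& a, const Point& b, double eps) {
  const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return std::exp(-eps * eps * (dx * dx + dy * dy + dz * dz));
}

// Hypothetical helper: y = A v with A_ij = phi(||x_i - x_j||), recomputing every
// entry instead of reading a stored matrix (O(n) memory, O(n^2) work per product).
std::vector<double> rbf_apply(const std::vector<Point>& pts, const std::vector<double>& v,
                              double eps) {
  std::vector<double> y(pts.size(), 0.0);
  for (std::size_t i = 0; i < pts.size(); ++i)
    for (std::size_t j = 0; j < pts.size(); ++j) y[i] += gaussian(pts[i], pts[j], eps) * v[j];
  return y;
}

// Hypothetical helper: plain CG that accesses the system only through rbf_apply.
std::vector<double> matrix_free_cg(const std::vector<Point>& pts, const std::vector<double>& f,
                                   double eps, double tol = 1e-12, int max_iter = 1000) {
  const std::size_t n = f.size();
  assert(pts.size() == n);
  auto dot = [n](const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
  };
  std::vector<double> w(n, 0.0), r = f, p = f;
  double rr = dot(r, r);
  const double stop = tol * tol * rr;  // stop when ||r|| <= tol * ||f||
  for (int k = 0; k < max_iter && rr > stop; ++k) {
    const std::vector<double> Ap = rbf_apply(pts, p, eps);
    const double alpha = rr / dot(p, Ap);
    for (std::size_t i = 0; i < n; ++i) { w[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
    const double rr_new = dot(r, r);
    for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + (rr_new / rr) * p[i];
    rr = rr_new;
  }
  return w;
}
```

On a GPU, each row of `rbf_apply` would be a natural unit of parallel work. The trade-off against the assembled variants is extra arithmetic per iteration in exchange for not holding the matrix in memory.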

URI

http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-130460
http://elib.uni-stuttgart.de/handle/11682/13046
http://dx.doi.org/10.18419/opus-13027

Collections

05 Fakultät Informatik, Elektrotechnik und Informationstechnik
