Dynamic energy optimization for GPUs using iteration and block detection on non-invasive metrics

Date

2024

Abstract

GPUs have become the accelerator of choice in High Performance Computing and Artificial Intelligence, as many workloads in these fields benefit from their highly parallel architecture. Supercomputing sites, however, must not only provide exascale performance but also cope with the increasing energy consumption of such systems. High energy prices, government carbon goals, or even constraints from the energy provider require that the available energy be used efficiently if supercomputers are to stay ecologically and financially sustainable. A common approach is application-aware Dynamic Voltage and Frequency Scaling (DVFS), in which lower clock frequencies with lower energy consumption are used for applications that do not benefit much from the additional performance of higher frequencies. However, most existing solutions rely on detailed profiling counters, which can only be obtained through invasive workarounds, and on machine learning models to characterize the current application, which must be trained extensively for every new system. Another promising approach exploits the fact that most HPC and AI applications have repeating code sections. To detect this periodicity, current approaches use either frequency analysis on non-invasive data, such as utilization rates, or code instrumentation. By strategically trying out different frequencies and observing the runtime of the individual iterations, the system is steered towards the most energy-efficient setting. This thesis proposes a novel approach to periodicity and iteration detection based on Dynamic Time Warping (DTW) [1]. The WARP algorithm proposed in [2] is adapted to work on real-valued time-series data, and a DTW-based algorithm is developed that detects each subsequent iteration. Together, these form a system that can detect the presence of periodicity and the start of each individual iteration in any real-valued time series, in real time and with low latency.
The system is resilient to changes in y-shift or iteration length over time, but still recognizes when the periodicity is no longer present. Results show that the boundaries of each iteration are, most of the time, located to within a single measurement. This thesis also proposes an energy optimization framework called Warp-EO, which uses the newly developed system together with a state-feedback controller to optimize the energy efficiency of GPUs. For the Himeno [3] benchmark, it can reduce the Energy Delay Product by up to 10.96 % by saving 10.20 % energy with only a 0.74 % increase in runtime. However, NVML is found to be unsuitable for providing data that clearly show the individual iterations of many ECP proxy apps [4].
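
To illustrate why Dynamic Time Warping suits this kind of detection, the following is a minimal sketch of the classic DTW distance between two real-valued sequences. It is not the adapted WARP algorithm or the iteration detector developed in the thesis. The function names `dtw_distance` and `center`, the sample utilization traces, and the mean-centering step used to tolerate a y-shift are all assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def center(xs):
    """Subtract the mean (assumed normalization to tolerate a y-shift)."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


# Hypothetical utilization samples (percent) for one reference iteration.
reference = [0, 0, 50, 90, 90, 50, 0, 0]
# Same shape, one sample longer and shifted upwards by 5.
stretched = [5, 5, 55, 95, 95, 95, 55, 5, 5]
# A window without any iteration structure.
flat = [45] * 9

d_match = dtw_distance(center(reference), center(stretched))
d_flat = dtw_distance(center(reference), center(flat))
print(f"similar window: {d_match:.2f}, flat window: {d_flat:.2f}")
```

In this sketch, the stretched and shifted window yields a much smaller distance to the reference than the flat window does, which is the property that makes a DTW-based comparison tolerant of varying iteration lengths. A real-time detector would additionally need an incremental formulation and a threshold for deciding when periodicity has ended; neither is shown here.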
