OPUS - Online Publications of University Stuttgart
Browsing by Author "Egger, Simon"

Now showing 1 - 2 of 2
    Item (Open Access)
    Adaptive robust scheduling in wireless Time-Sensitive Networks (TSN)
    (2024) Egger, Simon
    The correct operation of upper-layer services is unattainable in wireless Time-Sensitive Networks (TSN) if the schedule cannot provide formal reliability guarantees to each stream. Still, the current TSN scheduling literature leaves reliability, let alone provable reliability, either poorly quantified or entirely unaddressed. This work aims to remedy this shortcoming by designing an adaptive mechanism to compute robust schedules. For static wireless channels, robust schedules enforce the streams' reliability requirements by allocating sufficiently large wireless transmission intervals and by isolating omission faults. While robustness against omission faults is conventionally achieved by strictly isolating each transmission, we show that controlled interleaving of wireless streams is crucial for finding eligible schedules. We adapt the Disjunctive Graph Model (DGM) from job-shop scheduling to design TSN-DGM as a metaheuristic scheduler that can schedule up to one hundred wireless streams with fifty cross-traffic streams in under five minutes. In comparison, we demonstrate that strict transmission isolation already prohibits scheduling even a few wireless streams. For dynamic wireless channels, we introduce shuffle graphs as a linear-time adaptation strategy that converts reliability surpluses from improving wireless links into slack, and reliability impairments from degrading wireless links into tardiness. While TSN-DGM can improve the adapted schedule considerably within ten seconds of reactive rescheduling, we argue that the reliability contracts between upper-layer services and the infrastructure provider should specify a worst-case channel degradation beyond which no punctuality guarantees can be made.
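    The disjunctive-graph idea that TSN-DGM adapts from job-shop scheduling can be illustrated with a toy sketch. All names and data below are illustrative assumptions, not the thesis's implementation: jobs stand in for streams, machines for shared transmission resources; orienting every disjunctive pair (two operations on the same machine) turns the graph into a DAG whose longest path is the makespan.

    ```python
    from itertools import permutations

    # Toy job-shop instance: each job is a sequence of (machine, duration)
    # operations. Conjunctive constraints fix the order within a job.
    jobs = {
        "A": [("m1", 3), ("m2", 2)],  # job A: 3 units on m1, then 2 on m2
        "B": [("m2", 2), ("m1", 4)],  # job B: 2 units on m2, then 4 on m1
    }

    def makespan(order):
        """Earliest-start schedule for fixed per-machine operation orders.

        order: machine -> list of (job, op_index). Returns None if the
        chosen disjunctive orientations create a cycle (infeasible).
        """
        done = {}                           # (job, idx) -> finish time
        ptr = {m: 0 for m in order}         # next queued op per machine
        mfree = {m: 0 for m in order}       # machine ready time
        jfree = {j: 0 for j in jobs}        # job ready time
        remaining = sum(len(ops) for ops in jobs.values())
        while remaining:
            progressed = False
            for m, seq in order.items():
                if ptr[m] == len(seq):
                    continue
                j, i = seq[ptr[m]]
                if i == 0 or (j, i - 1) in done:   # job predecessor done?
                    start = max(mfree[m], jfree[j])
                    done[(j, i)] = mfree[m] = jfree[j] = start + jobs[j][i][1]
                    ptr[m] += 1
                    remaining -= 1
                    progressed = True
            if not progressed:
                return None                 # cyclic orientation
        return max(done.values())

    # Exhaustively orient the disjunctive edges (only viable for toy sizes;
    # TSN-DGM searches this space with a metaheuristic instead).
    ops_on = {"m1": [("A", 0), ("B", 1)], "m2": [("A", 1), ("B", 0)]}
    results = [
        makespan({"m1": list(p1), "m2": list(p2)})
        for p1 in permutations(ops_on["m1"])
        for p2 in permutations(ops_on["m2"])
    ]
    best = min(r for r in results if r is not None)
    print(best)  # 7
    ```

    In this toy instance the best orientation (makespan 7) interleaves the two jobs across both machines, and one orientation is outright infeasible, loosely echoing the abstract's point that controlled interleaving, rather than rigid isolation, is what makes eligible schedules findable.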
    Item (Open Access)
    Distributed Fast Fourier Transform for heterogeneous GPU systems
    (2021) Egger, Simon
    The Fast Fourier Transform (FFT) is a numerical method that converts input data to a representation in the frequency domain. A wide range of applications requires the computation of three-dimensional FFTs, which makes the utilization of Graphics Processing Units (GPUs) on distributed systems particularly appealing. The most common approach for distributed computation is to partition the global input data, resulting in slab decomposition or pencil decomposition. For large numbers of processes, it is well known that slab decomposition only provides limited scalability and is generally outperformed by pencil decomposition. This often leaves their performance comparison on fewer GPUs as a blind spot: we found that slab decomposition generally dominates for larger input sizes when utilizing fewer GPUs, which is consistent with simple theoretical models. An exception to this rule is when the processor grid of pencil decomposition is specifically aligned to fully utilize the available NVLink interconnections. Alongside the default implementations of slab decomposition and pencil decomposition, we propose Realigned as a possible optimization for both decomposition methods, taking advantage of cuFFT's advanced data layout. Most notably, Realigned reduces the additional memory requirements of pencil decomposition and computes the 1D-FFTs in y-direction more efficiently. Since both decomposition methods require global redistribution of the intermediate results, we further compare the performance of different Peer2Peer and All2All communication techniques. In particular, we introduce Peer2Peer-Streams, which avoids the need for additional synchronization and allows the complete overlap of the communication and packing phases. Our performance benchmarks show that this approach generally performs best for large input sizes on test systems with a limited number of GPUs when considering MPI without CUDA-awareness.
    Furthermore, we utilize custom MPI datatypes and adopt MPI_Type for GPUs, which reduces the additional memory requirements dramatically and avoids the need for a packing and unpacking phase altogether. By identifying a redistributed partition as a batch of slices, where each slice consists of the maximum number of contiguous, complex-valued words, we found that MPI_Type often poses a worthwhile consideration when both sent and received partitions are not composed of one-dimensional slices.
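    The slab/pencil distinction the abstract builds on can be sketched with back-of-envelope bookkeeping. The formulas below are standard textbook partitioning arithmetic, an illustrative assumption rather than the thesis's benchmark model: slab decomposition splits one axis and needs a single global redistribution, while pencil decomposition splits two axes over a pr x pc process grid and needs two.

    ```python
    # Partition shapes for an N x N x N complex grid distributed over P ranks.

    def slab(n, p):
        """Per-rank block shape and redistribution count, slab decomposition."""
        assert n % p == 0
        return {"block": (n // p, n, n), "redistributions": 1}

    def pencil(n, pr, pc):
        """Per-rank pencil shape and redistribution count, pr x pc grid."""
        assert n % pr == 0 and n % pc == 0
        return {"block": (n // pr, n // pc, n), "redistributions": 2}

    n, p = 512, 8
    s = slab(n, p)
    q = pencil(n, 4, 2)

    # Both partitions hold the same number of complex words per rank...
    words = lambda b: b[0] * b[1] * b[2]
    assert words(s["block"]) == words(q["block"]) == n**3 // p

    # ...but a slab keeps whole contiguous planes, while a pencil is a batch
    # of shorter contiguous slices -- the situation where describing the
    # partition with a custom MPI datatype, instead of packing/unpacking
    # buffers, can pay off.
    print(s["block"], q["block"])  # (64, 512, 512) (128, 256, 512)
    ```

    The trade-off this sketch hints at matches the abstract: per-rank data volume is identical, so the decompositions differ in how many redistribution rounds they need and in how contiguous the slices of each redistributed partition are.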