Browsing by Author "Wunderlich, Hans-Joachim (Prof. Dr. rer. nat. habil.)"
Now showing 1 - 5 of 5
Item Open Access
Algorithm-based fault tolerance for matrix operations on graphics processing units: analysis and extension to autonomous operation (2015)
Braun, Claus; Wunderlich, Hans-Joachim (Prof. Dr. rer. nat. habil.)

Scientific computing and computer-based simulation technology have evolved into indispensable tools that enable solutions for major challenges in science and engineering. Applications in these domains are often dominated by compute-intensive mathematical tasks such as linear algebra matrix operations. The provision of correct and trustworthy computational results is an essential prerequisite, since these applications can have a direct impact on scientific, economic, or political processes and decisions. Graphics processing units (GPUs) are highly parallel many-core processor architectures that deliver tremendous floating-point compute performance at very low cost. This makes them particularly interesting for the substantial acceleration of complex applications in science and engineering. However, like most nano-scaled CMOS devices, GPUs face a growing number of threats that jeopardize their reliability, which makes the integration of fault tolerance measures mandatory. Algorithm-Based Fault Tolerance (ABFT) allows the protection of essential mathematical operations that are used intensively in scientific computing. It provides high error coverage combined with low computational overhead. However, the integration of ABFT into linear algebra matrix operations on GPUs is a non-trivial task that requires a careful balance between fault tolerance, architectural constraints, and performance. Moreover, ABFT for operations carried out in floating-point arithmetic has to cope with reduced error detection and localization efficacy due to inevitable rounding errors. This work provides an in-depth analysis of Algorithm-Based Fault Tolerance for matrix operations on graphics processing units with respect to different types and combinations of weighted checksum codes, partitioned encoding schemes, and architecture-related execution parameters. Moreover, a novel approach called A-ABFT is introduced for the efficient online determination of rounding error bounds, which significantly improves the error detection and localization capabilities of ABFT. Extensive experimental evaluations of the error detection capabilities, the quality of the determined rounding error bounds, and the achievable performance confirm that the proposed A-ABFT method outperforms previous approaches. In addition, two case studies (QR decomposition and Linear Programming) demonstrate the efficacy of A-ABFT and its applicability to practical problems.
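To illustrate the basic idea behind checksum-based ABFT for matrix operations, the sketch below protects a matrix multiplication with plain (unweighted) checksums in the spirit of classical ABFT. It is only a minimal sketch: the weighted checksum codes, partitioned encodings, and online rounding error bounds (A-ABFT) developed in the thesis go well beyond this, and the function name and the fixed tolerance are illustrative assumptions.

```python
# Minimal sketch of classical ABFT checksums for C = A @ B; the thesis uses
# weighted checksum codes and derives rounding error bounds online (A-ABFT),
# which this toy example does not do.
import numpy as np

def abft_matmul(A, B, tol=1e-8):                   # tol is an illustrative assumption
    # Encode: append a column-checksum row to A and a row-checksum column to B.
    A_enc = np.vstack([A, A.sum(axis=0)])                  # (m+1) x k
    B_enc = np.hstack([B, B.sum(axis=1, keepdims=True)])   # k x (n+1)

    C_enc = A_enc @ B_enc                          # full-checksum result, (m+1) x (n+1)
    C = C_enc[:-1, :-1]

    # Check: recomputed checksums of C must match the carried checksums
    # up to floating-point rounding.
    row_err = np.abs(C.sum(axis=0) - C_enc[-1, :-1])
    col_err = np.abs(C.sum(axis=1) - C_enc[:-1, -1])
    if row_err.max() > tol or col_err.max() > tol:
        raise RuntimeError("ABFT checksum mismatch: possible computation error")
    return C

C = abft_matmul(np.random.rand(64, 32), np.random.rand(32, 48))
```

A mismatch in exactly one row checksum and one column checksum localizes a single erroneous element at their intersection, which is the property ABFT exploits for error localization and correction.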
Item Open Access
Fault tolerance infrastructure and its reuse for offline testing: synergies of a unified architecture to cope with soft errors and hard faults (2015)
Imhof, Michael E.; Wunderlich, Hans-Joachim (Prof. Dr. rer. nat. habil.)

The evolution of digital circuits from a few application areas to omnipresence in everyday life has been enabled by the ability to dramatically increase integration density through scaling. However, the continuation of scaling becomes more difficult with every generation and poses severe challenges to reliability. Defects cannot be avoided during the manufacturing process, and their incidence worsens with scaling. Hence, the reliability at time point zero, expressed by the manufacturing yield, is not ideal, and some defective chips will produce wrong output signals.

For this reason, such hard faults need to be detected prior to delivery during test, where automatic test equipment (ATE) applies a test set that covers a predefined set of modeled defects. As some potential defect locations are hard to test through the chip's operational interface, additional dedicated test infrastructure that provides test access is included on chip. Throughout the operational lifetime, reliability is threatened by soft errors that originate from interactions of radiation with semiconductor devices and can manifest as corruptions of the sequential state. With soft error rates rising further as scaling continues, high reliability is maintained by including fault tolerance infrastructure able to detect, localize, and ideally correct soft errors. The orthogonal combination of two independent infrastructures, however, increases the area overhead, even though test support and fault tolerance are never required concurrently. This work proposes a unified architecture that employs a common infrastructure to provide fault tolerance during operation and test access during test. Similarities between both fields are exploited and traced back to the combination of an efficient sequential state checksum with an effective state update by bit-flipping. Experiments on public and industrial circuits evaluate the unified architecture in both fields and show improved area efficiency as well as successful error correction during operation. During test, the results substantiate advantages with respect to test time, test volume, peak and average test power, as well as test energy.
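A toy analogue of the core idea, a checksum over the sequential state that lets a corrupted bit be restored by flipping it back, is sketched below. The flip-flop state is viewed as a bit matrix (for example, scan chains by scan positions) protected by row and column parities; this simple cross-parity scheme is chosen only for illustration and is not the exact checksum or architecture of the thesis.

```python
# Toy analogue of a sequential state checksum with correction by bit-flipping.
# Cross-parity over a bit matrix is used purely for illustration.
import numpy as np

def make_checksum(state):
    # state: 2-D array of 0/1 flip-flop values
    return state.sum(axis=1) % 2, state.sum(axis=0) % 2

def correct_single_upset(state, row_par, col_par):
    r = (state.sum(axis=1) % 2) ^ row_par      # mismatching row parities
    c = (state.sum(axis=0) % 2) ^ col_par      # mismatching column parities
    rows, cols = np.flatnonzero(r), np.flatnonzero(c)
    if len(rows) == 1 and len(cols) == 1:
        state[rows[0], cols[0]] ^= 1           # correct by flipping the bit back
    return state

state = np.random.randint(0, 2, size=(8, 16))  # hypothetical flip-flop state
row_par, col_par = make_checksum(state)
state[3, 7] ^= 1                               # inject a single-event upset
state = correct_single_upset(state, row_par, col_par)
```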
Item Open Access
Multi-level simulation of nano-electronic digital circuits on GPUs (2019)
Schneider, Eric; Wunderlich, Hans-Joachim (Prof. Dr. rer. nat. habil.)

Simulation of circuits and faults is an essential part of design and test validation tasks for contemporary nano-electronic digital integrated CMOS circuits. Shrinking technology processes with smaller feature sizes and strict performance and reliability requirements demand not only detailed validation of the functional properties of a design, but also accurate validation of non-functional aspects, including the timing behavior. However, due to the rising complexity of the circuit behavior and the steady growth of designs with respect to transistor count, timing-accurate simulation of current designs requires a computational effort that can only be handled by proper abstraction and a high degree of parallelization. This work presents a simulation model for scalable and accurate timing simulation of digital circuits on data-parallel graphics processing unit (GPU) accelerators. By providing compact modeling and data structures, and by exploiting multiple dimensions of parallelism, the simulation model enables not only fast and timing-accurate simulation at logic level, but also massively parallel simulation with switch-level accuracy. The model facilitates extensions for fast and efficient fault simulation of small delay faults at logic level, as well as of first-order parametric and parasitic faults at switch level. With the parallelization on GPUs, detailed and scalable simulation is enabled that is applicable even to multi-million-gate designs. This way, comprehensive analyses of realistic timing-related faults in the presence of process and parameter variations are enabled for the first time.

Additional simulation efficiency is achieved by merging the presented methods into a unified simulation model that combines the unique advantages of the different levels of abstraction in a mixed-abstraction, multi-level simulation flow to reach even higher speedups. Experimental results show that the implemented parallel approach achieves unprecedented simulation throughput as well as high speedups compared to conventional timing simulators. The underlying model scales to multi-million-gate designs and gives detailed insights into the timing behavior of digital CMOS circuits, thereby enabling large-scale applications that aid even highly complex design and test validation tasks.
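As a rough CPU-side illustration of one parallelism dimension exploited by such GPU simulators, the sketch below evaluates gates in topological order while processing a whole batch of input patterns at once with vectorized operations. The tiny netlist, the gate delays, and the simple "latest input arrival plus gate delay" timing model are illustrative assumptions; the thesis models complete signal histories on the GPU down to switch level.

```python
# Data-parallel logic evaluation over a batch of stimuli with a simple
# arrival-time model; a CPU analogue of one GPU parallelism dimension.
import numpy as np

# Hypothetical netlist: (output, type, inputs, delay in ps) in topological order.
NETLIST = [
    ("n1", "NAND", ("a", "b"), 12.0),
    ("n2", "NOR",  ("b", "c"), 15.0),
    ("y",  "NAND", ("n1", "n2"), 12.0),
]

def simulate(patterns):
    # patterns: dict mapping input name -> list of 0/1 values, one per stimulus
    val = {k: np.asarray(v, dtype=bool) for k, v in patterns.items()}
    arr = {k: np.zeros(len(v)) for k, v in patterns.items()}   # arrival times
    for out, typ, ins, delay in NETLIST:
        a, b = val[ins[0]], val[ins[1]]
        val[out] = ~(a & b) if typ == "NAND" else ~(a | b)
        arr[out] = np.maximum(arr[ins[0]], arr[ins[1]]) + delay
    return val, arr

vals, times = simulate({"a": [0, 1, 1], "b": [1, 1, 0], "c": [0, 0, 1]})
print(vals["y"], times["y"])   # output values and arrival times per pattern
```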
Item Open Access
Reconfigurable scan networks: formal verification, access optimization, and protection (2014)
Baranowski, Rafal; Wunderlich, Hans-Joachim (Prof. Dr. rer. nat. habil.)

To facilitate smooth VLSI development and improve chip dependability, VLSI designs incorporate instrumentation for post-silicon validation and debug, volume test and diagnosis, as well as in-field system maintenance. Examples of on-chip instruments include embedded logic analyzers, trace buffers, test and debug controllers, assertion checkers, and physical sensors, to name just a few. Since the amount of embedded instrumentation in system-on-a-chip designs increases at an exponential rate, scalable mechanisms for instrument access become indispensable. Reconfigurable scan architectures have emerged as a suitable mechanism for access to on-chip instruments. Such structures integrate embedded instrumentation into a common scan network together with configuration registers that determine how data are transported through the network. For test purposes, the design of regular reconfigurable scan networks is covered by IEEE Std. 1149.1-2013 (Joint Test Action Group, JTAG) and IEEE Std. 1500 (Standard for Embedded Core Test, SECT). For general-purpose instrumentation, the ongoing standardization effort IEEE P1687 (Internal JTAG, IJTAG) allows user-defined scan architectures with arbitrary access control. The flexibility of reconfigurable scan networks poses a serious challenge: the deep sequential behavior, limited serial interface, and complex access dependencies are beyond the capabilities of state-of-the-art verification methods. This thesis contributes a novel modeling method for formal verification of reconfigurable scan architectures. The proposed model is based on a temporal abstraction that is both sound and complete for a wide array of scan networks. Experimental results show that this abstraction improves the scalability of model checking algorithms tremendously. Access to instruments in complex reconfigurable scan networks requires specialized algorithms for pattern generation. This problem is addressed with formal techniques that leverage the temporal abstraction to generate valid access patterns with low access time. This work presents the first method applicable to pattern retargeting and access merging in complex reconfigurable architectures compliant with IEEE P1687. Embedded instrumentation is an integral system component that remains functional throughout the lifetime of a chip. To prevent harmful activities, such as tampering with safety-critical systems, and to reduce the risk of intellectual property infringement, access to embedded instrumentation requires protection. This thesis provides a novel, scalable protection scheme for general reconfigurable scan networks. The proposed method allows fine-grained control over the access to individual instruments at low hardware cost and without the need to redesign the scan architecture.

Item Open Access
Test planning for low-power built-in self test (2014)
Zoellin, Christian G.; Wunderlich, Hans-Joachim (Prof. Dr. rer. nat. habil.)

Power consumption has become the most important issue in the design of integrated circuits. The power consumed during manufacturing or in-system test of a circuit can significantly exceed the power consumed during functional operation. The excess power can lead to false test fails or can result in permanent degradation or destruction of the device under test. Both effects can significantly impact the cost of manufacturing integrated circuits. This work targets power consumption during Built-In Self-Test (BIST). BIST is a Design-for-Test (DfT) technique that adds circuitry to a design such that it can be tested at speed with very little external stimulus. Test planning is the process of computing configurations of the BIST-based tests that optimize the power consumption within the constraints of test time and fault coverage. In this work, a test planning approach is presented that targets the Self-Test Using Multiple-input signature register and Parallel Shift-register sequence generator (STUMPS) DfT architecture. For this purpose, the STUMPS architecture is extended by clock gating in order to leverage the benefits of test planning. The clock of every chain of scan flip-flops can be disabled independently, which reduces the switching activity of the flip-flops and their clock distribution to zero and also reduces the switching activity of the downstream logic. Further improvements are obtained by clustering the flip-flops of the circuit appropriately. The test planning problem is mapped to a set covering problem. The constraints for the set covering are extracted from fault simulation and the circuit structure such that any valid cover will test every targeted fault at least once. Divide-and-conquer is employed to reduce the computational complexity of optimization against a power consumption metric. The approach can be combined with any fault model; in this work, stuck-at and transition faults are considered. The approach effectively reduces the test power without increasing the test time or reducing the fault coverage. It has proven effective on academic benchmark circuits, several industrial benchmarks, and the Synergistic Processing Element (SPE) of the Cell/B.E.™ Processor (Riley et al., 2005). Hardware experiments based on the manufacturing BIST of the Cell/B.E.™ Processor have shown the viability of the approach for industrial, high-volume, high-end designs. In order to improve the fault coverage for delay faults, high-frequency circuits are sometimes tested with complex clock sequences that generate tests with three or more at-speed cycles (rather than just the two of traditional at-speed testing). To support such complex clock sequences, the test planning presented here has been extended by a circuit-graph-based approach for determining equivalent combinational circuits for the sequential logic. In addition, this work proposes a method based on dynamic frequency scaling of the shift clock that utilizes a given power envelope to its full extent. This way, the test time can be reduced significantly, in particular if high test coverage is targeted.
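As a rough illustration of the set covering formulation used for test planning, the sketch below greedily selects BIST configurations (each enabling a subset of scan-chain clusters) until every targeted fault is covered, preferring configurations that cover many new faults per unit of power. The configurations, fault sets, and power costs are invented for illustration; the thesis derives them from fault simulation and the circuit structure, and it uses divide-and-conquer with real power metrics rather than this simple greedy heuristic.

```python
# Greedy sketch of the set-covering view of low-power test planning.
# configs: list of (name, detected_fault_set, power_cost); all values below
# are hypothetical and stand in for fault-simulation results.

def plan_tests(configs, all_faults):
    uncovered, plan = set(all_faults), []
    while uncovered:
        # Pick the configuration with the most newly covered faults per unit power.
        name, faults, cost = max(configs,
                                 key=lambda c: len(c[1] & uncovered) / c[2])
        if not faults & uncovered:
            raise ValueError("remaining faults are not detectable by any configuration")
        plan.append(name)
        uncovered -= faults
    return plan

configs = [
    ("cluster_A_on", {1, 2, 3, 4}, 2.0),
    ("cluster_B_on", {3, 4, 5},    1.5),
    ("cluster_C_on", {5, 6},       1.0),
]
print(plan_tests(configs, all_faults={1, 2, 3, 4, 5, 6}))
```

Any valid cover tests every targeted fault at least once; the power-aware selection only decides which configurations are used to do so.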