05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6

Search Results

Now showing 1 - 7 of 7
  • Item (Open Access)
    Stochastic neural networks : components, analysis, limitations
    (2022) Neugebauer, Florian; Polian, Ilia (Prof. Dr.)
    Stochastic computing (SC) promises an area- and power-efficient alternative to conventional binary implementations of many important arithmetic functions. SC achieves this by employing a stream-based number format called stochastic numbers (SNs), which enables bit-sequential computations, in contrast to conventional binary computations that are performed on entire words at once. An SN encodes a value probabilistically with equal weight for every bit in the stream. This encoding results in approximate computations, causing a trade-off between power consumption, area, and computation accuracy. The prime example of efficient computation in SC is multiplication, which can be performed with only a single gate. SC is therefore an attractive alternative to conventional binary implementations in applications that contain a large number of basic arithmetic operations and are able to tolerate the approximate nature of SC. The most widely considered class of applications in this regard is neural networks (NNs), with convolutional neural networks (CNNs) as the prime target for SC. In recent years, steady advances have been made in the implementation of SC-based CNNs (SCNNs). At the same time, however, a number of challenges have been identified as well: SCNNs need to handle large amounts of data, which has to be converted from conventional binary format into SNs. This conversion is hardware intensive and takes up a significant portion of a stochastic circuit's area, especially if the SNs have to be generated independently of each other. Furthermore, some commonly used functions in CNNs, such as max-pooling, have no exact corresponding SC implementation, which reduces the accuracy of SCNNs. The first part of this work proposes solutions to these challenges by introducing new stochastic components: a new stochastic number generator (SNG) that is able to generate a large number of SNs at the same time, and a stochastic maximum circuit that enables an accurate implementation of max-pooling operations in SCNNs. In addition, the first part of this work presents a detailed investigation of the behaviour of an SCNN and its components under timing errors. The error tolerance of SC is often quoted as one of its advantages, stemming from the fact that any single bit of an SN contributes only very little to its value. In contrast, bits in conventional binary formats have different weights and can contribute as much as 50% of a number's value. SC is therefore a candidate for extreme low-power systems, as it could potentially tolerate timing errors that appear in such environments. While the error tolerance of SC image processing systems has been demonstrated before, a detailed investigation of SCNNs in this regard has been missing so far. It will be shown that SC is not error tolerant in general, but rather that SC components behave differently even if they implement the same function, and that the error tolerance of an SC system further depends on the error model. In the second part of this work, a theoretical analysis of the accuracy and limitations of SC systems is presented. An existing framework for analysing and managing the accuracy of combinational stochastic circuits is extended to cover sequential circuits. This framework enables a designer to predict the effect of small design changes on the accuracy of a circuit and to determine important parameters such as SN length without extensive simulations. It will further be shown that the set of functions that can be implemented in SC is limited.
Due to the probabilistic nature of SC, some arithmetic functions suffer from a small bias when implemented as a stochastic circuit, including the max-pooling function in SCNNs.
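As an aside illustrating the bit-stream encoding and single-gate multiplication mentioned in the abstract above, the following sketch (not taken from the thesis; the stream length and input values are arbitrary) generates unipolar stochastic numbers and multiplies them with a bitwise AND:

```python
import random

def to_sn(value, length, rng):
    """Encode value in [0, 1] as a unipolar stochastic number:
    each bit of the stream is 1 with probability `value`."""
    return [1 if rng.random() < value else 0 for _ in range(length)]

def sn_value(stream):
    """Decode a unipolar SN: the encoded value is the fraction of 1-bits."""
    return sum(stream) / len(stream)

def sn_multiply(a, b):
    """Unipolar SC multiplication: a bitwise AND of two independent SNs
    yields a stream whose value approximates the product of the inputs."""
    return [x & y for x, y in zip(a, b)]

rng = random.Random(42)
length = 1024                      # longer streams reduce the approximation error
a = to_sn(0.5, length, rng)
b = to_sn(0.75, length, rng)
print(sn_value(sn_multiply(a, b))) # close to 0.375, but only approximately
```

Longer streams reduce the variance of the decoded result, which is precisely the accuracy/latency trade-off the abstract refers to.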
  • Item (Open Access)
    Scatter and beam hardening correction for high-resolution CT in near real-time based on a fast Monte Carlo photon transport model
    (2022) Alsaffar, Ammar; Simon, Sven (Prof. Dr.-Ing.)
    Computed tomography (CT) is a powerful non-destructive testing (NDT) technique. It provides insight into the interior of the scanned object and is widely used for industrial and medical applications. However, this technique suffers from severe quality-degrading artifacts. Among these, scatter and beam hardening (BH) cause severe quality degradation of the reconstructed CT images. Scatter results from a change in the direction, or in both the direction and the energy, of photons penetrating the object, while beam hardening results from the polychromatic nature of the X-ray source. When photons of different energies penetrate the object, low-energy photons are more easily absorbed than high-energy photons. This hardens the X-ray beam and causes a non-linear relation between the propagation path length and the attenuation of the beam. These artifacts are the major source of the cupping and streak artifacts that strongly degrade the quality of computed tomography imaging. The presence of cupping and streak artifacts reduces the contrast and the contrast-to-noise ratio of the image and distorts the grey values. As a consequence, important analyses of computed tomography results are affected; e.g., the detectability of voids and cracks is reduced by the loss of contrast, and dimensional measurements are impaired. Monte Carlo (MC) simulation is considered the most accurate approach for scatter estimation. However, existing MC estimators are computationally expensive, especially for the considered high-resolution flat-panel CT. In this work, a multi-GPU photon forward projection model and an iterative scatter correction algorithm were implemented. The Monte Carlo model has been highly accelerated and extensively verified using several experimental and simulated examples. The implemented model describes the physics within the 1 keV to 1 MeV range using multiple controllable key parameters. Based on this model, scatter computation for a single projection can be completed within a few seconds under well-defined model parameters. Smoothing and interpolation are performed on the estimated scatter to accelerate the scatter calculation without compromising accuracy too much compared to measured, nearly scatter-free projection images. Combining the scatter estimation with filtered backprojection (FBP), scatter correction is performed effectively in an iterative manner. In order to evaluate the proposed MC model, extensive experiments have been conducted on simulated data and a real-world high-resolution flat-panel CT. Compared to state-of-the-art MC simulators, the proposed MC model achieved a 15× acceleration on a single GPU in comparison to the GPU implementation of the Penelope simulator (MCGPU), utilizing several acceleration techniques, and a 202× speed-up on a multi-GPU system compared to the multi-threaded state-of-the-art EGSnrc MC simulator. Furthermore, it is shown that for high-resolution images, scatter correction with sufficient accuracy is accomplished within one to three iterations using FBP and the proposed fast MC photon transport model. Moreover, a fast and accurate BH correction method that requires no prior knowledge of the materials and corrects first- and higher-order BH artifacts has been implemented.
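For context, the non-linearity addressed by this BH correction follows directly from the polychromatic Beer-Lambert law; a standard textbook formulation (not taken from the thesis) is:

```latex
% Polychromatic Beer-Lambert law: I_0(E) is the source spectrum, \mu(E) the
% energy-dependent attenuation coefficient, d the propagation path length.
I(d) = \int I_0(E)\, e^{-\mu(E)\, d}\, \mathrm{d}E,
\qquad
p(d) = -\ln \frac{I(d)}{I(0)}
```

For a monochromatic beam the log-attenuation p(d) = μ(E0)·d is linear in the path length d, whereas a polychromatic spectrum yields a sub-linear p(d); this discrepancy is the beam hardening that the correction described next compensates for.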
In the first step, a wide sweep over candidate materials is performed based on an experimentally measured look-up table to obtain the closest estimate of the material. Then the non-linearity effect of BH is corrected by adding the difference between the estimated monochromatic and the polychromatic simulated projections of the segmented image. The estimated monochromatic projection is simulated by selecting the energy from the polychromatic spectrum that produces the lowest mean square error (MSE) with respect to the BH-corrupted projection from the scanner. The polychromatic projection, in turn, is accurately estimated using the least-squares estimation (LSE) method by minimizing the difference between the experimental projection and a linear combination of simulated polychromatic projections using different spectra of different filtration. As a result, an accurate non-linearity correction term is derived that leads to an accurate BH correction result. To evaluate the proposed BH correction method, extensive experiments have been conducted on real-world CT data. Compared to the state-of-the-art empirical BH correction method, the experiments show that the proposed method can substantially reduce the BH artifacts without prior knowledge of the materials. In summary, the lack of fast and computationally efficient methods to correct the major artifacts in CT images, i.e., scatter and beam hardening, has motivated this work, in which efficient and fast algorithms have been implemented to correct these artifacts. The correction of these artifacts has led to better visualization of the CT images, a higher contrast-to-noise ratio, and improved contrast. Supported by multiple experimental examples, it is shown that the scatter-corrected images obtained with the proposed method resemble, within a reasonable time, the nearly artifact-free reference images acquired experimentally. Moreover, applying the proposed BH correction method after the correction of the scatter artifacts results in the complete removal of the remaining cupping and streak artifacts that were degrading the scatter-corrected images and improves their contrast-to-noise ratio (CNR). In addition, assessments of the correction quality of the CT images have been performed using the software Volume Graphics VGSTUDIO MAX. Better surface determination can be derived from the artifact-corrected images, and enhancing the contrast by correcting these artifacts results in improved detectability of voids and cracks in several concrete examples. This supports the efficiency of the artifact correction methods implemented in this work.
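The iterative scatter correction scheme summarized above (MC scatter estimation combined with FBP) can be sketched as a simple loop. The skeleton below is a generic illustration, not the thesis's implementation; fbp_reconstruct and mc_scatter are hypothetical placeholders for a real FBP reconstruction and the fast MC photon transport model:

```python
import numpy as np

def iterative_scatter_correction(measured, fbp_reconstruct, mc_scatter, n_iter=3):
    """Subtract the current scatter estimate from the measured projections,
    reconstruct with FBP, re-estimate scatter on the reconstruction, repeat."""
    scatter = np.zeros_like(measured)
    recon = None
    for _ in range(n_iter):
        corrected = measured - scatter
        recon = fbp_reconstruct(corrected)
        scatter = mc_scatter(recon)
    return recon, scatter

# Usage with trivial stand-ins (a real FBP and MC model would go here):
measured = np.random.rand(4, 16)                        # 4 dummy projections
recon, scatter = iterative_scatter_correction(
    measured,
    fbp_reconstruct=lambda p: p.mean(axis=0),           # stand-in, not a real FBP
    mc_scatter=lambda r: np.full_like(measured, 0.01),  # stand-in, not a real MC model
)
print(recon.shape, scatter.shape)
```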
  • Item (Open Access)
    Performability analysis of Networks-on-Chips
    (2021) Hou, Jie; Radetzki, Martin (Prof. Dr.-Ing.)
    The rapidly increasing transistor density enables the evolution of many-core on-chip systems. Networks-on-Chips (NoCs) are the preferred communication infrastructure for such systems. NoCs have also been proposed to solve the complex on-chip communication problem in three-dimensional systems-on-chips (3D SoCs). A downside of technology scaling is the increased susceptibility to failures in NoC resources. Ensuring reliable operation despite such failures degrades NoC performance and may even invalidate the performance benefits expected from scaling. Thus, it is not enough to analyze performance and reliability in isolation, as is usually done. The goal of this thesis is to address the performance and reliability analysis of NoCs jointly, under consideration of faults. This is achieved through the concept of performability analysis. One of the commonly used performability methods is the Markov reward model. In this work, a generic methodology based on the Markov reward model is proposed to perform performability evaluation and analysis of NoCs under various design parameters. The introduced methodology consists of two parts. In the first part, a generic Markov modeling of NoCs that considers different fault models is proposed. It can be applied to both 2D and 3D NoCs. As the size of NoCs increases, the size of their Markov state spaces grows as well. To perform the performability evaluation of large NoCs, we implement tools to generate the Markov state space and to perform long-term and transient analyses of the generated Markov model. In the second part, we introduce two performance metrics, namely communication time and fault resilience. Communication time is an indicator of the ability to successfully transmit a certain number of packets. Fault resilience is an indicator of how reliable a NoC is in terms of connected paths. As the number of fault combinations in some states of the utilized Markov model is huge, obtaining the performance metrics of valid states efficiently is a challenging and necessary research task. In this work, we present different approaches to computing the mentioned performance metrics. These approaches have been verified with simulations concerning accuracy and speedup. Finally, we utilize several case studies to demonstrate how to use the proposed methodology to evaluate and analyze the performability of NoCs. The performability of various topologies and routing algorithms is evaluated and compared with respect to the two performance metrics. Moreover, we investigate how performability develops with scaling towards larger NoCs and explore the limits of scaling by determining the break-even failure rates under which scaling can still achieve a net performance increase.
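To make the Markov reward idea concrete, here is a minimal sketch (not taken from the thesis; the generator matrix, rates, and rewards are invented) that computes a steady-state performability value for a toy three-state model:

```python
import numpy as np

# States 0..2 count failed links in a toy NoC; Q is the CTMC generator
# (failure rate lam, repair/reconfiguration rate mu); reward[i] is a
# performance measure (e.g., normalized throughput) earned in state i.
lam, mu = 1e-3, 1e-1
Q = np.array([
    [-2 * lam,  2 * lam,      0.0],
    [      mu, -(mu + lam),   lam],
    [     0.0,        mu,     -mu],
])
reward = np.array([1.0, 0.7, 0.3])

# Long-term (steady-state) probabilities: solve pi @ Q = 0 with sum(pi) = 1.
A = np.vstack([Q.T, np.ones(len(reward))])
b = np.zeros(len(reward) + 1); b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

performability = float(pi @ reward)   # expected steady-state reward
print(pi, performability)
```

In the thesis's methodology, the hand-written generator is replaced by automatically generated state spaces, and the toy rewards are replaced by the communication-time and fault-resilience metrics.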
  • Item (Open Access)
    Dependable reconfigurable scan networks
    (2022) Lylina, Natalia; Wunderlich, Hans-Joachim (Prof.)
    The dependability of modern devices is enhanced by integrating an extensive number of extra-functional instruments. These are needed to facilitate cost-efficient bring-up, debug, test, diagnosis, and adaptivity in the field, and might include, e.g., sensors, aging monitors, and Logic and Memory Built-In Self-Test (BIST) registers. Reconfigurable Scan Networks (RSNs) provide a flexible way to access such instruments, as well as the device's registers, throughout the lifetime, starting from post-silicon validation (PSV), through manufacturing test, and finally during in-field operation. At the same time, the dependability properties of the system can be affected by an improper RSN integration. This doctoral project overcomes these problems and establishes a methodology for integrating dependable RSNs into a given system, considering the most relevant dependability aspects, such as robustness, testability, and security compliance of RSNs.
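As a generic illustration of the RSN idea (not the thesis's model), the following sketch shows a scan path whose length changes when a select bit opens a bypassable instrument segment; all classes and lengths are invented for the example:

```python
class Segment:
    """A plain shift-register segment on the scan path."""
    def __init__(self, length):
        self.bits = [0] * length
    def shift(self, bit_in):
        bit_out = self.bits[-1]
        self.bits = [bit_in] + self.bits[:-1]
        return bit_out

class ScanMuxSegment:
    """Bypassable segment: part of the active path only if `select` is 1."""
    def __init__(self, segment):
        self.select = 0
        self.segment = segment
    def shift(self, bit_in):
        return self.segment.shift(bit_in) if self.select else bit_in

control = Segment(length=1)                    # 1-bit control register
instrument = ScanMuxSegment(Segment(length=8)) # 8-bit instrument register

def shift_network(bit_in):
    return instrument.shift(control.shift(bit_in))

print(shift_network(1))   # select = 0: the instrument is bypassed, so the
                          # active path is only the 1-bit control register
instrument.select = 1     # reconfigure: the instrument joins the path
print(shift_network(0))   # now bits traverse 1 + 8 scan flip-flops
```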
  • Item (Open Access)
    Automatic methods for protection of cryptographic hardware against fault attacks
    (2022) Gay, Maël; Polian, Ilia (Prof. Dr. rer. nat. habil.)
    For several years, the number of electronic devices in use has been rising strongly, especially in the field of embedded systems. From automotive applications and smartphones to smaller, area- and power-restricted embedded systems such as Internet of Things (IoT) devices or smart cards, the wide availability of these systems induces a need for data protection. The implementation of hardware cryptographic primitives on Application-Specific Integrated Circuits (ASICs) or Field-Programmable Gate Arrays (FPGAs) aims to fulfil the security requirements while providing faster and lower-power encryption than software-based solutions on microprocessors, especially in the case of constrained resources. However, cryptographic solutions can be attacked, even if the encryption scheme is proven secure. One possible way to do so is through physical attacks, such as Side-Channel Analysis (SCA), for example by analysing a device's power consumption, or fault injection attacks, which disturb the computation in a way that allows an attacker to recover the secret key. As such, it is of the utmost relevance to implement cryptographic algorithms in a way that minimises the risk of physical attacks, as well as to implement counter-measures to prevent them, for instance Error-Correcting Codes (ECC). Moreover, the evaluation of the aforementioned cryptographic hardware and counter-measures is generally not done automatically, but rather empirically. This results in a need for the automation of both counter-measure generation and the checking of physical hardware against attacks. This thesis focuses on the automation of both aspects. Firstly, counter-measures based on Error-Detecting Codes (EDC) as well as ECC are presented. Their goal is to stop faults from disturbing the encryption process. A discussion of the differences between natural faults (i.e., induced by natural factors such as ageing or cosmic rays) and malicious faults is given in a subsequent chapter, as well as an analysis of the limitations of the evaluation of ECC. This is followed by the presentation of new architectures based on a new class of robust EDC, aimed at preventing multiple faults. They are scalable by construction, and as such it is possible to automatically choose an appropriate EDC implementation with regard to the constraints of the protected hardware. The architectures ensure the detection of faults injected by a strong adversary (who has the ability to inject precise faults on a temporal and spatial level), as well as the correction of low-multiplicity faults. The structure of the implementation, an inner-outer code based construction, and more specifically an efficient decoding method are further detailed, together with some additional tweaks. Finally, the implementation is validated against physical fault injection on a SAKURA-G FPGA platform, and the results further reinforce the need for such architectures. The second part of the thesis considers attack scenarios, and more precisely fault attacks. The automatic evaluation of hardware implementations of cryptographic primitives is the main focus. In this regard, this thesis considers a particular type of fault attack, hardware-based Algebraic Fault Attacks (AFAs). AFAs are at the border between mathematical cryptanalysis and physical fault injection attacks. They combine information from fault-disturbed encryptions with a description of the cipher in order to build an attack and recover the secret key.
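To illustrate the basic EDC principle on which such counter-measures build, here is a deliberately minimal sketch (a plain single-parity code, not the robust inner-outer construction presented in the thesis):

```python
def encode(word_bits):
    """Append an even-parity bit, the simplest error-detecting code."""
    parity = sum(word_bits) % 2
    return word_bits + [parity]

def check(codeword):
    """Return True if no (odd-weight) fault is detected."""
    return sum(codeword) % 2 == 0

data = [1, 0, 1, 1, 0, 0, 1, 0]
cw = encode(data)

faulty = cw.copy()
faulty[3] ^= 1                     # inject a single-bit fault
print(check(cw), check(faulty))    # True False

# An even-weight double fault escapes single parity, which is why the thesis
# argues for stronger, scalable codes against multi-bit fault injections.
```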
This work considers the hardware implementations of different ciphers as the source of algebraic information. In this regard, a framework for the automated creation of AFAs has been developed in collaboration with the Chair of Computer Architecture of the University of Freiburg. The framework takes the description of the cipher, in a Hardware Description Language (HDL) or at gate level, as well as a defined fault model as inputs and, through a series of steps, builds an attack in order to recover the secret key. The detailed steps are presented in this thesis. The automatic generation of attack scenarios for a considered cipher allows for an evaluation of any cipher implementation, including any potential changes or optimisations, against different attack scenarios. The framework itself was tested on a variety of different Substitution-Permutation Networks (SPNs) and some counter-measures. Physical realisations of fault attacks are also considered, based on an implementation on the SAKURA-G FPGA platform, as well as software simulations of an idealised fault model. The constructed attacks were successful, and the results are discussed, as well as the implications of multiple fault injections for the solving process. Finally, some counter-measures are considered, in order to validate or invalidate their effectiveness against AFAs.
  • Item (Open Access)
    Design-time system-on-chip memory optimization
    (2020) Strobel, Manuel; Radetzki, Martin (Prof. Dr.-Ing.)
    Trends such as miniaturization and increasingly data-intensive applications cause System-on-Chip (SoC) design to become increasingly complex and hardly manageable without automated design methods. A central aspect in the corresponding field of Electronic Design Automation (EDA) is optimization, where especially the memory subsystem is of growing interest, as the above trends allow more and more memory to be integrated on-chip. The optimization potential of Static Random-Access Memory (SRAM), the most prominent storage technology for on-chip use, is twofold. On the one hand, the memory size is highly decisive. This is due to the fact that the dynamic energy of read and write operations is consumed in the memory periphery to a large degree. As larger SRAM blocks require more switching logic in the periphery, using small SRAM resources for frequently used program code and data turns out to be highly energy-efficient. Steady reduction of the feature size in chip fabrication, on the other hand, leads to considerably increasing leakage currents and thus higher static power consumption. Saving potential in this regard is promised by the targeted activation of memory low-power modes. Spin-Transfer Torque Random-Access Memory (STT-RAM), an emerging memory technology, promises the same access-performance benefits as SRAM at a lower on-chip area footprint and without being volatile. Optimization potential in terms of energy consumption in this storage technology is found particularly in the costly write operation, which allows for an energy/latency trade-off. This thesis contributes to the field of System-on-Chip design in general and to on-chip memory optimization in particular as follows. Overall, a complete workflow for the application-specific optimization of memory subsystems at system design-time is proposed. This involves the automated and transparent connection of software simulation and memory access profiling, optimization of the memory subsystem, and finally the implementation of the obtained results, ideally on the software level. While minor contributions to Instruction Set Simulation (ISS) and code generation round off this workflow, the main focus is on the optimization methods for SRAM- and STT-RAM-based memory subsystems that are at its core. Inspired by embedded system synthesis theory, every proposed on-chip memory optimization method is categorized and formally defined as a combination of memory allocation, application binding, and memory operation mode scheduling. Each mathematical problem formulation is further implemented by means of mixed-integer linear or quadratic programming, or using a heuristic. Following this uniform structure, this work introduces four different optimization concepts. The first is a method for dynamic energy minimization in SRAM-based memory subsystems through combined allocation and binding. The second, with a focus on static power consumption in SRAM, is a combined solution for allocation, binding, and memory operation mode scheduling. Third, the previous aspects are reconsidered in the context of multi-core designs, i.e., targeting Multi-Processor System-on-Chip (MPSoC) design. The last optimization method deals with STT-RAM memories and presents a way to determine memory allocation and application binding jointly while further exploiting the above-mentioned trade-off. Thorough experimental evaluation demonstrates the general functionality and scalability of all optimization methods.
Beyond that, high application-specific saving potential can be reported. Concerning dynamic energy consumption, an optimized split memory configuration yields savings of, in some cases, over 90 % when compared to a baseline configuration with only one, typically large, memory. In terms of static energy, savings of over 60 % can be achieved in selected cases through the utilization of memory low-power modes. The additional impact of different write modes in STT-RAM, by contrast, turns out to be dominated by dynamic energy consumption, i.e., similar to SRAM memories, high reductions are first and foremost possible through a split memory setup. All in all, integrated into the complete flow of simulation, optimization, and code generation, the proposed memory optimization methods show promising results. Since only simulation-based memory figures were available, the investigation of the presented concepts with memory figures from industrial environments for single- and multi-core SoC design is the next logical step for future work.
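To make the combined allocation/binding idea concrete, here is a deliberately tiny brute-force sketch (not the thesis's mixed-integer formulation; all object names, access counts, and memory parameters are invented) that selects the energy-minimal feasible binding of data objects to memories:

```python
from itertools import product

# Bind data objects with known access counts to memories with per-access
# energy and limited capacity, minimizing dynamic access energy.
objects = {"hot_buf": (5000, 2), "table": (800, 4), "log": (50, 8)}  # accesses, size in KiB
memories = {"small_sram": (1.0, 4), "large_sram": (3.5, 64)}         # energy/access, capacity in KiB

best = None
for binding in product(memories, repeat=len(objects)):
    used = {m: 0 for m in memories}
    energy = 0.0
    for (name, (accesses, size)), mem in zip(objects.items(), binding):
        used[mem] += size
        energy += accesses * memories[mem][0]
    if all(used[m] <= memories[m][1] for m in memories):  # capacity feasible?
        if best is None or energy < best[0]:
            best = (energy, dict(zip(objects, binding)))

print(best)   # the frequently accessed object lands in the small, cheap SRAM
```

Frequently accessed objects end up in the small, low-energy-per-access SRAM, which is exactly the effect the dynamic-energy optimization described above exploits.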
  • Item (Open Access)
    High performance 4D light field disparity estimation, super-resolution and compression
    (2022) Tran, Trung Hieu; Simon, Sven (Prof. Dr.-Ing.)