05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6


Search Results

Now showing 1 - 10 of 30
  • Item (Open Access)
    Rigorous compilation for near-term quantum computers
    (2024) Brandhofer, Sebastian; Polian, Ilia (Prof.)
    Quantum computing promises an exponential speedup for computational problems in material sciences, cryptography and drug design that are infeasible to resolve with traditional classical systems. As quantum computing technology matures, larger and more complex quantum states can be prepared on a quantum computer, enabling the resolution of larger problem instances, e.g. breaking larger cryptographic keys or modelling larger molecules accurately for the exploration of novel drugs. Near-term quantum computers, however, are characterized by large error rates, a relatively low number of qubits and a low connectivity between qubits. These characteristics impose strict requirements on the structure of quantum computations that must be incorporated by compilation methods targeting near-term quantum computers in order to ensure compatibility and yield highly accurate results. Rigorous compilation methods have been explored for addressing these requirements, as they explore the solution space exactly and thus yield a quantum computation that is optimal with respect to the incorporated requirements. However, previous rigorous compilation methods demonstrate limited applicability and typically focus on one aspect of the imposed requirements, i.e. reducing the duration or the number of swap gates in a quantum computation. In this work, opportunities for improving near-term quantum computations through compilation are explored first. These compilation opportunities are included in rigorous compilation methods to investigate each aspect of the imposed requirements, i.e. the number of qubits, the connectivity of qubits, the duration and the incurred errors. The developed rigorous compilation methods are then evaluated with respect to their ability to enable quantum computations that are otherwise not accessible with near-term quantum technology. Experimental results demonstrate the ability of the developed rigorous compilation methods to extend the computational reach of near-term quantum computers by generating quantum computations with reduced requirements on the number and connectivity of qubits as well as by reducing the duration and incurred errors of the performed quantum computations. Furthermore, the developed rigorous compilation methods extend their applicability to quantum circuit partitioning, qubit reuse and the translation between quantum computations generated for distinct quantum technologies. Specifically, a developed rigorous compilation method that exploits the structure of a quantum computation to reuse qubits at runtime yielded a reduction in the required number of qubits by up to 5x and in the result error by up to 33%. The developed quantum circuit partitioning method optimally distributes a quantum computation over separate partitions, reducing the required number of qubits by 40% and the cost of partitioning by 41% on average. Furthermore, a rigorous compilation method was developed for quantum computers based on neutral atoms that combines swap gate insertions and topology changes to reduce the impact of limited qubit connectivity on the quantum computation duration by up to 58% and on the result fidelity by up to 29%. Finally, the developed quantum circuit adaptation method enables translation between distinct quantum technologies while considering heterogeneous computational primitives with distinct characteristics, reducing the idle time of qubits by up to 87% and improving the result fidelity by up to 40%.
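    To make the notion of "rigorous" (exact) compilation concrete, the following minimal sketch enumerates every initial logical-to-physical qubit placement on a toy coupling graph and keeps the provably best one. The coupling map, circuit and cost function are illustrative assumptions, not taken from the dissertation; real tools encode the same search (plus swap insertion, duration and error terms) as SAT/SMT or ILP problems.

        from itertools import permutations

        # Toy device and circuit (illustrative assumptions, not from the dissertation):
        # a 4-qubit linear coupling map and a list of two-qubit gates on logical qubits.
        coupling = {(0, 1), (1, 2), (2, 3)}
        coupling |= {(b, a) for a, b in coupling}      # treat the coupling graph as undirected
        circuit = [(0, 1), (0, 2), (1, 3), (2, 3)]     # logical CNOT pairs

        def routing_cost(mapping):
            # Count gates whose operands are not adjacent under this placement;
            # each such gate forces the router to insert at least one swap.
            return sum((mapping[a], mapping[b]) not in coupling for a, b in circuit)

        # "Rigorous" here means exact: every initial placement is enumerated and
        # the provably best one is kept, instead of relying on a heuristic.
        best = min(permutations(range(4)), key=routing_cost)
        print("best initial placement:", best, "| gates needing swaps:", routing_cost(best))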
  • Item (Open Access)
    Efficient fault tolerance for selected scientific computing algorithms on heterogeneous and approximate computer architectures
    (2018) Schöll, Alexander; Wunderlich, Hans-Joachim (Prof. Dr.)
    Scientific computing and simulation technology play an essential role in solving central challenges in science and engineering. The high computational power of heterogeneous computer architectures makes it possible to accelerate applications in these domains, which are often dominated by compute-intensive mathematical tasks. Scientific, economic and political decision processes increasingly rely on such applications and therefore create a strong demand for correct and trustworthy results. However, continued semiconductor technology scaling increasingly poses serious threats to the reliability and efficiency of upcoming devices. Different reliability threats can cause crashes or erroneous results without indication. Software-based fault tolerance techniques can protect algorithmic tasks by adding appropriate operations to detect and correct errors at runtime. Major challenges arise from the runtime overhead of such operations and from rounding errors in floating-point arithmetic that can cause false positives. The end of Dennard scaling makes it increasingly challenging to improve compute efficiency from one semiconductor technology generation to the next. Approximate computing exploits the inherent error resilience of different applications to achieve efficiency gains with respect to, for instance, power, energy and execution time. However, scientific applications often impose strict accuracy requirements that demand careful use of approximation techniques. This thesis provides fault tolerance and approximate computing methods that enable the reliable and efficient execution of linear algebra operations and Conjugate Gradient solvers on heterogeneous and approximate computer architectures. The presented fault tolerance techniques detect and correct errors at runtime with low runtime overhead and high error coverage. At the same time, these fault tolerance techniques are exploited to enable the execution of the Conjugate Gradient solvers on approximate hardware by monitoring the underlying error resilience and adjusting the approximation error accordingly. In addition, parameter evaluation and estimation methods are presented that determine the computational efficiency of application executions on approximate hardware. An extensive experimental evaluation shows the efficiency and efficacy of the presented methods with respect to the runtime overhead for detecting and correcting errors, the error coverage, and the energy reduction achieved when executing the Conjugate Gradient solvers on approximate hardware.
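    Software-based protection of linear algebra kernels is commonly built on checksum relations. The sketch below shows the classic column-checksum check for a matrix-vector product (the core operation of a Conjugate Gradient solver), with an illustrative rounding tolerance to avoid the floating-point false positives mentioned above; it is a generic algorithm-based fault tolerance example, not the dissertation's specific scheme.

        import numpy as np

        def checked_matvec(A, x, tol=1e-8):
            # Column-checksum check for y = A @ x: since 1^T (A x) = (1^T A) x, the sum
            # of the result must match the checksum row applied to x up to rounding.
            checksum_row = A.sum(axis=0)               # 1^T A, computed once per matrix
            y = A @ x
            if abs(checksum_row @ x - y.sum()) > tol * (np.abs(y).sum() + 1.0):
                raise RuntimeError("checksum mismatch - recompute or trigger correction")
            return y

        rng = np.random.default_rng(0)
        A = rng.standard_normal((64, 64))
        x = rng.standard_normal(64)
        y = checked_matvec(A, x)                       # passes silently when no fault occurs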
  • Item (Open Access)
    Test planning for low-power built-in self test
    (2014) Zoellin, Christian G.; Wunderlich, Hans-Joachim (Prof. Dr. rer. nat. habil.)
    Power consumption has become the most important issue in the design of integrated circuits. The power consumption during manufacturing or in-system test of a circuit can significantly exceed the power consumption during functional operation. The excessive power can lead to false test fails or can result in the permanent degradation or destruction of the device under test. Both effects can significantly impact the cost of manufacturing integrated circuits. This work targets power consumption during Built-In Self-Test (BIST). BIST is a Design-for-Test (DfT) technique that adds circuitry to a design such that it can be tested at-speed with very little external stimulus. Test planning is the process of computing configurations of the BIST-based tests that optimize the power consumption within the constraints of test time and fault coverage. In this work, a test planning approach is presented that targets the Self-Test Using Multiple-input signature register and Parallel Shift-register sequence generator (STUMPS) DfT architecture. For this purpose, the STUMPS architecture is extended by clock gating in order to leverage the benefits of test planning. The clock of every chain of scan flip-flops can be disabled independently, reducing the switching activity of the flip-flops and their clock distribution to zero as well as reducing the switching activity of the downstream logic. Further improvements are obtained by clustering the flip-flops of the circuit appropriately. The test planning problem is mapped to a set covering problem. The constraints for the set covering are extracted from fault simulation and the circuit structure such that any valid cover will test every targeted fault at least once. Divide-and-conquer is employed to reduce the computational complexity of optimization against a power consumption metric. The approach can be combined with any fault model; in this work, stuck-at and transition faults are considered. The approach effectively reduces the test power without increasing the test time or reducing the fault coverage. It has proven effective with academic benchmark circuits, several industrial benchmarks and the Synergistic Processing Element (SPE) of the Cell/B.E.™ Processor (Riley et al., 2005). Hardware experiments have been conducted based on the manufacturing BIST of the Cell/B.E.™ Processor and have shown the viability of the approach for industrial, high-volume, high-end designs. In order to improve the fault coverage for delay faults, high-frequency circuits are sometimes tested with complex clock sequences that generate tests with three or more at-speed cycles (rather than just the two of traditional at-speed testing). To allow such complex clock sequences to be supported, the test planning presented here has been extended by a circuit-graph-based approach for determining equivalent combinational circuits for the sequential logic. In addition, this work proposes a method based on dynamic frequency scaling of the shift clock that utilizes a given power envelope to its full extent. This way, the test time can be reduced significantly, in particular if high test coverage is targeted.
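    The mapping to set covering can be made concrete with a tiny example: each BIST configuration covers a subset of the targeted faults and carries a power cost, and any valid test plan must cover every fault at least once. The fault sets, costs and the greedy heuristic below are purely illustrative; the dissertation optimizes against a power consumption metric using divide-and-conquer rather than greedily.

        # Each hypothetical BIST configuration covers a set of faults and has a power cost.
        configs = {
            "cfg_A": ({"f1", "f2", "f5"}, 3.0),
            "cfg_B": ({"f2", "f3"}, 1.5),
            "cfg_C": ({"f4", "f5"}, 2.0),
            "cfg_D": ({"f1", "f3", "f4"}, 2.5),
        }
        all_faults = set().union(*(faults for faults, _ in configs.values()))

        # Greedy weighted set cover: repeatedly take the configuration with the best
        # ratio of newly covered faults to power until every fault is tested at least once.
        uncovered, plan = set(all_faults), []
        while uncovered:
            name, (faults, power) = max(
                configs.items(), key=lambda kv: len(kv[1][0] & uncovered) / kv[1][1]
            )
            plan.append(name)
            uncovered -= faults
        print("selected configurations:", plan)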
  • Item (Open Access)
    Efficient modeling and computation methods for robust AMS system design
    (2018) Gil, Leandro; Radetzki, Martin (Prof. Dr.-Ing.)
    This dissertation addresses the challenge of developing model-based design tools that better support the design of the mixed analog and digital parts of embedded systems. It focuses on the conception of efficient modeling and simulation methods that adequately support emerging system level design methodologies. A detailed analysis of the design activities revealed many weak points of today's system level design tools. After considering the modeling and simulation of power electronic circuits for designing low-energy embedded systems, a novel signal model that efficiently captures the dynamic behavior of analog and digital circuits is proposed and utilized for the development of computation methods that enable the fast and accurate system level simulation of AMS systems. In order to support a stepwise system design refinement based on the essential system properties, behavior computation methods for linear and nonlinear analog circuits based on the novel signal model are presented and compared, regarding performance, accuracy and stability, with existing numerical and analytical methods for circuit simulation. The novel signal model, together with the method proposed to efficiently cope with the interaction of analog and digital circuits and the new method for digital circuit simulation, constitutes the key contribution of this dissertation because it allows the concurrent state- and event-based simulation of analog and digital circuits. Using a synchronous data flow model of computation for scheduling the execution of the analog and digital model parts, very fast AMS system simulations are carried out. As the best behavior abstraction for analog and digital circuits may be selected without the need to change component interfaces, the implementation, validation and verification of AMS systems take advantage of the novel mixed-signal representation. Changes of the modeling abstraction level do not affect the experiment setup. The second part of this work deals with the robust design of AMS systems and its verification. After defining a mixed-sensitivity-based robustness evaluation index for AMS control systems, a general robust design method leading to optimal controller tuning is presented. To avoid over-conservative AMS system designs, the proposed robust design optimization method considers parametric uncertainty and nonlinear model characteristics. The system properties in the frequency domain needed to evaluate the system robustness during parameter optimization are obtained from the proposed signal model. Further advantages of the presented signal model for the computation of control system performance evaluation indexes in the time domain are also investigated in combination with range arithmetic. A novel approach for capturing parameter correlations in range-arithmetic-based circuit behavior computation is proposed as a step towards a holistic modeling method for the robust design of AMS systems. The modeling and computation methods proposed to improve the support of design methodologies and tools for AMS systems are validated and evaluated in the course of this dissertation, considering many aspects of the modeling, simulation, design and verification of a low-power embedded system implementing Adaptive Voltage and Frequency Scaling (AVFS) for energy saving.
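    As a rough intuition for combined state- and event-based mixed-signal simulation, the sketch below advances a trivial analog state (an RC stage) with a fixed-step integrator while a digital block only acts at its own events. All component values and the integration scheme are assumptions for illustration and do not reflect the dissertation's signal model or its synchronous data flow scheduling.

        # A digital block toggles a drive voltage at its own events while an analog RC
        # stage integrates it with a fixed-step explicit Euler scheme.
        dt, tau = 1e-6, 5e-5          # step size and RC time constant (assumed values)
        drive, v_out = 0.0, 0.0
        trace = []
        for step in range(200):
            if step % 50 == 0:        # "digital" part: event-based, acts only on its clock edges
                drive = 1.0 - drive   # toggle between 0 V and 1 V
            v_out += dt / tau * (drive - v_out)   # "analog" part: state-based integration
            trace.append((step * dt, drive, v_out))
        print(f"final analog output: {trace[-1][2]:.3f} V")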
  • Item (Open Access)
    Stochastic neural networks: components, analysis, limitations
    (2022) Neugebauer, Florian; Polian, Ilia (Prof. Dr.)
    Stochastic computing (SC) promises an area- and power-efficient alternative to conventional binary implementations of many important arithmetic functions. SC achieves this by employing a stream-based number format called stochastic numbers (SNs), which enables bit-sequential computations, in contrast to conventional binary computations that are performed on entire words at once. An SN encodes a value probabilistically with equal weight for every bit in the stream. This encoding results in approximate computations, causing a trade-off between power consumption, area and computation accuracy. The prime example of efficient computation in SC is multiplication, which can be performed with only a single gate. SC is therefore an attractive alternative to conventional binary implementations in applications that contain a large number of basic arithmetic operations and are able to tolerate the approximate nature of SC. The most widely considered class of applications in this regard is neural networks (NNs), with convolutional neural networks (CNNs) as the prime target for SC. In recent years, steady advances have been made in the implementation of SC-based CNNs (SCNNs). At the same time, however, a number of challenges have been identified as well: SCNNs need to handle large amounts of data, which have to be converted from conventional binary format into SNs. This conversion is hardware intensive and takes up a significant portion of a stochastic circuit's area, especially if the SNs have to be generated independently of each other. Furthermore, some commonly used functions in CNNs, such as max-pooling, have no exact corresponding SC implementation, which reduces the accuracy of SCNNs. The first part of this work proposes solutions to these challenges by introducing new stochastic components: a new stochastic number generator (SNG) that is able to generate a large number of SNs at the same time, and a stochastic maximum circuit that enables an accurate implementation of max-pooling operations in SCNNs. In addition, the first part of this work presents a detailed investigation of the behaviour of an SCNN and its components under timing errors. The error tolerance of SC is often quoted as one of its advantages, stemming from the fact that any single bit of an SN contributes only very little to its value. In contrast, bits in conventional binary formats have different weights and can contribute as much as 50% of a number's value. SC is therefore a candidate for extreme low-power systems, as it could potentially tolerate the timing errors that appear in such environments. While the error tolerance of SC image processing systems has been demonstrated before, a detailed investigation into SCNNs in this regard has been missing so far. It will be shown that SC is not error tolerant in general, but rather that SC components behave differently even if they implement the same function, and that the error tolerance of an SC system further depends on the error model. In the second part of this work, a theoretical analysis of the accuracy and limitations of SC systems is presented. An existing framework to analyse and manage the accuracy of combinational stochastic circuits is extended to cover sequential circuits. This framework enables a designer to predict the effect of small design changes on the accuracy of a circuit and to determine important parameters such as the SN length without extensive simulations. It will further be shown that the set of functions that can be implemented in SC is limited. Due to the probabilistic nature of SC, some arithmetic functions suffer from a small bias when implemented as a stochastic circuit, including the max-pooling function in SCNNs.
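    The single-gate multiplication mentioned above can be illustrated in a few lines: two independent unipolar bit streams are ANDed bit by bit, and the fraction of 1-bits in the result approximates the product. The idealized software SNG, the stream length and the seed below are illustrative assumptions; hardware SNGs typically rely on LFSRs or similar sources.

        import random

        def to_sn(value, length, rng):
            # Unipolar stochastic number: each bit is 1 with probability `value`
            # (idealized software SNG; hardware designs use LFSRs or similar sources).
            return [1 if rng.random() < value else 0 for _ in range(length)]

        def from_sn(bits):
            # The encoded value is simply the fraction of 1-bits in the stream.
            return sum(bits) / len(bits)

        rng = random.Random(42)
        n = 4096                                  # stream length trades accuracy for latency
        a, b = to_sn(0.6, n, rng), to_sn(0.3, n, rng)

        # Multiplication of two independent unipolar SNs is a bitwise AND:
        # P(a_i AND b_i = 1) = P(a_i = 1) * P(b_i = 1) = 0.6 * 0.3 = 0.18
        product = [x & y for x, y in zip(a, b)]
        print(f"SC product ~ {from_sn(product):.3f} (exact value 0.18)")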
  • Item (Open Access)
    Efficient programmable deterministic self-test
    (2010) Hakmi, Abdul-Wahid; Wunderlich, Hans-Joachim (Prof. Dr. habil.)
    In modern times, integrated circuits (ICs) are used in almost all electronic equipment, ranging from household appliances to space shuttles, and have revolutionized the world of electronics. Continuous reductions in the manufacturing cost as well as in the size of this technology have allowed the development of very sophisticated ICs for common use. Post-fabrication testing of each IC is necessary in order to ensure product quality and the safety of human life. Improvements in technology as well as economies of scale are continuously reducing fabrication costs. On the other hand, the increasing complexity of circuits is leading to higher test costs. These increasing test costs affect the market price of a chip. A test set is a set of binary patterns that are applied to the circuit inputs to detect potential faults. Only a small number of bits in a test set are specified as 0 or 1, called care bits, while the other bits, called don't-care bits, may assume random values. Test set volume is characterized by the number of patterns as well as the size of each pattern in a test set. The increasing number of gates in nanometer ICs has resulted in an explosive increase in test set volume. This increase in test set volume is the major cause of rapidly growing test costs. An IC is tested either by using automatic test equipment (ATE) or with the help of special hardware added on-chip that performs a self-test. These two approaches as well as their hybrid derivatives offer various trade-offs in test cost, quality, reliability and test time. In ATE testing, a high test set volume requires expensive testers with large storage capacity, while in self-test it results in significant hardware overhead. A test set is highly compressible due to the presence of a large number of don't-care bits. Test data compression techniques are used to limit test set volume and hence the involved test cost. These compressed test sets are applicable to both ATE and self-test methodologies. Compression of a test set depends on its statistical attributes, such as the percentage and the distribution of care bits. The available test compression schemes assume that all test sets have similar statistical attributes, which is not always true. These attributes vary considerably among test sets depending on the circuit structure and the targeted trade-offs. To obtain an optimized reduction in test set volume, test sets with different statistical attributes have to be addressed separately. In this work we analyze various test sets of industrial circuits and categorize them into three classes based on their statistical attributes. By examining each class separately, three novel compression methods and decompression architectures are proposed. The proposed test compression methods are equally adaptable to ATE testing and self-test. Three low-cost programmable self-test schemes offering various trade-offs in testing are developed by applying these methods. The experimental results obtained with the test sets of large industrial circuits show that the proposed compression methods reduce storage requirements by more than half compared to the most efficient available methods. For the first time in the literature, the total number of bits in a compressed test set is lower than the number of care bits in the original test set. Additional advantages of the proposed methods include guaranteed encoding, a significant reduction in decompression time overhead and the programmability of the decompression hardware.
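    Why don't-care bits make test sets compressible can be seen from a toy encoding that stores only the care bits as (position, value) pairs and fills the X positions pseudo-randomly during decompression. This is an illustrative example only, not one of the three compression methods proposed in the work.

        import random

        pattern = "XX1XXXX0XXXXXXX1XXXX0XXX"       # one test pattern, mostly don't-care bits

        def compress(p):
            # Store only the care bits as (position, value) pairs.
            return [(i, bit) for i, bit in enumerate(p) if bit != "X"]

        def decompress(care_bits, length, seed=1):
            # Fill don't-care positions with reproducible pseudo-random values.
            rng = random.Random(seed)
            bits = [str(rng.randint(0, 1)) for _ in range(length)]
            for i, bit in care_bits:
                bits[i] = bit
            return "".join(bits)

        encoded = compress(pattern)
        restored = decompress(encoded, len(pattern))
        assert all(restored[i] == bit for i, bit in encoded)   # every care bit is honoured
        print(f"{len(pattern)} pattern bits -> {len(encoded)} stored care bits")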
  • Item (Open Access)
    Scatter and beam hardening correction for high-resolution CT in near real-time based on a fast Monte Carlo photon transport model
    (2022) Alsaffar, Ammar; Simon, Sven (Prof. Dr.-Ing.)
    Computed tomography (CT) is a powerful non-destructive testing (NDT) technique. It provides insight into the interior of the scanned object and is widely used for industrial and medical applications. However, this technique suffers from severe quality-degrading artifacts. Among these artifacts, scatter and beam hardening (BH) cause severe quality degradation of the reconstructed CT images. Scatter results from a change in the direction, or in the direction and the energy, of photons penetrating the object, while beam hardening results from the polychromatic nature of the X-ray source. When photons of different energies penetrate the object, low-energy photons are more easily absorbed than high-energy photons. This results in a hardening of the X-ray beam, which causes a non-linear relation between the propagation path length and the attenuation of the beam. These kinds of artifacts are the major source of the cupping and streak artifacts that severely degrade the quality of computed tomography imaging. The presence of cupping and streak artifacts reduces the contrast and the contrast-to-noise ratio of the image and distorts the grey values. As a consequence, important analyses of the results of the computed tomography technique are affected; e.g., the detectability of voids and cracks is reduced by the loss of contrast, and dimensional measurements are affected. Monte Carlo (MC) simulation is considered the most accurate approach for scatter estimation. However, the existing MC estimators are computationally expensive, especially for the considered high-resolution flat-panel CT. In this work, a multi-GPU photon forward projection model and an iterative scatter correction algorithm were implemented. The Monte Carlo model has been highly accelerated and extensively verified using several experimental and simulated examples. The implemented model describes the physics within the 1 keV to 1 MeV range using multiple controllable key parameters. Based on this model, scatter computation for a single projection can be completed within a few seconds under well-defined model parameters. Smoothing and interpolation are performed on the estimated scatter to accelerate the scatter calculation without compromising accuracy too much compared to measured, near scatter-free projection images. Combining the scatter estimation with filtered backprojection (FBP), scatter correction is performed effectively in an iterative manner. In order to evaluate the proposed MC model, extensive experiments have been conducted on simulated data and on a real-world high-resolution flat-panel CT. Compared to state-of-the-art MC simulators, the proposed MC model achieved a 15× acceleration on a single GPU in comparison to the GPU implementation of the Penelope simulator (MCGPU), utilizing several acceleration techniques, and a 202× speed-up on a multi-GPU system compared to the multi-threaded state-of-the-art EGSnrc MC simulator. Furthermore, it is shown that for high-resolution images, scatter correction with sufficient accuracy is accomplished within one to three iterations using FBP and the proposed fast MC photon transport model. Moreover, a fast and accurate BH correction method that requires no prior knowledge of the materials and corrects first- and higher-order BH artifacts has been implemented. In the first step, a wide sweep of the material is performed based on an experimentally measured look-up table to obtain the closest estimate of the material. Then the non-linearity effect of the BH is corrected by adding the difference between the estimated monochromatic and the polychromatic simulated projections of the segmented image. The estimated monochromatic projection is simulated by selecting the energy from the polychromatic spectrum that produces the lowest mean square error (MSE) with respect to the BH-corrupted projection from the scanner. The polychromatic projection, in turn, is accurately estimated using the least squares estimation (LSE) method by minimizing the difference between the experimental projection and a linear combination of simulated polychromatic projections using different spectra with different filtration. As a result, an accurate non-linearity correction term is derived that leads to an accurate BH correction result. To evaluate the proposed BH correction method, extensive experiments have been conducted on real-world CT data. Compared to the state-of-the-art empirical BH correction method, the experiments show that the proposed method can greatly reduce the BH artifacts without prior knowledge of the materials. In summary, the lack of fast and computationally efficient methods to correct the major artifacts in CT images, i.e., scatter and beam hardening, has motivated this work, in which efficient and fast algorithms have been implemented to correct these artifacts. The correction of these artifacts has led to better visualization of the CT images, a higher contrast-to-noise ratio, and improved contrast. Supported by multiple experimental examples, it is shown that the scatter-corrected images obtained with the proposed method resemble the near artifact-free reference images acquired experimentally, within a reasonable time. Moreover, the application of the proposed BH correction method after the correction of the scatter artifacts results in the complete removal of the remaining cupping and streak artifacts that were degrading the scatter-corrected images and improves the contrast-to-noise ratio (CNR) of the scatter-corrected images. In addition, assessments of the correction quality of the CT images have been performed using the software Volume Graphics VGSTUDIO MAX. Better surface determination can be derived from the artifact-corrected images. Furthermore, enhancing the contrast by correcting these artifacts results in an improved detectability of voids and cracks in several concrete examples. This supports the efficiency of the artifact correction methods implemented in this work.
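    The structure of such an iterative scatter correction can be summarized in a short skeleton: estimate the scatter, subtract it from the measured projection, reconstruct, and repeat for a small number of iterations. The stand-in functions and synthetic data below are illustrative placeholders for the fast Monte Carlo estimator and the FBP reconstruction; none of it reflects the actual implementation.

        import numpy as np

        def estimate_scatter(projection):
            # Placeholder for the fast MC estimate: a smooth, low-frequency signal fraction.
            kernel = np.ones(15) / 15.0
            return 0.2 * np.convolve(projection, kernel, mode="same")

        def reconstruct_fbp(projection):
            # Placeholder for the filtered backprojection step.
            return projection.copy()

        measured = 1.0 + 0.5 * np.sin(np.linspace(0.0, np.pi, 256))   # synthetic projection
        corrected = measured.copy()
        for iteration in range(3):                       # one to three iterations per the abstract
            scatter = estimate_scatter(corrected)        # MC estimate, smoothed/interpolated
            corrected = np.clip(measured - scatter, 1e-6, None)   # subtract scatter, keep positivity
            volume = reconstruct_fbp(corrected)          # reconstruct from the cleaned projection
        print(f"estimated scatter fraction after correction: {scatter.mean() / measured.mean():.3f}")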
  • Item (Open Access)
    Improvement of hardware reliability with aging monitors
    (2017) Liu, Chang; Wunderlich, Hans-Joachim (Prof. Dr.)
  • Item (Open Access)
    Multi-level analysis of non-functional properties
    (2014) Hatami Mazinani, Nadereh; Wunderlich, Hans-Joachim (Prof. Dr. rer. nat. habil.)
    System properties are usually classified into functional (behavioral) and non-functional properties (NFPs). While functional properties refer to the system behavior, NFPs are attributes or constraints of a system. Power dissipation, temperature distribution on the chip, vulnerability to soft and intermittent errors, reliability and robustness are all examples of NFPs. The exponential increase in system complexity and the shrinking feature sizes of transistors pose new challenges to the functional as well as the non-functional properties of a system. Therefore, it becomes more important to understand and model their impact in early design phases. This work targets dynamic, quantifiable NFPs and aims at providing a basis for the accurate analysis of this class of NFPs in early design phases. It proposes an accurate and efficient NFP characterization and analysis method for complex embedded systems. An efficient NFP prediction method helps designers understand how devices behave over time, identify NFP bottlenecks within circuits and make design trade-offs between performance and different NFPs in the product design stage. It assists manufacturers in building their circuits such that no performance degradation due to specific NFPs dominates over the lifetime of an operating device. The developed methodology is based on an efficient multi-level system-wide simulation that considers the target system application. A high NFP evaluation speed is achieved using a novel piecewise evaluation technique which splits the simulation time into evaluation windows and efficiently evaluates the NFP models once per window by partial linearization. The piecewise evaluation method is a fast, yet accurate replacement for a cycle-accurate NFP evaluation. To consider the mutual impact of different NFPs on each other, all NFP models are integrated into a common evaluation framework. The effect of positive or negative feedback between different NFPs is dynamically considered during simulation. Evaluations are based on target applications instead of corner-case analysis to provide a realistic prediction. The contributions of this work can be summarized as follows: (1) Generality: This work proposes a holistic, scalable NFP prediction methodology for multiple, interdependent NFPs. The NFP simulation and evaluation method is independent of a specific NFP, a particular model or a specific system or core. In addition, the method allows for multiple designs under analysis and multiple NFPs. As soon as the system is available at transaction level, it can be used for NFP estimation. (2) Speed-up: The NFP-aware simulation is performed on a multi-level platform, while low-level simulation is accelerated using parallelism. The complete system simulation is always kept at transaction level. All NFPs under analysis can be estimated with a single simulation run. In addition, the evaluation speed can be increased by increasing the size of the evaluation window. (3) Accuracy: The accuracy is a function of the accuracy of the selected model and of the evaluation methodology. This work provides a method for integrating arbitrary low-level models into the system analysis. The right choice of low-level models may depend on the requirements for accuracy and efficiency. To preserve the low-level evaluation accuracy, the required observables for the piecewise evaluation are obtained at low level. The evaluation accuracy of the piecewise approach can be adjusted by calibrating the window size. Moreover, rather than using statistical or worst-case analysis techniques (which may be too pessimistic in the case of embedded systems with well-defined applications), the complete system is simulated with the target applications and actual workloads to obtain higher accuracy for specific applications.
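    The idea of evaluating an NFP model once per window instead of once per cycle can be illustrated with a toy power model. The model, the per-cycle activities and the window size below are assumptions for illustration; the dissertation's partial linearization handles far richer, interdependent models.

        def power_model(activity, leakage=0.05):
            # Toy linear power model in watts (illustrative only).
            return 0.8 * activity + leakage

        cycle_activity = [0.1, 0.3, 0.2, 0.6, 0.5, 0.4, 0.2, 0.1]   # per-cycle observable
        window = 4                                                   # evaluation-window size

        cycle_accurate = sum(power_model(a) for a in cycle_activity) / len(cycle_activity)
        piecewise = sum(
            power_model(sum(cycle_activity[i:i + window]) / window)
            for i in range(0, len(cycle_activity), window)
        ) / (len(cycle_activity) // window)

        # For a linear (or piecewise-linearized) model both averages agree; the window
        # size then only trades the number of model evaluations against accuracy for
        # nonlinear models.
        print(f"cycle-accurate: {cycle_accurate:.4f} W, piecewise: {piecewise:.4f} W")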
  • Item (Open Access)
    Self-diagnosis in Network-on-Chips
    (2015) Dalirsani, Atefe; Wunderlich, Hans-Joachim (Dr. rer. nat. habil.)
    Network-on-Chips (NoCs) constitute a message-passing infrastructure and can fulfil the communication requirements of today's System-on-Chips (SoCs), which integrate numerous semiconductor Intellectual Property (IP) blocks into a single die. As the NoC is responsible for the data transport among IPs, its reliability is crucial for the reliability of the entire system. In deep nanoscale technologies, transient and permanent failures of transistors and wires are caused by a variety of effects. Such failures may occur in the NoC as well, disrupting its normal operation. An NoC comprises a large number of switches that form a structure spanning across the chip. The inherent redundancy of the NoC provides multiple paths for communication among IPs. Graceful degradation is the property of tolerating a component's failure in a system at the cost of limited functionality or performance. In NoCs, when a switch in a path is faulty, alternative paths can be used to connect the IPs, keeping the SoC functional. For this purpose, a fault detection mechanism is needed to identify the faulty switch, and a fault-tolerant routing should bypass it. As each NoC switch consists of a number of ports and multiple routing paths, graceful degradation can be applied in an even more fine-grained way. A fault may destroy some routing paths inside the switch, leaving the rest non-faulty. Thus, instead of disabling the faulty switch completely, its fault-free parts can be used for message passing. In this way, the chance of disconnecting IP cores is reduced and the probability of ending up with disjoint networks decreases. This study pursues efficient self-test and diagnosis approaches for both manufacturing and in-field testing, aiming at the graceful degradation of defective NoCs. The approaches identify the location of defective components in the network rather than providing only a go/no-go test response. Conventionally, structural test approaches like scan design have been employed for testing NoC products. Structural testing targets faults of a predefined structural fault model, such as stuck-at faults. In contrast, functional testing targets certain functionalities of a system, for example the instructions of a microprocessor. In NoCs, functional tests target NoC characteristics such as routing functions and undistorted data transport. Functional tests benefit the most from the regular NoC structure. They reduce the test costs and prevent overtesting. However, unlike structural tests, functional tests do not explicitly target structural faults, and the quality of the test approach cannot be measured. We bridge this gap by proposing a self-test approach that combines the advantages of the structural and functional test methodologies and hence is suitable for both manufacturing and in-field testing. Here, the software running on the IP cores attached to the NoC is responsible for the test. As in functional tests, the test patterns deal only with the functional inputs and outputs of the switches. For pattern generation, a model is introduced that brings the information about structural faults to the level of the functional outputs of the switch. Thanks to this unique feature of the model, a high structural fault coverage is achieved, as revealed by the results. To make NoCs more robust against various defect mechanisms during their lifetime, concurrent error detection is necessary. To this end, this dissertation contributes an area-efficient synthesis technique for NoC switches that detects any error resulting from a single combinational or transition fault in the switch and its links during normal operation. This technique incorporates data encoding and standard concurrent error detection using multiple parity trees. Results reveal that the proposed approach imposes a lower area overhead compared to traditional techniques for concurrent error detection. To enable fine-grained graceful degradation, the intact functions of defective switches must be identified. Thanks to fault-tolerant techniques, the fault-free parts of switches can still be employed in the NoC. However, reasoning about the fault-free functions with respect to the exact cause of a malfunction is missing in the literature. This dissertation contributes a novel fine-grained switch diagnosis technique based on structural logic diagnosis. After determining the location and the nature of the defect in the faulty switch, all routing paths are checked and the soundness of the intact switch functions is proved. Experimental results show improvements in both the performance and the reliability of degraded NoCs when the fine-grained diagnosis of NoC switches is incorporated.
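    Concurrent error detection with parity trees can be sketched in a few lines: a parity predictor computes, from the switch inputs, what the parity of the outputs must be, and a mismatch at runtime flags an error. The toy "switch" function below (a simple rotation, whose output parity equals the input parity) and the single injected bit flip are illustrative assumptions, not the synthesized checkers of the dissertation.

        from functools import reduce
        from operator import xor

        def switch_logic(word):
            # Toy stand-in for a switch output function: rotate the 4-bit word by one.
            return word[1:] + word[:1]

        def parity(bits):
            return reduce(xor, bits, 0)

        data = [1, 0, 1, 1]
        outputs = switch_logic(data)
        predicted = parity(data)                 # rotation preserves parity, so the
                                                 # predictor is just the input parity
        assert parity(outputs) == predicted      # fault-free operation: checker stays silent

        faulty = list(outputs)
        faulty[2] ^= 1                           # inject a single erroneous bit flip
        print("error detected:", parity(faulty) != predicted)   # True: parity mismatch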