Universität Stuttgart
Permanent URI for this community: https://elib.uni-stuttgart.de/handle/11682/1
Search Results (9 results)
Item Open Access
Driver alertness monitoring using steering, lane keeping and eye tracking data under real driving conditions (2020)
Friedrichs, Fabian; Yang, Bin (Prof. Dr.-Ing.)

Since humans operate trains, vehicles, aircraft and industrial machinery, fatigue has always been one of the major causes of accidents. Experts assert that sleepiness is among the major causes of severe road accidents. In-vehicle fatigue detection has been a research topic since the early 1980s. Most approaches are based on driving simulator studies, but do not work properly under real driving conditions. The Mercedes-Benz ATTENTION ASSIST is the first highly sophisticated series-production driver assistance system on the market that detects early signs of fatigue. Seven years of research and development with an unparalleled demand for resources were necessary for its series introduction in 2009 for passenger cars and 2012 for buses. The system analyzes the driving behavior and issues a warning to sleepy drivers. Essentially, this system extracts a single measure (a so-called feature), the steering event rate, by detecting a characteristic pattern in the steering wheel angle signal. This pattern is principally described by a steering pause followed by a sudden correction. Various challenges had to be tackled for series-production readiness, such as handling individual driving styles and external influences from the road, traffic and weather. Fuzzy logic, driving style detection, road condition detection, change-of-driver detection, fixed-point parameter optimization and sensor surveillance were some of the side results of this thesis that were essential for the system's maturity. Simply issuing warnings to sleepy drivers is neither readily "experienceable" nor transparent. Thus, version 2.0 of the system introduced the more vivid ATTENTION LEVEL, a permanently available bar graph monitoring the current driving performance.
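The characteristic steering pattern, a pause followed by a sudden correction, can be pictured as a simple threshold detector on the steering wheel angle signal. The sketch below is an illustrative toy version: the sampling rate, thresholds, and window length are assumptions for demonstration, not the calibrated ATTENTION ASSIST parameters.

```python
import numpy as np

def steering_event_rate(angle, fs=50.0, pause_thresh=0.5, jerk_thresh=5.0,
                        pause_len_s=1.0):
    """Events per minute: a steering pause followed by a sudden correction.

    angle: steering wheel angle in degrees, sampled at fs Hz. All thresholds
    are illustrative assumptions, not the production calibration.
    """
    speed = np.abs(np.diff(angle)) * fs           # steering speed in deg/s
    pause_len = int(pause_len_s * fs)
    n_events = 0
    i = pause_len
    while i < len(speed):
        in_pause = np.all(speed[i - pause_len:i] < pause_thresh)
        if in_pause and speed[i] > jerk_thresh:   # sudden correction after pause
            n_events += 1
            i += pause_len                        # avoid double-counting one event
        else:
            i += 1
    minutes = len(angle) / fs / 60.0
    return n_events / minutes if minutes > 0 else 0.0
```

A drowsy driver's signal, with long flat segments interrupted by abrupt corrections, would yield a higher event rate than smooth, continuous steering.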
The ATTENTION LEVEL algorithm is another result of this thesis and was introduced in 2013 in the new S-Class. Fatigue is very difficult to grasp since a ground-truth reference does not exist. Thus, the presented findings about camera-based driver monitoring are included as a fatigue reference for algorithm training. Concurrently, the presented results form the basis for eye-monitoring cameras in the next generation of such systems. The driver monitoring camera will also play a key role in "automated driving", since it is necessary to know whether the driver is looking at the road while the vehicle is driving and whether he is alert enough to take back control of the vehicle in complex situations. All these improvements represent major steps towards the paradigm of crash-free driving. In order to develop and improve the ATTENTION ASSIST, the central goal of the present work was the development of pattern detection and classification algorithms to detect fatigue from driving sensors. One major approach to achieving a sufficiently high detection rate while keeping the false alarm rate at a minimum was the incorporation of further sleepiness-associated patterns. Features reported in the literature were assessed, and improved extraction techniques were developed. Various new features were proposed and evaluated for their applicability under real-road conditions. The mentioned steering pattern detection is the most important feature and was further optimized. Essential series sensor signals, available in most of today's vehicles, were considered, such as steering wheel angle, lateral and longitudinal acceleration, yaw rate, wheel rotation rate, accelerator pedal, wheel suspension level, and vehicle operation. Another focus was on lateral control using camera-based lane data. Under real driving conditions, the effects of sleepiness on driving performance are very small and severely obscured by external influences such as road condition, curvature, cross-wind, vehicle speed, traffic, steering parameters, etc.
Furthermore, drivers also have very different individual driving styles. Short-term distraction from vehicle operation also has a large impact on driving behavior. Proposals are given on how to incorporate such factors. Since lane features require an optional tracking camera, a proposal is made on how to estimate some lane deviation features from inertial sensors alone by means of an extended Kalman filter. Every feature is related to a number of parameters and implementation details. A highly accelerated method for parameter optimization on the large amount of data is presented and applied to the most promising features. The alpha-spindle rates from the electroencephalogram (EEG) and electrooculogram (EOG) were assessed for their performance under real driving conditions. In contrast to the majority of results in the literature, EEG was not observed to contribute any useful information to the fatigue reference (except for two drives with microsleeps). Consequently, subjective self-assessments according to the Karolinska Sleepiness Scale and a three-level warning acceptance question were used throughout. Various correlation measures and statistical tests were used to assess the correlation of features with the reference. This thesis is based on a database of over 27,000 drives that accumulate to over 1.5 million km of real-road driving. In addition, various supervised real-road driving studies were conducted that involve advanced fatigue levels. The fusion of features is performed by different classifiers such as Artificial Neural Networks (ANN) and Support Vector Machines (SVM). Fair classification results are achieved with ANN and SVM using cross-validation. A selection of the most promising and independent features is given based on automatic SFFS feature selection. Classical machine learning methods are used in order to yield maximal system transparency and because the algorithms are targeted to run on today's control units.
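The correlation-based feature assessment described above can be illustrated on synthetic stand-in data: an informative feature tracks the Karolinska Sleepiness Scale (KSS) reference, an uninformative one does not. The feature construction and noise levels here are purely illustrative assumptions, not values from the driving database.

```python
import numpy as np

def feature_reference_correlation(feature, kss):
    """Pearson correlation between one driving feature and the KSS reference."""
    f = feature - feature.mean()
    k = kss - kss.mean()
    return float(f @ k / np.sqrt((f @ f) * (k @ k)))

rng = np.random.default_rng(1)
kss = rng.uniform(1, 9, size=300)                       # KSS scale, 1..9
informative = 0.1 * kss + rng.normal(scale=0.1, size=300)   # tracks sleepiness
uninformative = rng.normal(size=300)                    # pure noise feature
```

Ranking features by such correlation (or by statistical tests) is one way to shortlist candidates before running a heavier wrapper selection such as SFFS.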
The potential of using end-to-end deep learning algorithms is discussed. While their application to CAN signals is problematic, there is high potential for driver-camera-based approaches. Finally, the features were implemented in a real-time demonstrator using a custom CAN-interface framework. While various findings have already been rolled out in ATTENTION ASSIST 1.0, 2.0 and ATTENTION LEVEL, it was shown that further improvements are possible by incorporating a selection of steering- and lane-based features and sophisticated classifiers. The problem can only be solved on a system level considering all topics discussed in this thesis. After decades of research, it must be recognized that the limitations of indirect methods have been reached. Especially in view of emerging automated driving, direct methods like eye tracking must be considered and have shown the greatest potential.

Item Open Access
Improving automotive radar spectra object classification using deep learning and multi-class uncertainty calibration (2022)
Patel, Kanil; Yang, Bin (Prof. Dr.-Ing.)

Being a prerequisite for successful automated driving, improving the perception capabilities of vehicles is of paramount importance for reliable and robust scene understanding. Required for decision-making in autonomous vehicles, scene understanding becomes particularly challenging in adverse weather and lighting conditions; situations that often pose challenges for human drivers as well. Automotive radars can greatly assist the sensors currently deployed on vehicles with robust measurements, especially in challenging conditions where other sensors often fail to operate reliably. However, classification using radar sensors is often limited to a few classes (e.g. cars, humans, and stop signs), controlled laboratory settings, and/or simulations.
Since radar already offers reliable distance, azimuth and velocity estimates of the objects in a scene, improving radar-based classification greatly expands the usage of radar sensors for tackling multiple driving-related tasks that are often performed by other, less robust sensors. This thesis investigates how automated driving perception can be improved through multi-class radar classification, using deep learning algorithms to exploit the object class characteristics captured in radar spectra. Despite the highly accurate predictions of deep learning models, such classifiers exhibit severe over-confidence, which can lead decision-making systems to false conclusions with possibly catastrophic consequences, often a matter of life and death in automated driving. Consequently, high-quality, robust, and interpretable uncertainty estimates are indispensable characteristics of any unassailable automated driving system. With the goal of uncertainty estimates for real-time predictive systems, this thesis also tackles the prominent over-confidence of deep learning classification models, which persists across all data modalities. Since calibration is an important measure of the quality of uncertainty estimates, this work focuses on accurately estimating the calibration of trained classifiers and presents novel techniques for improving their calibration. The presented solutions offer high-quality real-time confidence estimates for classification models of all data modalities (e.g. non-radar applications), both for classifiers that are already trained and used in practice and through new training strategies for learning new classifiers. Furthermore, the presented uncertainty calibration algorithms could also be extended to tasks other than classification, such as regression and segmentation.
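A standard way to quantify the calibration quality discussed above is the Expected Calibration Error (ECE): bin predictions by confidence and compare per-bin accuracy to per-bin mean confidence. The binned estimator below is a minimal sketch; the bin count and equal-width binning are common defaults, not necessarily the exact estimator used in the thesis.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: |accuracy - mean confidence| per bin, weighted by bin mass."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap            # weight by fraction in this bin
    return ece
```

A perfectly calibrated classifier (75% confidence, right 3 times out of 4) scores an ECE of 0, while an over-confident one (90% confidence, 50% accuracy) scores 0.4.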
On a challenging new realistic automated driving radar dataset, the solutions proposed in this thesis show that radar classifiers are able to generalize to novel driving environments, driving patterns, and object instances in realistic static driving scenes. To further replicate realistic encounters of autonomous vehicles, we study the behaviour of the classifiers under spectrum corruptions and for outlier detection of unknown objects, showing significant performance improvements in safely handling these prevalent encounters through accurate uncertainty estimates. With the proposed generalization and the requisite accurate uncertainty estimation techniques, the radar classifiers in this study greatly improve radar-based perception for scene understanding and lay a solid foundation for current sensor fusion techniques to leverage radar measurements for object classification.

Item Open Access
Least-squares based layerwise pruning of Deep Neural Networks (2024)
Mauch, Lukas; Yang, Bin (Prof. Dr.-Ing.)

Deep Neural Networks (DNNs) are currently the most powerful models in machine learning and successfully solve many tasks, such as image and speech recognition, semantic segmentation, or data generation. Because of the inherently high computational complexity of DNNs, pruning methods were applied early on to reduce this complexity and to speed up inference. Pruning methods remove (prune) parameters from a trained DNN without significantly degrading its performance. The resulting models can be evaluated at high speed even on weak compute platforms. In recent years, pruning has been applied not only after training but also as a component of modern training algorithms for DNNs.
For example, many memory-efficient training algorithms and architecture search methods apply pruning during training to remove unimportant parameters from the DNN. The problem is that many modern pruning methods rely on regularized, supervised training procedures and are therefore computationally expensive themselves. Such pruning methods cannot easily be embedded into other training algorithms. There is therefore growing interest in pruning methods that are both fast and accurate. In this work, we investigate layerwise least-squares (LS) pruning, a framework for the structured pruning of DNNs. We show that LS pruning is a fast yet accurate method for DNN reduction that can be used for zero-shot or unsupervised network reduction. In experiments, we compare LS pruning with other fast reduction methods, such as magnitude-based pruning and LS factorization. In addition, we compare LS pruning with supervised pruning methods.

Item Open Access
The polar transmitter : analysis and algorithms (2015)
Ibrahim, Mohamed; Yang, Bin (Prof. Dr.-Ing.)

The polar transmitter architecture is a promising candidate for future mobile communications. It can outperform traditional IQ transmitters in terms of power efficiency and space consumption. The massive increase in bandwidth demand makes the design of the polar transmitter a challenging task. Since the polar transmitter processes digital signals at RF sampling rates, signal processing principles and algorithms can be applied to relax the physical constraints on the design of the polar transmitter components. In this thesis, the polar transmitter is analyzed from an architectural point of view. Properties of the polar signals that result from the Cartesian-to-polar conversion are investigated.
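The Cartesian-to-polar conversion at the heart of this architecture splits the baseband I/Q signal into an amplitude (envelope) path and a phase path; the nonlinear mapping is what expands the bandwidth of the polar components. A minimal numpy sketch of the conversion and its round trip:

```python
import numpy as np

def cartesian_to_polar(i, q):
    """Convert baseband I/Q samples into polar transmitter components."""
    amplitude = np.hypot(i, q)        # envelope, driving the amplitude path
    phase = np.arctan2(q, i)          # phase in radians, driving the phase path
    return amplitude, phase

# Round trip: the polar components must reconstruct the original I/Q signal.
i = np.array([1.0, 0.0, -1.0])
q = np.array([0.0, 1.0, 1.0])
a, p = cartesian_to_polar(i, q)
assert np.allclose(a * np.cos(p), i) and np.allclose(a * np.sin(p), q)
```

Even a smooth, band-limited I/Q signal yields amplitude and phase signals with much wider spectra, which is why the conversion tightens the requirements on the modulator components.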
Moreover, mathematical models of the polar transmitter components, as well as their distortions, are introduced. The strict requirements imposed on the polar transmitter components are relaxed by introducing several novel digital signal processing algorithms. The suitability of the presented algorithms is evaluated by simulating LTE uplink signals using the polar transmitter.

Item Open Access
Object-level image segmentation with prior information (2019)
Wang, Chunlai; Yang, Bin (Prof. Dr.-Ing.)

Item Open Access
Machine learning for end-use electrical energy monitoring (2021)
Barsim, Karim Said; Yang, Bin (Prof. Dr.-Ing.)

Promoting end-users' awareness of their usage and consumption of energy is one of the main measures towards achieving energy efficiency in buildings, which is one of the main targets in climate-aware energy transition programs. End-use energy disaggregation and monitoring is a practical and efficient approach towards achieving this awareness by providing energy users with real-time, fine-grained feedback about their own usage of energy. In this work, we address the case of electrical energy and the problem of end-use load monitoring and disaggregation in a variety of machine learning paradigms. This work starts with unsupervised energy disaggregation based on simple constraints and assumptions, without the need for labeled training data. We then study and propose semi-supervised disaggregation approaches that learn from labeled observations but are also capable of compensating for the scarcity of labeled data by leveraging unlabeled measurements. Finally, we propose a generic neural architecture for data-driven disaggregation for cases where abundant training data are available.
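The unsupervised starting point can be pictured as a simple event-based step: detecting on/off switching edges in the aggregate power signal and attributing them to appliances by step size. The threshold and the toy appliance profile below are illustrative assumptions, not the constraints actually used in the thesis.

```python
import numpy as np

def detect_switch_events(power, threshold=30.0):
    """Return (sample index, power step in W) for each detected on/off edge.

    power: aggregate active-power signal in watts. The threshold separating
    switching events from measurement noise is an illustrative assumption.
    """
    steps = np.diff(power)
    idx = np.flatnonzero(np.abs(steps) > threshold)
    return [(int(i), float(steps[i])) for i in idx]

# Toy aggregate: a fridge (60 W) switches on, then a kettle (2000 W) on and off.
power = np.concatenate([np.full(5, 10.0), np.full(5, 70.0),
                        np.full(5, 2070.0), np.full(5, 70.0)])
events = detect_switch_events(power)
```

Matching positive and negative steps of similar magnitude is one simple constraint by which such events can be grouped into appliance activations without any labels.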
Results from this work not only assert the feasibility of end-use energy disaggregation, but also propose efficient models that adapt to the availability of labeled data and are capable of monitoring different categories of end-use loads.

Item Open Access
Generative models and domain adaptation for autonomous driving (2024)
Eskandar, George; Yang, Bin (Prof. Dr.-Ing.)

Artificial Intelligence (AI) and Deep Learning (DL) have recently affected human society in profound ways, sparking conversations about their technological, social and ethical impacts on our daily lives. The development of intelligent agents capable of perceiving, reasoning about, and interacting with 3D space is crucial, especially for Autonomous Driving (AD), which promises to revolutionize mobility, reduce accidents, and conserve time and energy. However, achieving full AD is hindered by the challenge of generalizing to new conditions. This is because autonomous vehicles rely on DL models, which are limited by the scope of their training data. The sheer variety of potential real-world driving situations, particularly dangerous ones, cannot be reproduced for training purposes. When encountering these unrepresented situations, the vehicles face a domain gap, where they must operate in conditions different from those they were trained on. This mismatch can undermine their safety and dependability, restricting their practical use and leading to significant financial setbacks for car manufacturers. Research efforts against domain gaps have been channeled into two main directions: (1) employing generative AI models to produce synthetic data, thus augmenting the training datasets, and (2) fine-tuning pre-trained DL models on data from new domains without the need for manual labeling. The former strategy is known as generative modeling, while the latter is referred to as domain adaptation. However, current approaches suffer from multiple drawbacks when applied to AD in particular.
For instance, generative models struggle to achieve photorealism, controllability and label efficiency at the same time when applied to complex scenes. On the other hand, domain adaptation is still understudied for some sensor modalities, like LiDAR, and for sensor fusion models (camera and LiDAR), which are widely used in AD, limiting their potential. This dissertation is part of the KI Delta Learning project, funded by the Bundesministerium für Wirtschaft und Energie (BMWi), to address the critical challenge of domain gaps in AD. Towards this goal, we developed novel approaches in three key AD areas: (1) generating photorealistic and editable urban scenes, (2) enhancing the resolution of LiDAR point clouds and (3) adapting 2D and 3D object detectors to new domains. In the first two applications, we developed novel generative models that provide additional training data (camera and LiDAR). In the third area, we established new architectures and training strategies to build models that are more robust against domain shifts. Across all areas, the considered domain gaps encompass weather, sensor and location changes. In our first application, we devised a series of models capable of producing high-quality, photorealistic images from semantic maps, tailored to different annotation cost levels. For the lowest cost, we introduced two fully unsupervised models: Unsupervised Semantic Image Synthesis (USIS) and Synthetic-to-Real SIS. USIS operates on unpaired images and semantic maps, ideally where both share comparable spatial and semantic characteristics derived from real-world data. The Synthetic-to-Real SIS model relaxes the need for such similarity by accommodating labels generated through computer graphics, which may differ statistically from real-world imagery. We then developed a semi-supervised model, Semi-Paired SIS, which learns from a vast collection of unpaired images and labels plus a smaller subset of paired data.
Semi-Paired SIS nearly matches the performance of fully supervised approaches with significantly less paired data. Lastly, we introduced a supervised model, Urban-StyleGAN, capable of generating images and labels from noise vectors and of modifying the image through vector manipulation. In the second application, we developed a novel model to upsample low-resolution LiDAR point clouds into high-resolution ones, balancing cost-effectiveness and performance. In the third application, we pioneered a model to adapt a multi-sensor 2D object detector to harsh weather conditions. Finally, a large empirical study on the robustness of 3D object detectors was conducted, yielding several important novel findings on robustness and adaptation to unseen conditions. Each developed model was rigorously tested across multiple public benchmarks, consistently achieving state-of-the-art results. In conclusion, this dissertation presents significant theoretical and practical advancements in generative models and domain adaptation for AD. The important benefits of this work encompass enhanced photorealism, improved controllability, greater label efficiency, and increased robustness against domain shifts, all of which contribute to the safety and reliability of autonomous systems. We hope our contributions can benefit the DL and AD communities and find applications in other related fields (medical imaging, satellite image processing, radar signal processing, etc.), fostering innovation and practical advancements across these fields.

Item Open Access
Unsupervised model adaptation for vision-centric deep neural networks (2025)
Marsden, Robert A.; Yang, Bin (Prof. Dr.-Ing.)

Despite significant advancements in deep learning and computer vision, the robustness of human perception is still unmatched. For example, while humans can reliably interpret a scene in unfamiliar locations or environments, Deep Neural Networks (DNNs) usually struggle in such scenarios, exhibiting substantial performance degradation.
This decline in performance can be attributed to differences between the training and test distributions, as well as the current inability of DNNs to effectively generalize their learned knowledge. The lack of generalization has several consequences for the practical application of DNNs. Each time the test conditions change, new data must be collected, manually annotated, and used to retrain the model. While gathering unlabeled data and retraining the model is often less problematic, manual labeling is highly time-consuming and should thus be minimized. For safety-critical applications like automated driving, the limited generalization requires covering virtually every situation with labeled data to ensure reliability, which verges on the impossible. A promising approach to minimizing manual labeling efforts or increasing a network's robustness during inference is Unsupervised Model Adaptation, which is the focus of this thesis. For example, by transferring knowledge via Unsupervised Domain Adaptation (UDA) from a labeled source dataset to an unlabeled target dataset collected under new test conditions, manual data annotation can be avoided. Alternatively, robustness can be improved by adapting the model directly during inference using online Test-time Adaptation (TTA). However, both research fields still face challenges related to performance, stability, and other issues that must be resolved before they can reliably overcome the problems caused by insufficient generalization. The foundations for the subsequently presented contributions to UDA and TTA are laid by first defining the settings and providing a comprehensive review of existing methods for UDA, online TTA, and related subfields (such as continual UDA), which are also important for this thesis. Then, a novel framework for UDA in semantic segmentation is introduced, which exploits contrastive learning to align the two domains at the category level.
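Category-level contrastive alignment pulls features of the same class together across source and target domains while pushing different classes apart. A minimal InfoNCE-style loss on stand-in feature vectors illustrates the idea; the temperature and the pairing scheme here are illustrative assumptions, not the exact formulation of the thesis.

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE loss for one anchor feature: the positive is a same-class
    feature from the other domain; negatives belong to different classes."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                    # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))

anchor = np.array([1.0, 0.0])                 # source-domain "road" feature
same_class = np.array([0.9, 0.1])             # target-domain "road" feature
other_class = np.array([0.0, 1.0])            # target-domain "car" feature
loss_aligned = info_nce(anchor, same_class, [other_class])
loss_misaligned = info_nce(anchor, other_class, [same_class])
```

Minimizing this loss drives same-class features from both domains toward each other, which is the category-level alignment the framework relies on.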
The label information required to create the pairs for contrastive learning is extracted via online-refined pseudo-labels, which further allow an effective adaptation via self-training. To prevent the network's output head from developing harmful biases toward either the source or the target domain, an additional loss-weighting scheme is employed that promotes globally diverse predictions. The effectiveness of the framework is validated on two widely adopted benchmarks for UDA. Although fusing the information provided by two complementary sensors, such as RGB and LiDAR, can increase a network's robustness, performance degradation in adverse weather conditions can still occur. To again avoid manual data annotation, the first framework for conducting UDA in multimodal 2D object detection using RGB and LiDAR data is presented. The approach uses adversarial learning and pretext tasks to align the domains in the feature space. Focusing on perception in autonomous driving, the framework is shown to be effective not only for adapting a model to a single adverse weather condition but also for adapting to multiple adverse weather scenarios simultaneously. However, since the data from multiple target domains may become available sequentially, without access to previous target data, continual UDA is subsequently studied. By conditioning an AdaIN-based style transfer model on each class and exploiting style replay, the framework outperforms other baseline methods on a purely synthetic and a more challenging real-world domain sequence. Addressing the aspect of enhancing a model's robustness, a novel framework that improves the efficacy of self-training during online model adaptation is introduced. The basic idea is to convert the currently encountered domain shift into a more gradual one, for which self-training has been shown to be particularly effective.
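One way to picture turning an abrupt domain shift into a gradual one is mixup between a source-style image and the incoming target image, with the mixing weight annealed over time; the linear annealing schedule below is an illustrative assumption, not the exact construction used in the thesis.

```python
import numpy as np

def intermediate_domain(source_like, target, step, total_steps):
    """Mix an incoming target image toward the source appearance.

    Early steps stay close to the source style; the weight anneals toward
    the target, so the model sees a gradual rather than abrupt domain shift.
    """
    lam = 1.0 - step / total_steps          # annealed mixing coefficient
    return lam * source_like + (1.0 - lam) * target

src = np.zeros((2, 2))                      # stand-in "source-style" image
tgt = np.ones((2, 2))                       # stand-in target image
first = intermediate_domain(src, tgt, step=1, total_steps=10)
last = intermediate_domain(src, tgt, step=9, total_steps=10)
```

Self-training on this slowly drifting sequence of intermediate images, rather than directly on the shifted target, is what makes the pseudo-labels stay reliable.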
This is achieved by introducing an artificial intermediate domain at each time step t, created either through mixup or lightweight style transfer. The approach proves to be highly effective for urban scene segmentation in non-stationary environments, but also performs well for classification tasks. Since methods for online model adaptation must remain stable across diverse scenarios, a comprehensive picture of many practically relevant test settings is introduced. By thoroughly analyzing and empirically validating the challenges in these scenarios, a highly effective self-training-based adaptation framework is derived that includes diversity and certainty weighting, continuous weight ensembling, and prior correction. Extensive experiments across a wide range of datasets, settings, and models not only validate the framework's superiority but also reveal the limitations of existing methods for online TTA. The thesis concludes with a summary of the key contributions, a discussion of the various techniques for Unsupervised Model Adaptation, and an outlook on remaining challenges for both UDA and online TTA.

Item Open Access
Improving convolutional neural networks : advanced regularization methods and architectural innovations (2025)
Cakaj, Rinor; Yang, Bin (Prof. Dr.-Ing.)

In recent years, Convolutional Neural Networks (CNNs) have demonstrated significant advancements in a variety of computer vision applications, such as image recognition, object detection and image segmentation. Despite their success, such networks suffer from overfitting due to the limited size of training data and the high capacity of the models. Overfitting describes the phenomenon where a CNN achieves perfect performance on the training set but generalizes poorly to new, unseen data. To address overfitting, various regularization methods have been developed.
However, there remains a need for strategies that further improve generalization and provide complementary benefits when combined with existing techniques. This work introduces three novel regularization methods, each designed to build upon existing approaches and enhance the generalization of CNNs in a different way: Weight Compander (WC), Spectral Batch Normalization (SBN), and Spectral Wavelet Dropout (SWD). While regularization methods improve the generalization of CNNs, they do not increase the networks' capacity to process and represent more complex features, because CNNs are constrained by their fixed architectures. Therefore, advancements in network architecture are needed to offer improvements that go beyond adjustments to training behavior. Specifically, CNNs lack a mechanism for dynamic feature retention similar to the memory of the human brain. To address this, we propose the Squeeze-and-Remember (SR) Block, a new architectural unit for CNNs that allows them to store high-level features during training and recall those features during inference. Despite their remarkable performance, CNNs often require substantial computational power and extensive memory. This poses considerable challenges when deploying parameter-heavy models on devices with limited computational resources. To mitigate this, we finally introduce Mixture-of-Depths (MoD) for CNNs, a technique which enhances the computational efficiency of CNNs by selectively processing channels based on their relevance to the current prediction.
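The channel-selective processing behind MoD can be pictured as a top-k gate over feature-map channels: score each channel's relevance, process only the top k, and skip the rest. The relevance score and the hard zeroing below are illustrative simplifications, not the exact MoD formulation.

```python
import numpy as np

def mod_channel_select(x, k):
    """Keep only the k channels with the highest mean activation; zero the rest.

    x: feature map of shape (channels, height, width). A real MoD block would
    route the selected channels through a convolution and skip the others,
    saving the compute for the unselected channels entirely.
    """
    relevance = np.abs(x).mean(axis=(1, 2))          # per-channel relevance score
    keep = np.argsort(relevance)[-k:]                # indices of the top-k channels
    out = np.zeros_like(x)
    out[keep] = x[keep]                              # "process" only selected channels
    return out, keep

x = np.stack([np.full((4, 4), v) for v in (0.1, 2.0, 0.5)])
out, keep = mod_channel_select(x, k=1)
```

Because only k of the C channels are convolved, the per-layer compute drops roughly by a factor of k/C, which is the efficiency gain such selective processing targets.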