Generative models and domain adaptation for autonomous driving
Abstract
Artificial Intelligence (AI) and Deep Learning (DL) have recently affected human society in profound ways, sparking conversations about their technological, social, and ethical impacts on our daily lives. The development of intelligent agents capable of perceiving, reasoning about, and interacting with 3D space is crucial, especially for Autonomous Driving (AD), which promises to revolutionize mobility, reduce accidents, and save time and energy. However, achieving full AD is hindered by the challenge of generalizing to new conditions, because autonomous vehicles rely on DL models that are limited by the scope of their training data. The sheer variety of potential real-world driving situations, particularly dangerous ones, cannot be exhaustively reproduced for training. When a vehicle encounters such an unrepresented situation, it faces a domain gap: it must operate in conditions different from those it was trained on. This mismatch can undermine safety and dependability, restricting practical deployment and causing significant financial setbacks for car manufacturers.
Research efforts to close domain gaps have followed two main directions: (1) employing generative AI models to produce synthetic data that augments the training datasets, and (2) fine-tuning pre-trained DL models on data from new domains without the need for manual labeling. The former strategy relies on generative models, while the latter is referred to as domain adaptation. However, current approaches suffer from multiple drawbacks when applied to AD in particular. For instance, generative models struggle to achieve photorealism, controllability, and label-efficiency simultaneously when applied to complex scenes. Domain adaptation, on the other hand, remains understudied for some sensor modalities, such as LiDAR, and for sensor-fusion models (camera and LiDAR) that are widely used in AD, limiting its practical impact.
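To make the second direction concrete, a widely used unsupervised domain-adaptation recipe is self-training with pseudo-labels: a model pre-trained on the labeled source domain predicts labels on unlabeled target-domain data, and only its confident predictions are reused as training targets. The following PyTorch sketch is purely illustrative; the confidence threshold and the assumption of a classifier-style model returning per-class logits are ours, not a method from this dissertation.

import torch
import torch.nn.functional as F

def pseudo_label_step(model, optimizer, target_images, conf_threshold=0.9):
    # One self-training step on a batch of unlabeled target-domain images.
    # Illustrative only: `model` is any hypothetical network returning logits.
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(target_images), dim=1)
        conf, pseudo_labels = probs.max(dim=1)   # most confident class per sample
        mask = conf > conf_threshold             # keep only confident predictions

    model.train()
    logits = model(target_images)
    loss = F.cross_entropy(logits, pseudo_labels, reduction="none")
    loss = (loss * mask.float()).sum() / mask.float().sum().clamp(min=1.0)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In practice the threshold is often scheduled, and the labeling model is frequently a slowly updated teacher copy of the trained student, but the core idea is the confidence-masked loss above.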
This dissertation is part of the KI Delta Learning project, funded by the Bundesministerium für Wirtschaft und Energie (BMWi, the German Federal Ministry for Economic Affairs and Energy), to address the critical challenge of domain gaps in AD. Towards this goal, we developed novel approaches in three key AD areas: (1) generating photorealistic and editable urban scenes, (2) enhancing the resolution of LiDAR pointclouds, and (3) adapting 2D and 3D object detectors to new domains. In the first two areas, we developed novel generative models that provide additional training data (camera and LiDAR). In the third area, we devised new architectures and training strategies to build models that are more robust against domain shifts. Across all areas, the considered domain gaps encompass weather, sensor, and location changes.
In our first area, we devised a series of models that produce high-quality, photorealistic images from semantic maps, tailored to different annotation-cost levels. At the lowest cost, we introduced two fully unsupervised models: Unsupervised Semantic Image Synthesis (USIS) and Synthetic-to-Real SIS. USIS operates on unpaired images and semantic maps, and works best when both are drawn from real-world data with comparable spatial and semantic statistics. Synthetic-to-Real SIS removes the need for such similarity by accommodating labels generated through computer graphics, which may differ statistically from real-world imagery. We then developed a semi-supervised model, Semi-Paired SIS, which learns from a large collection of unpaired images and labels plus a smaller subset of paired data; it nearly matches the performance of fully supervised approaches with significantly less paired data. Lastly, we introduced a supervised model, Urban-StyleGAN, capable of generating images together with their labels from noise vectors and of editing the generated image by manipulating its latent vector.
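For intuition on how a generator can be conditioned on a semantic map, supervised SIS architectures often modulate normalization layers with per-pixel parameters predicted from the map (the SPADE mechanism). The minimal PyTorch sketch below shows only this conditioning block; the layer sizes and names are hypothetical and do not reproduce the models developed in this dissertation.

import torch.nn as nn
import torch.nn.functional as F

class SemanticModulation(nn.Module):
    # A minimal SPADE-style block (hypothetical sizes): the one-hot semantic
    # map predicts a per-pixel scale (gamma) and shift (beta) for the features.
    def __init__(self, feat_channels, num_classes, hidden=128):
        super().__init__()
        self.norm = nn.BatchNorm2d(feat_channels, affine=False)
        self.shared = nn.Conv2d(num_classes, hidden, kernel_size=3, padding=1)
        self.gamma = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)

    def forward(self, feats, semantic_map):
        # Resize the semantic map to the feature resolution, then modulate.
        seg = F.interpolate(semantic_map, size=feats.shape[2:], mode="nearest")
        h = F.relu(self.shared(seg))
        return self.norm(feats) * (1 + self.gamma(h)) + self.beta(h)

Because the semantic map re-enters the generator at every resolution in such designs, the output image stays spatially aligned with the input labels, which is what makes the scene editable: changing a region's class in the map changes the corresponding region in the image.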
In the second area, we developed a novel model that upsamples low-resolution LiDAR pointclouds into high-resolution ones, balancing cost-effectiveness and performance. In the third area, we pioneered a model that adapts a multi-sensor 2D object detector to harsh weather conditions. Finally, we conducted a large empirical study on the robustness of 3D object detectors, yielding several important novel findings on how such detectors behave under, and can be adapted to, unseen conditions. Each developed model was rigorously tested across multiple public benchmarks, consistently achieving state-of-the-art results.
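A common way to frame LiDAR upsampling is to project the point cloud into a 2D range image, with one row per laser beam, and then super-resolve that image vertically with a network. The NumPy sketch below shows only the projection step; the field-of-view and resolution values are hypothetical and must match the actual sensor, and this is not necessarily the formulation used in the dissertation.

import numpy as np

def pointcloud_to_range_image(points, n_beams=16, n_cols=1024,
                              fov_up_deg=15.0, fov_down_deg=-15.0):
    # Project an (N, 3) point cloud to an (n_beams, n_cols) range image.
    # Hypothetical sensor parameters; illustrative sketch only.
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                      # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-6))  # elevation angle

    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    cols = ((yaw + np.pi) / (2 * np.pi) * n_cols).astype(int) % n_cols
    rows = ((fov_up - pitch) / (fov_up - fov_down) * n_beams).astype(int)
    rows = np.clip(rows, 0, n_beams - 1)

    image = np.zeros((n_beams, n_cols), dtype=np.float32)
    image[rows, cols] = r  # last point written wins per cell
    return image

A 16-beam range image produced this way can then be fed to a vertical super-resolution network to emulate a denser (e.g., 64-beam) sensor, after which the upsampled image is back-projected into a pointcloud.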
In conclusion, this dissertation presents significant theoretical and practical advancements in generative models and domain adaptation for AD. The key benefits of this work include enhanced photorealism, improved controllability, greater label efficiency, and increased robustness against domain shifts, all of which contribute to the safety and reliability of autonomous systems. We hope our contributions benefit the DL and AD communities and find applications in related fields, such as medical imaging, satellite image processing, and radar signal processing, fostering innovation and practical advancement beyond AD.