Driver activity recognition using few-shot learning techniques
Abstract
In the realm of computer vision, activity recognition from videos has become increasingly important due to its applications in surveillance, healthcare, and autonomous systems. While deep learning has significantly advanced real-time recognition capabilities, few-shot learning remains under-explored, especially in scenarios with minimal labeled data. This thesis addresses this gap by proposing a novel benchmark for driver activity recognition that includes both data-rich and data-scarce datasets to evaluate traditional and few-shot learning techniques. Motivated by the complexity and dynamism of the driving environment, this research emphasizes the importance of driver activity recognition for enhancing vehicle safety and improving human-machine interaction. The thesis introduces approaches using 3D convolutional neural networks (CNNs), 3D ResNet models, and vision transformers such as MViTv2 and Swin3D as backbone models for few-shot learning. We implement and test several few-shot models, including Siamese networks, Prototypical networks, and Model-Agnostic Meta-Learning (MAML), highlighting the strengths and limitations of each. Our findings indicate that Siamese and Prototypical networks show strong potential in few-shot scenarios, with the former excelling in one-shot learning and the latter in multi-shot learning, whereas MAML requires further exploration to achieve competitive results. The computational demands and extensive GPU requirements posed challenges but also underline the need for efficient learning algorithms. In conclusion, this thesis underscores the significance of few-shot learning for driver activity recognition and proposes future research directions to overcome current limitations. By integrating novel architectures and optimizing existing models, this research aims to advance the development of autonomous systems capable of robust activity recognition with minimal data.
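To illustrate the core idea behind one of the few-shot approaches named above, the sketch below shows Prototypical-network classification in miniature: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. This is an illustrative simplification, not the thesis implementation; in practice the embeddings come from a learned video backbone (e.g. a 3D CNN), whereas here the embeddings, class names, and vectors are hypothetical placeholders.

```python
# Minimal sketch of Prototypical-network classification.
# Assumption: embeddings are already computed; a real system would obtain
# them from a trained backbone (e.g. a 3D CNN or video transformer).

def prototype(support_embeddings):
    """Class prototype = element-wise mean of the support embeddings."""
    n = len(support_embeddings)
    dim = len(support_embeddings[0])
    return [sum(e[d] for e in support_embeddings) / n for d in range(dim)]

def classify(query, prototypes):
    """Assign the query to the class with the nearest (Euclidean) prototype."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(prototypes, key=lambda c: dist(query, prototypes[c]))

# Hypothetical 2-way, 2-shot episode with 2-D embeddings.
support = {
    "texting":  [[0.9, 0.1], [1.1, -0.1]],
    "drinking": [[-1.0, 0.2], [-0.8, 0.0]],
}
protos = {c: prototype(embs) for c, embs in support.items()}
print(classify([1.0, 0.0], protos))  # -> texting
```

Because the prototype is a mean, adding more shots per class simply refines that mean, which is consistent with the observation above that Prototypical networks benefit in multi-shot settings.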