Universität Stuttgart
Permanent URI for this community: https://elib.uni-stuttgart.de/handle/11682/1
8 results
Search Results
Item Open Access
eSPARQL : design and implementation of a query language for epistemic queries on knowledge graphs (2024)
Pan, Xinyi
In recent years, large-scale knowledge graphs have emerged, integrating data from various sources. Often, this data includes assertions about other assertions, establishing contexts in which these assertions hold. A recent enhancement to RDF, known as RDF-star, allows for statements about statements and is currently under consideration as a W3C standard. However, RDF-star lacks a defined semantics for such statements and lacks intrinsic mechanisms to operate on them. This thesis describes and implements a novel query language, termed eSPARQL, tailored for epistemic RDF-star metadata and grounded in four-valued logic. Our language builds on SPARQL-star, the query language for RDF-star, by incorporating an expanded FROM clause, called FROM BELIEF, designed to manage multiple, and occasionally conflicting, beliefs. eSPARQL’s capabilities are demonstrated through four example queries, showcasing its ability to (i) retrieve individual beliefs, (ii) aggregate beliefs, (iii) identify conflicts between individuals, and (iv) handle nested beliefs (beliefs about beliefs). The implementation of eSPARQL developed in this thesis is built on top of an existing SPARQL-star query engine. In this implementation, the execution of an eSPARQL query consists of two phases. First, the expression in the FROM BELIEF clause, called the belief query, is translated into a SPARQL-star CONSTRUCT query that generates an intermediary graph containing the beliefs of the subjects described in the belief query. In the second phase, this intermediary graph is processed with the graph pattern of the eSPARQL query, which is translated into a graph pattern that can be processed by a standard SPARQL-star engine. In this last phase, the implementation translates eSPARQL operations to SPARQL-star and checks whether the pattern contains nested eSPARQL queries to be processed recursively.
We study two research questions: (RQ1) Does the eSPARQL implementation scale? and (RQ2) How do the execution times of the eSPARQL implementation compare with those of manually written SPARQL-star queries? To answer these research questions, we use the four example eSPARQL queries that showcase the abilities of eSPARQL and create a synthetic dataset generator that generates graphs of multiple sizes. Additionally, for research question RQ2, we manually write SPARQL-star queries that are equivalent to the example eSPARQL queries. Regarding research question RQ1, our results show that the execution time of eSPARQL is proportional to the data size. Regarding research question RQ2, the manually written SPARQL-star queries are clearly faster than our implementation for all but one query. Although the implementation proved slower than the manually written SPARQL-star queries, the eSPARQL queries are shorter and easier to understand. This positive aspect of eSPARQL can motivate further studies on how to optimize the eSPARQL implementation.

Item Open Access
Transferability to spectrogram-based anomaly detection : enhancing audio anomaly detection through vision derived methods (2024)
Manea, Radu
Anomaly detection in industrial audio data is crucial for ensuring smooth manufacturing processes, enabling predictive maintenance and quality control. Despite its importance, audio anomaly detection has received less attention than vision-based methods. This thesis explores the applicability and effectiveness of state-of-the-art vision-based anomaly detection methods, specifically designed for image data, in the context of industrial audio data using spectrograms. The research aims to bridge the gap between the two domains by investigating the potential of adapting vision-based approaches to enhance the performance of audio anomaly detection systems in industrial settings.
The study focuses on three key questions: (1) the applicability of vision-based anomaly detection methods to industrial audio data, (2) the impact of replacing the image-based feature extractor with a spectrogram-specific feature extractor (AST transformer), and (3) the effect of fine-tuning the AST transformer on industrial spectrograms. The research employs state-of-the-art anomaly detection models, namely Patchcore, FastFlow, EfficientAD, and Reverse Distillation, and evaluates their performance on the DCASE2020 dataset and a real-world industrial dataset from BMW. The findings reveal that vision-based anomaly detection methods can be successfully applied to industrial audio data, with varying degrees of performance depending on the dataset, model architecture, and spectrogram representation used. The study identifies key factors that influence the performance of spectrogram anomaly detection and presents several ways to adapt vision-based approaches for use on spectrograms. These adaptations, such as replacing the image-based feature extractor with a spectrogram-specific feature extractor (AST transformer), have shown promising results in enhancing the performance of audio anomaly detection systems. Furthermore, the successful application of these approaches on the BMW dataset demonstrates their potential in real-world production environments, particularly when recordings are made under controlled conditions with minimal variance.

Item Open Access
Bayesian symbolic regression in structured latent spaces (2025)
Pei, Chenlei
Symbolic regression is an interpretable machine learning method that learns mathematical expressions from given data. It naturally combines with Bayesian inference, which lets experts express their knowledge as prior distributions over equations. However, the infinite search space of mathematical expressions renders exhaustive search impractical, and Bayesian inference remains costly.
Therefore, we propose to perform Bayesian reasoning in the learned latent space of a trained Variational Autoencoder (VAE) and thereby exploit inherent structures in the search space. While latent spaces have been used to structure search spaces, our approach provides the probability of each mathematical expression rather than selecting the best one. We suggest practical approximations to the posterior distribution in latent space and obtain formula examples by sampling from the posterior using the Gaussian Process Hamiltonian Monte Carlo (GP-HMC) method. We have validated our method using various Koza, Nguyen, and self-generated datasets and compared it against genetic programming and SINDy in terms of the Root Mean Square Error (RMSE).
Keywords: Symbolic Regression, latent space, Variational Autoencoder, Character Variational Autoencoder, Grammar Variational Autoencoder, Bayesian Reasoning, Gaussian Process, Hamiltonian Monte Carlo, Gaussian Process Hamiltonian Monte Carlo

Item Open Access
Data attribution for diffusion models (2024)
Bien, Tanja
Diffusion models have demonstrated a remarkable ability to generate photorealistic images. However, it is difficult to explain what causes a given generated image. Tracing the output back to the training data and identifying the most influential samples is necessary to debug the model, find biases, or provide fair compensation to creators. While data attribution methods have been extensively studied in the supervised setting, data attribution for generative models such as diffusion models remains a challenge. The aim of this thesis is to provide an overview of existing data attribution methods and of methods for evaluating them. In the absence of a commonly used benchmark, a framework for evaluating data attribution methods was implemented as part of this thesis. Various experiments and evaluation methods allow a comparison between the different methods to better understand their use cases and limitations.
Furthermore, they lead to the proposal of a new normalization method called loss-normalization.

Item Open Access
Driver activity recognition using few-shot learning techniques (2024)
Elmaraghy, Youssef
In the realm of computer vision, activity recognition from videos has become increasingly crucial due to its applications in surveillance, healthcare, and autonomous systems. While deep learning has significantly advanced real-time recognition capabilities, the challenge of few-shot learning remains under-explored, especially in scenarios requiring minimal labeled data. This thesis addresses this gap by proposing a novel benchmark for driver activity recognition, which includes both data-rich and data-scarce datasets to evaluate traditional and few-shot learning techniques. Motivated by the complexities and dynamism of the driving environment, this research emphasizes the importance of driver activity recognition for enhancing vehicle safety and improving human-machine interaction. The thesis introduces innovative approaches using 3D convolutional neural networks (CNNs), ResNet 3D models, and vision transformers such as MViTv2 and Swin3D as backbone models for few-shot learning. We implement and test various few-shot models, including Siamese networks, Prototypical networks, and Model-Agnostic Meta-Learning (MAML), highlighting the strengths and limitations of each. Our findings indicate that Siamese and Prototypical networks exhibit high potential in few-shot learning scenarios, with the former excelling in one-shot learning and the latter in multiple-shot learning. However, MAML requires further exploration to achieve optimal results. The computational demands and extensive GPU requirements presented challenges but also underlined the need for efficient learning algorithms. In conclusion, this thesis underscores the significance of few-shot learning for driver activity recognition and proposes future research directions to overcome current limitations.
By integrating novel architectures and optimizing existing models, this research aims to advance the development of autonomous systems capable of robust activity recognition with minimal data.

Item Open Access
SwinDiffuser : accelerating diffusion models through parallel processing (2024)
Ye, Yun
Diffusion models have emerged as a powerful generative approach in artificial intelligence, particularly for image, video, and audio synthesis. Despite their success, these models suffer from significant computational demands due to the iterative nature of the denoising process. This thesis introduces the SwinDiffuser, a novel method designed to accelerate diffusion models by leveraging parallel processing. The proposed method divides high-resolution images into smaller patches, allowing for simultaneous processing by multiple diffusers. Key innovations include the integration of global feature extractors and shifting windows to maintain coherence across patches, and the utilisation of a U-Net architecture for noise prediction. Experimental results demonstrate that the SwinDiffuser achieves comparable image quality to standard diffusion models while significantly reducing generation time. This advancement paves the way for practical applications of diffusion models in real-time scenarios and resource-constrained environments.

Item Open Access
Model-based reinforcement learning under sparse rewards (2023)
Akash, Ravi
Reinforcement Learning (RL) has seen significant advances over the last decade in simulated and controlled environments. RL has shown impressive results in difficult decision-making problems such as playing video games or controlling robot arms. However, most methods require many interactions with the system in order to achieve good performance, which can be costly and time-consuming, especially in industrial applications.
Model-Based Reinforcement Learning (MBRL) promises to close this gap by leveraging learned environment models and using them for data generation and/or planning while aiming to be sample efficient. However, learning with sparse rewards remains a significant challenge in the field of RL. In order to promote efficient learning, the sparsity of rewards must be addressed. This thesis studies individual components of MBRL algorithms under sparse reward settings and investigates different design choices to measure their impact on learning efficiency. Suitable Integral Probability Metrics (IPM) are introduced to understand the model’s reward and observation space distribution during training. These design combinations will be evaluated on continuous control tasks with established benchmarks.

Item Open Access
Accelerating segment anything models via token merging : a comparative study and a spectrum preservation-based approach (2025)
Xie, Siwei
The Segment Anything Model (SAM) has emerged as a significant advancement in image segmentation, demonstrating exceptional generalization across diverse datasets with minimal task-specific tuning. However, its computational demands, inherited from Vision Transformers (ViTs), pose considerable challenges for deployment in resource-constrained environments. This thesis addresses these challenges by integrating token merging strategies, which have proven effective in enhancing the efficiency of ViTs without additional training. Specifically, we conduct a comprehensive analysis of SAM’s architecture and adapt existing token merging techniques to reduce computational overhead while maintaining high segmentation accuracy. We propose an architecture for SAM that incorporates these strategies and evaluate its performance and computational efficiency across various datasets, showing that our approach effectively accelerates SAM’s inference speed while preserving segmentation quality.
Furthermore, we propose GradToMe based on PiToMe, an innovative method that leverages gradient approximation and grid-based sampling to combine similar tokens. This approach emphasizes spectrum preservation to retain critical information during the token reduction process, thereby improving the effectiveness of token merging and further saving computational costs. Consequently, our results demonstrate that this approach enhances the feasibility of deploying SAM in real-time applications, making it more suitable for use in resource-limited environments without compromising performance. Code is available at: https://github.com/xxjsw/tome_sam.