05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6

  • Item (Open Access)
    Optimising the generation performance for multimodal diffusion models using reinforcement learning
    (2024) Gaude, Justus
    This bachelor thesis explores the field of multimodal data synthesis, focusing specifically on the generation of high-quality image-text pairs within the UniDiffuser framework. The UniDiffuser framework has proven efficient at generating joint samples along a linear path, where the timesteps of the modalities are uniformly discretized; this study asks whether alternative paths could offer better outcomes in terms of both quality and efficiency. To address this question, hypotheses are formulated, an environment is developed, and the action and state spaces are defined. By training reinforcement learning agents and applying evaluation metrics, this research attempts to find alternative paths that are more computationally efficient and produce higher-quality image-text pairs. Ultimately, this study aims to advance the state of the art in multimodal data generation.