Please use this identifier to cite or link to this item:
http://dx.doi.org/10.18419/opus-15150
Authors: | Yang, Yung-Ching |
Title: | Controllable text-to-speech system : speaking style control using hierarchical variational autoencoder |
Issue Date: | 2024 |
Document Type: | Master's thesis |
Pages: | 53 |
URI: | http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-151699 http://elib.uni-stuttgart.de/handle/11682/15169 http://dx.doi.org/10.18419/opus-15150 |
Abstract: | This research proposes an utterance embedding model that provides disentangled and scalable control over latent attributes in human speech. Our model is formulated as a hierarchical generative model based on the Variational Autoencoder (VAE) framework and is integrated with the FastSpeech2 Text-to-Speech (TTS) system. The work demonstrates that image-domain networks for hierarchical pattern learning can be adapted to model complex distributions in speaking styles and prosody. It merges advancements in VAE research, particularly those addressing critical statistical challenges such as posterior collapse and unbounded KL divergence, with recent studies focusing on structural enhancements of VAE architectures. We introduce a hierarchical structure in latent variable modeling and augment the learning objective with hierarchical information to ensure that the latent variables at each level are hierarchically factorized. This approach learns a smooth latent prosody space and deepens our understanding of the relationship between the hierarchical nature of prosody and neural network architecture. Through a customized control mechanism integrated into various levels of the latent space, the model can manipulate prosodic elements, allowing for both independent and scalable adjustments. By incorporating these techniques, the model captures a wide range of prosodic variations and offers fine-grained control and expressiveness for speech synthesis in unsupervised learning contexts. |
Appears in Collections: | 05 Fakultät Informatik, Elektrotechnik und Informationstechnik |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
master_thesis_YungChing_Yang.pdf | | 5.07 MB | Adobe PDF | View/Open |
Items in OPUS are protected by copyright, with all rights reserved, unless otherwise indicated.