Please use this identifier to cite or link to this resource: http://dx.doi.org/10.18419/opus-14134
Author(s): Bott, Thomas
Title: Content-aware text-to-speech with prompt-based prosody control
Date of publication: 2023
Document type: Thesis (Master's)
Pages: 128
URI: http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-141531
http://elib.uni-stuttgart.de/handle/11682/14153
http://dx.doi.org/10.18419/opus-14134
Abstract: This thesis proposes a text-to-speech system that is conditioned on sentence embeddings extracted from natural language prompts in order to make the prosodic parameters of generated speech controllable in an intuitive and effective way. The system builds on a transformer-based TTS architecture and provides benefits regarding speed, data efficiency, robustness and controllability. The proposed integration scheme essentially concatenates speaker and sentence embeddings by modeling inter-dependencies between them before inducing the joint representation into the model. Furthermore, a training strategy is developed that operates on merged emotional speech and text datasets and varies prompts in each iteration, increasing the generalization capabilities of the model and reducing the risk of over-fitting. Extensive objective and subjective evaluations on utterances generated from sentences of emotional text datasets demonstrate the prompting capabilities of the conditioned TTS system. It achieves high prosodic controllability whereby the emotional content of provided prompts is transferred accurately to generated speech. At the same time, the system maintains precise tractability of speaker identities as well as overall high speech quality and intelligibility. Besides a high correlation between prompts and speech prosody, fine-tuning the sentence embedding extractor has been found to be crucial. The proposed TTS system is limited with regard to modeling unseen speakers, intensities and multiple languages.
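Illustrative sketch (not taken from the thesis): the abstract mentions two mechanisms, joining speaker and sentence embeddings into one conditioning vector and varying the prompt paired with each utterance across training iterations. The minimal Python below shows one way these ideas could look. All names, data entries and the rule of pairing prompts by matching emotion label are assumptions made for illustration. The concatenation shown is plain concatenation only and does not reproduce the inter-dependency modeling described in the abstract.

```python
import random

# Hypothetical toy data: the thesis merges emotional speech and text datasets,
# but these entries and labels are invented for illustration only.
SPEECH_SAMPLES = [
    {"utterance": "utt_001", "emotion": "joy"},
    {"utterance": "utt_002", "emotion": "anger"},
]
PROMPT_POOL = {
    "joy": ["What a wonderful day!", "I am so happy to see you."],
    "anger": ["This is completely unacceptable.", "Stop doing that right now!"],
}


def sample_prompt(emotion, rng):
    """Pick a random prompt whose emotion label matches the speech sample.

    Matching by emotion label is an assumed pairing rule, not a confirmed detail.
    """
    return rng.choice(PROMPT_POOL[emotion])


def concat_embeddings(speaker_emb, sentence_emb):
    """Join speaker and sentence embeddings into one conditioning vector.

    Plain concatenation only; any modeling of inter-dependencies between
    the two embeddings is deliberately left out of this sketch.
    """
    return list(speaker_emb) + list(sentence_emb)


def training_pairs(num_iterations, seed=0):
    """Yield (iteration, utterance, prompt); the prompt is re-sampled every iteration."""
    rng = random.Random(seed)
    for iteration in range(num_iterations):
        for sample in SPEECH_SAMPLES:
            prompt = sample_prompt(sample["emotion"], rng)
            yield iteration, sample["utterance"], prompt
```

Because the prompt is drawn again on every pass, the same utterance can be paired with different prompt texts over the course of training, which is the intuition behind the reduced over-fitting risk mentioned in the abstract.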
Appears in collections: 05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Files in this resource:
File             Description   Size      Format
thesis_bott.pdf                4.89 MB   Adobe PDF


All resources in this repository are protected by copyright.