Please use this identifier to cite or link to this item: http://dx.doi.org/10.18419/opus-13842
Full metadata record
DC Field  Value  Language
dc.contributor.author  Zhou, Zhenliang  -
dc.date.accessioned  2023-12-19T15:08:25Z  -
dc.date.available  2023-12-19T15:08:25Z  -
dc.date.issued  2023  de
dc.identifier.other  1876993537  -
dc.identifier.uri  http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-138616  de
dc.identifier.uri  http://elib.uni-stuttgart.de/handle/11682/13861  -
dc.identifier.uri  http://dx.doi.org/10.18419/opus-13842  -
dc.description.abstract  Transfer learning is widely used as an important machine learning method in natural language processing. To complete a specific task, a developer would normally have to train a task-specific model on the relevant datasets, which consumes a large amount of data and computing resources. Transfer learning addresses this problem: the developer first pre-trains a multi-task model on large datasets, after which only a small amount of data is needed to adapt the model to the specific task. In natural language processing, four major transfer learning methods have been proposed: adapter, BitFit, diff pruning, and full fine-tuning, which use less fine-tuning data and less training time to achieve results comparable to a single-task model. We apply these four transfer learning methods in the text-to-speech domain. We pre-train FastSpeech2 on a multi-speaker dataset so that it learns the speech characteristics of these speakers. A single-speaker training dataset is then used to fine-tune the pre-trained model to imitate that speaker's speech characteristics. After generating speech audio with the four transfer models, we compare the generated audio with the speaker's original speech and score the speech signals through objective and subjective evaluation to assess each method's performance. We find that BitFit performs best in the transfer learning experiment trained on a low-resource dataset (VCTK), while full fine-tuning suffers from overfitting, which heavily distorts the audio duration information. Moreover, the audio generated by the diff pruning model is pure noise, which indicates that diff pruning is completely unsuitable for transfer on low-resource datasets. In a comparative experiment, we use the LJSpeech (high-resource) dataset for fine-tuning; there, the adapter and full fine-tuning models achieve the best speech reconstruction. Although the voice quality of BitFit and diff pruning is inferior to that of adapter and full fine-tuning, the audio quality is not significantly reduced.  en
dc.language.iso  en  de
dc.rights  info:eu-repo/semantics/openAccess  de
dc.subject.ddc  004  de
dc.title  Evaluation of transfer learning methods in text-to-speech  en
dc.type  masterThesis  de
ubs.fakultaet  Informatik, Elektrotechnik und Informationstechnik  de
ubs.institut  Institut für Maschinelle Sprachverarbeitung  de
ubs.publikation.seiten  79  de
ubs.publikation.typ  Abschlussarbeit (Master)  de
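Of the four methods the abstract compares, BitFit is the simplest to state: all pre-trained weights are frozen and only the bias terms are updated during fine-tuning. As a rough illustration of that idea (a generic PyTorch sketch with a hypothetical placeholder model, not the thesis's FastSpeech2 code):

```python
# Minimal sketch of the BitFit idea: freeze every parameter of a
# pre-trained network except the bias terms, so only a tiny fraction
# of the parameters is updated during fine-tuning.
import torch.nn as nn

# Hypothetical stand-in for a pre-trained model.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

# BitFit selection: gradients flow only to parameters named "...bias".
for name, param in model.named_parameters():
    param.requires_grad = name.endswith("bias")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(trainable, total)  # 20 bias parameters out of 212 in total
```

An optimizer built from `filter(lambda p: p.requires_grad, model.parameters())` would then touch only those biases, which is why BitFit needs so little fine-tuning data relative to full fine-tuning.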
Appears in Collections: 05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Files in this item:
File  Description  Size  Format
master_thesis_Zhenliang Zhou.pdf    11.57 MB  Adobe PDF  View/Open


All items in this repository are protected by copyright.