05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6

  • Item, Open Access
    Improving the generalisability of fake audio detection
    (2024) Lavrynovska, Viktoria
    Rapid advances in neural speech synthesis have enabled the generation of deepfake audio that is increasingly indistinguishable from real voice recordings. Automatic fake audio detection is an emerging research area that aims to develop reliable means of distinguishing real from synthetic speech. A commonly observed issue is that detection models generalise poorly to unseen data. One contribution of this work is an investigation of generalisability across different languages. MesoInception-4, trained on the ASVspoof19 anti-spoofing dataset, forms the foundation of our detection model. Mel-Frequency Cepstral Coefficients were found to be superior to Whisper features for the cross-lingual task. Notably, our model performs robustly on all evaluated languages despite being trained exclusively on English data, and shows no evidence of language dependency or of correlation with the speech quality of the language subsets. However, the findings reveal that the detection model fails to generalise well to the In-the-Wild dataset. We find that reducing the length of audio clips and fine-tuning specific inception modules can alleviate this issue to some degree. Conversely, augmenting the training data with various real-world noises from the MUSAN corpus did not significantly enhance generalisability, and the inclusion of pink noise and silence degraded performance on In-the-Wild data. In summary, the findings highlight the complexity of fake audio detection and underscore the importance of further research to elucidate the factors that influence the performance and generalisability of detection systems.