Wu, Zhizheng. (2015). Listening test materials for "A study of speaker adaptation for DNN-based speech synthesis", [dataset]. University of Edinburgh. The Centre for Speech Technology Research (CSTR). http://dx.doi.org/10.7488/ds/259.
The dataset contains the testing stimuli and listeners' MUSHRA test responses for the Interspeech 2015 paper, "A study of speaker adaptation for DNN-based speech synthesis". In this paper, we conduct an experimental analysis of speaker adaptation for Deep Neural Network (DNN) based speech synthesis at different levels. In particular, we augment a low-dimensional speaker-specific vector with linguistic features as input to represent speaker identity, perform model adaptation to scale the hidden activation weights, and perform a feature space transformation at the output layer to modify generated acoustic features. We systematically analyse the performance of each individual adaptation technique and that of their combinations. Experimental results confirm the adaptability of the DNN, and listening tests demonstrate that the DNN can achieve significantly better adaptation performance than the hidden Markov model (HMM) baseline in terms of naturalness and speaker similarity.
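The three adaptation levels described above can be illustrated with a minimal sketch of a single-frame forward pass. This is an assumption-laden illustration, not the paper's implementation: the function names (`affine`, `adapted_forward`), the tanh activation, the layer sizes, and the numeric values are all hypothetical and chosen only to show where each adaptation step acts.

```python
import math

def affine(vec, matrix, bias):
    """Return matrix @ vec + bias, with the matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(matrix, bias)]

def adapted_forward(linguistic, speaker_vec, hidden_layers, hidden_scales,
                    output_layer, out_transform):
    """Forward pass for one frame with three (hypothetical) adaptation hooks."""
    # 1) Input level: append the speaker-specific vector to the linguistic features.
    x = list(linguistic) + list(speaker_vec)

    # 2) Model level: rescale each hidden unit's activation per speaker.
    #    tanh is an assumed activation function.
    for (weights, bias), scales in zip(hidden_layers, hidden_scales):
        x = [s * math.tanh(h) for s, h in zip(scales, affine(x, weights, bias))]

    # Linear output layer producing the acoustic features.
    y = affine(x, *output_layer)

    # 3) Output level: affine feature-space transform of the generated features.
    transform_matrix, transform_bias = out_transform
    return affine(y, transform_matrix, transform_bias)

# Toy example: 2 linguistic features, 1 speaker dimension, 2 hidden units, 1 output.
hidden = [([[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]], [0.0, 0.0])]
output = ([[1.0, 1.0]], [0.0])
identity = ([[1.0]], [0.0])

unadapted = adapted_forward([1.0, 0.0], [0.5], hidden, [[1.0, 1.0]], output, identity)
adapted = adapted_forward([1.0, 0.0], [0.5], hidden, [[2.0, 0.0]], output, ([[2.0]], [1.0]))
print(unadapted, adapted)
```

With all scales set to 1 and an identity output transform, the speaker vector is the only speaker-dependent part of the network, so the three techniques can be switched on one at a time or in combination, which mirrors the kind of comparison the abstract describes.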