Lorenzo-Trueba, Jaime; Yamagishi, Junichi; Toda, Tomoki; Saito, Daisuke; Villavicencio, Fernando; Kinnunen, Tomi; Ling, Zhenhua. (2018). The Voice Conversion Challenge 2018: database and results, [sound]. The Centre for Speech Technology Research, The University of Edinburgh, UK. http://dx.doi.org/10.7488/ds/2337.
Voice conversion (VC) is a technique to transform a speaker identity included in a source speech waveform into a different one while preserving linguistic information of the source speech waveform.
In 2016, we have launched the Voice Conversion Challenge (VCC) 2016 at Interspeech 2016. The objective of the 2016 challenge was to better understand different VC techniques built on a freely-available common dataset to look at a common goal, and to share views about unsolved problems and challenges faced by the current VC techniques. The VCC 2016 focused on the most basic VC task, that is, the construction of VC models that automatically transform the voice identity of a source speaker into that of a target speaker using a parallel clean training database where source and target speakers read out the same set of utterances in a professional recording studio. 17 research groups had participated in the 2016 challenge. The challenge was successful and it established new standard evaluation methodology and protocols for bench-marking the performance of VC systems.
In 2018, we launched the second edition of VCC, the VCC 2018. In this second edition, we have revised three aspects of the challenge. First, we have reduced the amount of speech data used for the construction of participant's VC systems to half. This is based on feedback from participants in the previous challenge and this is also essential for practical applications. Second, we introduced a more challenging task refereed to a Spoke task in addition to a similar task to the 1st edition, which we call a Hub task. In the Spoke task, participants need to build their VC systems using a non-parallel database in which source and target speakers read out different sets of utterances. We then evaluate both parallel and non-parallel voice conversion systems via the same large-scale crowdsourcing listening test. Third, we also attempted to bridge the gap between the ASV and VC communities. Since new VC systems developed for the VCC 2018 may be strong candidates for enhancing the ASVspoof 2015 database, we also asses spoofing performance of the VC systems based on anti-spoofing scores.
This repository contains the training and evaluation data released to participants, submissions from participants, and the listening test results for the 2018 Voice Conversion Challenge.