
Depositor [dc.contributor]: Wester, Mirjam
Funder [dc.contributor.other]: EPSRC - Engineering and Physical Sciences Research Council
Data Creator [dc.creator]: Wester, Mirjam
Data Creator [dc.creator]: Watts, Oliver
Data Creator [dc.creator]: Henter, Gustav Eje
Date Accessioned [dc.date.accessioned]: 2016-03-03T17:08:21Z
Date Available [dc.date.available]: 2016-03-03T17:08:21Z
Citation [dc.identifier.citation]: Wester, Mirjam; Watts, Oliver; Henter, Gustav Eje. (2016). Listening test materials for "Evaluating comprehension of natural and synthetic conversational speech", [dataset]. University of Edinburgh, School of Informatics, Centre for Speech Technology Research. https://doi.org/10.7488/ds/1352.
Persistent Identifier [dc.identifier.uri]: http://hdl.handle.net/10283/1935
Persistent Identifier [dc.identifier.uri]: https://doi.org/10.7488/ds/1352
Dataset Description (abstract) [dc.description.abstract]: Current speech synthesis methods typically operate on isolated sentences and lack convincing prosody when generating longer segments of speech. Similarly, prevailing TTS evaluation paradigms, such as intelligibility (transcription word error rate) or mean opinion score (MOS), only score sentences in isolation, even though overall comprehension is arguably more important for speech-based communication. In an effort to develop more ecologically relevant evaluation techniques that go beyond isolated sentences, we investigated comprehension of natural and synthetic speech dialogues. Specifically, we tested listener comprehension on long segments of spontaneous and engaging conversational speech (three 10-minute radio interviews of comedians). Interviews were reproduced either as natural speech, synthesised from carefully prepared transcripts, or synthesised using durations from forced alignment against the natural speech, all in a balanced design. Comprehension was measured using multiple-choice questions. A significant difference was found between the comprehension/retention of natural speech (74% correct responses) and synthetic speech with forced-aligned durations (61% correct responses). However, no significant difference was observed between natural and regular synthetic speech (70% correct responses). Effective evaluation of comprehension remains elusive.
Dataset Description (TOC) [dc.description.tableofcontents]: The dataset is described in the readme.txt file.
Language [dc.language.iso]: eng
Publisher [dc.publisher]: University of Edinburgh, School of Informatics, Centre for Speech Technology Research
Relation (Is Referenced By) [dc.relation.isreferencedby]: "Evaluating comprehension of natural and synthetic conversational speech", presented at Speech Prosody 2016, Boston, USA
Rights [dc.rights]: Creative Commons Attribution 4.0 International Public License
Subject [dc.subject]: Speech Synthesis Evaluation
Subject [dc.subject]: Comprehension
Subject [dc.subject]: Conversational Speech
Subject [dc.subject]: Statistical Parametric Speech Synthesis
Subject Classification [dc.subject.classification]: Mathematical and Computer Sciences::Speech and Natural Language Processing
Title [dc.title]: Listening test materials for "Evaluating comprehension of natural and synthetic conversational speech"
Type [dc.type]: dataset

Zip file MD5 checksum: 3cf16ccbf26ae5ca792b81475f8fc327
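A downloaded copy of the archive can be checked against the MD5 checksum above before use. A minimal sketch in Python; the archive filename `DS_1352.zip` is an assumption, not part of the record:

```python
import hashlib

# Checksum published in the item record above.
EXPECTED_MD5 = "3cf16ccbf26ae5ca792b81475f8fc327"

def md5_of_file(path, chunk_size=8192):
    """Compute the MD5 hex digest of a file, reading in chunks
    so large archives do not need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage (filename assumed):
# if md5_of_file("DS_1352.zip") == EXPECTED_MD5:
#     print("checksum OK")
```

On most Unix systems, `md5sum <file>` gives the same digest from the command line.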

Files in this item


