The School of Informatics is the largest, longest-established and highest-quality research group in informatics in the UK.

Research within the School is carried out across a number of institutes. The research programmes organised by the School of Informatics encompass a wide range of domains. Currently these include Artificial Life, Bioinformatics, Computational Thinking, Machine Learning, Music Informatics, Processes, Events & Activity, Software Engineering and System Level Integration.

Recent Submissions

  • WikiCatSum 

    Perez-Beltrachini, Laura; Liu, Yang; Lapata, Mirella
    WikiCatSum is a domain-specific Multi-Document Summarisation (MDS) dataset. It assumes the summarisation task of generating Wikipedia lead sections for Wikipedia entities of a certain domain (e.g. Companies) from the set ...
  • ASVspoof 2019: The 3rd Automatic Speaker Verification Spoofing and Countermeasures Challenge database 

    Yamagishi, Junichi; Todisco, Massimiliano; Sahidullah, Md; Delgado, Héctor; Wang, Xin; Evans, Nicholas; Kinnunen, Tomi; Lee, Kong Aik; Vestman, Ville; Nautsch, Andreas
    This is a database used for the Third Automatic Speaker Verification Spoofing and Countermeasures Challenge, for short, ASVspoof 2019 (http://www.asvspoof.org) organized by Junichi Yamagishi, Massimiliano Todisco, Md ...
  • A Survey on Developer-Centred Security 

    Tahaei, Mohammad; Vaniea, Kami
    Our research reports a systematic literature review of 49 publications on security studies with software developer participants. The attached files are: - A BibTeX file: includes all 49 references in BibTeX format. - ...
  • ManySStuBs4J Dataset 

    Karampatsis, Rafael-Michael
    The ManySStuBs4J corpus contains simple statement bugs mined from open-source Java projects hosted on GitHub. There are two variations of the dataset: one mined from the 100 Java Maven Projects and one mined from the top ...
  • Listening-test materials for "Modern speech synthesis for phonetic sciences: a discussion and an evaluation" 

    Malisz, Zofia; Henter, Gustav Eje; Valentini-Botinhao, Cassia; Watts, Oliver; Beskow, Jonas; Gustafson, Joakim
    This data release contains listening-test materials associated with the paper "Modern speech synthesis for phonetic sciences: a discussion and an evaluation", presented at ICPhS 2019 in Melbourne, Australia.
  • Alba speech corpus 

    Valentini-Botinhao, Cassia; Yamagishi, Junichi
    Single-speaker read speech corpus of a Scottish-accented female native English speaker (Alba). The corpus was recorded in four speaking styles: plain (normal read speech, around 4 hours of recordings), fast (speaking as ...
  • Listening test results of the Voice Conversion Challenge 2018 

    Yamagishi, Junichi; Wang, Xin
    This dataset is associated with a paper and a dataset below: (1) Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, Zhenhua Ling, "The Voice Conversion Challenge ...
  • UltraSuite Repository - sample data 

    Eshky, Aciel; Ribeiro, Manuel Sam; Cleland, Joanne; Renals, Steve; Richmond, Korin; Roxburgh, Zoe; Scobbie, James; Wrench, Alan
    UltraSuite is a repository of ultrasound and acoustic data from child speech therapy sessions. The current release includes three data collections, one from typically developing children -- Ultrax Typically Developing ...
  • Hurricane natural speech corpus - higher quality version 

    Valentini-Botinhao, Cassia; Mayo, Cassie; Cooke, Martin
    Single male native British-English talker recorded producing three speech sets (Harvard sentences, Modified Rhyme Test, news sentences) in quiet and while the talker was listening to speech-shaped noise at 84 dB(A). This ...
  • Parallel Audiobook Corpus 

    Ribeiro, Manuel Sam
    The Parallel Audiobook Corpus (version 1.0) is a collection of parallel readings of audiobooks. The corpus consists of approximately 121 hours of speech at 22.05 kHz across 4 books and 59 speakers. The data is provided in ...
  • CINIC-10 Is Not ImageNet or CIFAR-10 

    Darlow, Luke N; Crowley, Elliot J; Antoniou, Antreas; Storkey, Amos
    CINIC-10 is an augmented extension of CIFAR-10. It contains the images from CIFAR-10 (60,000 images, 32x32 RGB pixels) and a selection of ImageNet database images (210,000 images downsampled to 32x32). It was compiled as ...
  • Manual and automatic labels for version 1.0 of UXTD, UXSSD, and UPX core data -- version 1.0 

    Eshky, Aciel; Ribeiro, Manuel Sam; Cleland, Joanne; Renals, Steve; Richmond, Korin; Roxburgh, Zoe; Scobbie, James; Wrench, Alan
    UltraSuite is a repository of ultrasound and acoustic data from child speech therapy sessions. The current release includes three data collections, one from typically developing children (UXTD) and two from children with ...
  • The Voice Conversion Challenge 2018: database and results 

    Lorenzo-Trueba, Jaime; Yamagishi, Junichi; Toda, Tomoki; Saito, Daisuke; Villavicencio, Fernando; Kinnunen, Tomi; Ling, Zhenhua
    Voice conversion (VC) is a technique to transform a speaker identity included in a source speech waveform into a different one while preserving linguistic information of the source speech waveform. In 2016, we have ...
  • The 2nd Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2017) Database, Version 2 

    Kinnunen, Tomi; Sahidullah, Md; Delgado, Héctor; Todisco, Massimiliano; Evans, Nicholas; Yamagishi, Junichi; Lee, Kong Aik
    This is a database used for the Second Automatic Speaker Verification Spoofing and Countermeasures Challenge, for short, ASVspoof 2017 (http://www.asvspoof.org) organized by Tomi Kinnunen, Md Sahidullah, Héctor Delgado, ...
  • Device Recorded VCTK (Small subset version) 

    Sarfjoo, Seyyed Saeed; Yamagishi, Junichi
    This dataset is a new variant of the voice cloning toolkit (VCTK) dataset: device-recorded VCTK (DR-VCTK), where the high-quality speech signals recorded in a semi-anechoic chamber using professional audio devices are ...
  • SUPERSEDED - The 2nd Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2017) Database, Version 2 

    Kinnunen, Tomi; Sahidullah, Md; Delgado, Héctor; Todisco, Massimiliano; Evans, Nicholas; Yamagishi, Junichi; Lee, Kong Aik
    This item has been replaced by the one at https://doi.org/10.7488/ds/2332.
  • Dutch English Lombard Speech Native and Non-Native (DELNN) 

    Marcoux, Katherine; Ernestus, Mirjam; King, Simon
    The DELNN (Dutch English Lombard speech Native and Non-Native) corpus consists of 30 native Dutch speakers reading sentences in a quiet environment and in a noisy environment, to elicit Lombard speech. The Dutch speakers ...
  • Radboud Lombard Corpus_Dutch 

    Shen, Chen; Janse, Esther; King, Simon
    This data set contains 54 (12 for now) native Dutch speakers' Dutch sentence-reading material (48 sentences in natural and 48 sentences in Lombard condition per speaker).
  • Triangulating Context Lemmas 

    McLaughlin, Craig; McKinna, James; Stark, Ian
    Agda formalisation to accompany the paper "Triangulating Context Lemmas" by Craig McLaughlin, James McKinna and Ian Stark. DOI 10.1145/3167081.
