    • Hiberlink project data

      Tobin, Richard; Grover, Claire; Zhou, Ke
      Summary files (in XML format) listing URIs referenced in papers from arXiv, Elsevier, and PMC respectively (approximately 1 million URIs from 3 million papers in total). The focus of the Hiberlink project was to assess the ...
    • ManySStuBs4J Dataset 

      Karampatsis, Rafael-Michael
      The ManySStuBs4J corpus contains simple statement bugs mined from open-source Java projects hosted on GitHub. There are two variations of the dataset: one mined from the 100 Java Maven projects and one mined from the top ...
    • Visual and Linguistic Treebank 

      Elliott, Desmond; Keller, Frank (2014-09-04)
      The Visual and Linguistic Treebank is a data set of images annotated with human-written descriptions, object boundaries, and Visual Dependency Representations. The images are freely available from the Action Recognition ...
    • WikiCatSum 

      Perez-Beltrachini, Laura; Liu, Yang; Lapata, Mirella
      WikiCatSum is a domain-specific Multi-Document Summarisation (MDS) dataset. It addresses the summarisation task of generating Wikipedia lead sections for Wikipedia entities of a certain domain (e.g. Companies) from the set ...