Allamanis, Miltiadis; Sutton, Charles. (2017). GitHub Java Corpus, 2012 [dataset]. University of Edinburgh: School of Informatics. https://doi.org/10.7488/ds/1690.
The GitHub Java Corpus is a snapshot of all open-source Java code on GitHub in October 2012 that is contained in open-source projects that at the time had at least one fork. It contains code from 14,785 projects amounting to about 352 million lines of code. The dataset has been used to study coding practice in Java at a large scale.
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336, VAT Registration Number GB 592 9507 00, and is acknowledged by the UK authorities as a “Recognised body” which has been granted degree awarding powers.