Allamanis, Miltiadis; Sutton, Charles. (2017). GitHub Java Corpus, 2012 [dataset]. University of Edinburgh: School of Informatics. http://dx.doi.org/10.7488/ds/1690.
The GitHub Java Corpus is a snapshot of all open-source Java code on GitHub as of October 2012, restricted to projects that had at least one fork at that time. It contains code from 14,785 projects, amounting to about 352 million lines of code. The dataset has been used to study Java coding practice at a large scale.