Wong, Mark; Leng, Rhodri; Viry, Gil; Liscovsky Barrera, Rodrigo; Garcia-Sancho, Miguel. (2019). Human, yeast and pig genomics: sequence submissions and first sequence descriptions in the literature (1980-2015), 1980-2015 [dataset]. University of Edinburgh. Science, Technology and Innovation Studies. https://doi.org/10.7488/ds/2718.
This data collection is derived from two sources: 1) submissions of DNA sequences of S. cerevisiae (yeast), Sus scrofa (pig) and Homo sapiens (human) to the European Nucleotide Archive (ENA), and 2) the first descriptions of these sequences in the scientific literature. The records cover 1980-2000 (yeast), 1985-2005 (human) and 1990-2015 (pig). Each species has two associated datasets: 1) a .csv file documenting the PubMed ID (PMID) of each article describing new sequences, all authors of the article, all institutional affiliations of each author, the country of each institution, the year of first submission to the ENA (when available) and the year of article publication; 2) a .csv file documenting all institutions submitting to the ENA, the number of nucleotides sequenced and the year of submission to the database. The yeast submission data is provided sequence by sequence, with full dates and information about both submitting individuals and institutions, whereas the pig and human submission datasets offer aggregate figures per institution and per year. Some submission data is not fully clean. The collection contains approximately 28,000 publications and 13.5 million sequence submissions. The code used to obtain the submission and publication records can be found at https://github.com/UofGMarkWong/TRANSGENE. A publication describing the data collection and cleaning protocol is available at https://f1000research.com/articles/8-1200. Further information about the project within which this collection was generated is available at www.stis.ed.ac.uk/transgene.
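As a rough illustration of how the publication .csv files described above might be used, the following Python sketch counts distinct articles per publication year. The column names (`pmid`, `publication_year` and the others in the sample) and the one-row-per-author layout are assumptions made for this example; the actual headers and layout in the deposited files may differ and should be checked before use.

```python
import csv
import io
from collections import Counter

# Hypothetical sample mimicking a publication file. The header names and the
# one-row-per-author layout are assumptions, not the dataset's real schema.
SAMPLE = """pmid,author,institution,country,submission_year,publication_year
111,Author A,Institute X,UK,1990,1991
111,Author B,Institute Y,France,1990,1991
222,Author C,Institute X,UK,,1993
"""


def publications_per_year(csv_file, pmid_field="pmid", year_field="publication_year"):
    """Count distinct articles (by PMID) per publication year.

    An article may occupy several rows (one per author or affiliation),
    so each PMID is counted only the first time it is seen.
    """
    seen = set()
    counts = Counter()
    for row in csv.DictReader(csv_file):
        pmid = row[pmid_field]
        if pmid in seen:
            continue
        seen.add(pmid)
        counts[row[year_field]] += 1
    return dict(counts)


counts = publications_per_year(io.StringIO(SAMPLE))
print(counts)
```

To run this against a downloaded file, pass an open file handle instead of the in-memory sample and adjust `pmid_field` and `year_field` to match the real column headers.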