Bush, Stephen J. (2018). A corpus of names drawn from the local birth registers of England and Wales, 1838-2014 [dataset]. University of Edinburgh. http://dx.doi.org/10.7488/ds/2294.
This dataset comprises a corpus of names, in both the first and middle positions, for approximately 22 million individuals born in England and Wales between 1838 and 2014.
These data were obtained from birth records made available by a set of volunteer-run genealogical resources (collectively, the "UK local BMD project", http://www.ukbmd.org.uk/local) and have been re-purposed here to demonstrate the applicability of network analysis methods to an onomastic dataset. The ownership and licensing of the intellectual property constituting the original birth records are detailed at https://www.ukbmd.org.uk/TermsAndConditions. Under section 29A of the UK Copyright, Designs and Patents Act 1988, a copyright exception permits copies of lawfully accessible material to be made in order to conduct text and data mining for non-commercial research.
The data included in this dataset represent the outcome of such a text-mining analysis. No birth records are included in this dataset, nor can records be reconstructed from the data presented here.
The data comprise an archive of tables presenting this corpus in several forms: as a rank order of names (in both the first and middle positions) by number of registered births per year, and by total number of births across all years sampled. An overview of the data is also provided, with summary statistics such as the number of usable records registered per year, the most popular names per year, measures of forename diversity, and the surname-to-forename usage ratio (an indicator of which forenames are more likely to be transferred uses of surnames). These tables are extensive but not exhaustive, and the corpus may contain errors.
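The per-year rank order and the summary measures described above can be illustrated with a short sketch. This is not the procedure used to build the archive: the input shape (a list of (year, forename) pairs), the use of Shannon entropy as the diversity measure, and the definition of the surname-to-forename ratio (occurrences of a name as a surname divided by its occurrences as a forename) are assumptions made for illustration, and all function names are hypothetical.

```python
from collections import Counter
import math


def rank_names(births):
    """Rank forenames by number of births per year.

    births: iterable of (year, forename) pairs (hypothetical input shape).
    Returns {year: [(name, count), ...]} sorted by descending count,
    with ties broken alphabetically.
    """
    per_year = {}
    for year, name in births:
        per_year.setdefault(year, Counter())[name] += 1
    return {
        year: sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        for year, counts in per_year.items()
    }


def shannon_diversity(counts):
    """Shannon entropy (natural log) of a name-frequency distribution.

    Assumed diversity measure; the dataset may use a different one.
    """
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def surname_to_forename_ratio(name, surname_counts, forename_counts):
    """Assumed definition: surname occurrences per forename occurrence.

    Returns infinity when the name never occurs as a forename.
    """
    f = forename_counts.get(name, 0)
    return surname_counts.get(name, 0) / f if f else float("inf")


# Usage on a toy sample (not real data from the corpus).
births = [(1900, "Mary"), (1900, "John"), (1900, "Mary"), (1901, "John")]
ranked = rank_names(births)
diversity_1900 = shannon_diversity(Counter(dict(ranked[1900])))
ratio = surname_to_forename_ratio("Taylor", {"Taylor": 30}, {"Taylor": 10})
```

Sorting on `(-count, name)` gives a deterministic rank order even when two names share a count in the same year.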
Data are also presented as ‘.expression’ files (an input format readable by the network analysis tool Graphia Professional) and as ‘.layout’ files (a text format output by Graphia Professional that describes the characteristics of a network so that it can be replicated).
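As a rough illustration of the kind of matrix a network analysis tool can consume, the sketch below writes per-name birth counts as a tab-separated table with one row per name and one column per year. The exact layout of the ‘.expression’ files in this archive is not described here; the header label `Name`, the column order, and the helper `name_year_matrix_tsv` are hypothetical, so the real files should be inspected before this is reused.

```python
import csv
import io


def name_year_matrix_tsv(name_counts, years):
    """Build a name-by-year count matrix as tab-separated text.

    name_counts: {name: {year: count}} (hypothetical input shape).
    Rows are names in alphabetical order; missing years are filled with 0.
    The 'Name' header and column layout are assumptions, not the archive's format.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Name"] + [str(y) for y in years])
    for name in sorted(name_counts):
        row = name_counts[name]
        writer.writerow([name] + [row.get(y, 0) for y in years])
    return buf.getvalue()


# Usage on a toy sample (not real data from the corpus).
counts = {"Mary": {1900: 2}, "John": {1900: 1, 1901: 1}}
tsv = name_year_matrix_tsv(counts, [1900, 1901])
```

Filling absent years with zero keeps every row the same length, which a row-wise comparison of name trajectories across years requires.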
Characteristics of the original birth records that would allow individuals to be identified, such as full name or location of birth, have been removed.