Archival Metadata Descriptions from the University of Edinburgh Centre for Research Collections - Extracted October 2020
Data Creator: Havens, L; Centre for Research Collections, The
Publisher: University of Edinburgh. School of Informatics
Relation (Is Referenced By): http://arxiv.org/abs/2011.05911
Citation: Havens, L; Alex, B; Bach, B; Terras, M; Renton, S; Hosker, R; Centre for Research Collections, The. (2020). Archival Metadata Descriptions from the University of Edinburgh Centre for Research Collections - Extracted October 2020, [dataset]. University of Edinburgh. School of Informatics. https://doi.org/10.7488/ds/2953.
Description: The dataset includes metadata descriptions extracted from the Centre for Research Collections' online archival catalog using OAI-PMH EAD harvesting. Metadata descriptions were extracted from four metadata fields: an identifier (<unitid>), Biographical / Historical (<bioghist>), Scope and Contents (<scopecontent>), and Processing Information (<processinfo>). The descriptions were extracted in October 2020. The dataset includes five files that will be annotated for instances of gender bias, in an effort to create a gold standard dataset on which an algorithm can be trained to identify and classify gender bias in text.

## Acknowledgments ##

This dataset has been created for a PhD project conducted in collaboration with Beatrice Alex, Benjamin Bach, and Melissa Terras (PhD supervisors), and with Rachel Hosker and the Centre for Research Collections (CRC). This group of collaborators will be involved in future uses of the data as the PhD project continues, specifically in determining how to annotate the data for gender bias. Thanks are due to Scott Renton for his guidance in using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which was necessary to extract selections of metadata in Encoded Archival Description (EAD) XML format from ArchivesSpace, the CRC's online archival catalog.
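The field extraction named in the Description can be illustrated with a minimal Python sketch. This is not the project's actual extraction script. It assumes an EAD XML record has already been obtained (the OAI-PMH harvesting step is not shown) and simply collects the text of the four fields. The sample record, the identifier `Coll-0000`, and the helper names `local_name` and `extract_descriptions` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# The four EAD elements named in the dataset description.
FIELDS = ("unitid", "bioghist", "scopecontent", "processinfo")


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def extract_descriptions(ead_xml):
    """Return {field name: [text, ...]} for the four fields in one EAD record.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(ead_xml)
    found = {name: [] for name in FIELDS}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in found:
            # Join nested text (e.g. <p> children) and collapse whitespace.
            text = " ".join(" ".join(elem.itertext()).split())
            if text:
                found[name].append(text)
    return found


# Invented sample record; real records come from the CRC catalog.
SAMPLE_EAD = """<ead xmlns="urn:isbn:1-931666-22-9">
  <archdesc level="fonds">
    <did><unitid>Coll-0000</unitid></did>
    <bioghist><p>A hypothetical   biography.</p></bioghist>
    <scopecontent><p>Hypothetical letters and notebooks.</p></scopecontent>
    <processinfo><p>Catalogued by a hypothetical archivist.</p></processinfo>
  </archdesc>
</ead>"""

descriptions = extract_descriptions(SAMPLE_EAD)
for field, texts in descriptions.items():
    print(field, texts)
```

Matching on the local tag name keeps the sketch independent of whether a record declares the EAD namespace, which may vary between catalog exports.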
Descriptions from 197 collections ("fonds") that will be used to test a classification algorithm (377.7Kb)
Descriptions from 197 collections ("fonds") that will be used to iteratively refine a classification algorithm (793.3Kb)
Descriptions from 197 collections ("fonds") that will be used to train a classification algorithm (471.1Kb)
Descriptions from 197 collections ("fonds") that will be used to train a classification algorithm (4.528Mb)