Bush, Stephen; Muriuki, Charity; McCulloch, Mary E. B.; Farquhar, Iseabail L.; Clark, Emily L.; Hume, David A. (2018). Assembly and validation of conserved long non-coding RNAs in the ruminant transcriptome [dataset]. Roslin Institute. University of Edinburgh. http://dx.doi.org/10.7488/ds/2284.
mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue specificity, and/or at specific developmental stages. This dataset demonstrates that few lncRNAs are fully captured by biological replicates of the same RNA-seq library. In a transcriptional atlas of the domestic sheep (https://doi.org/10.1371/journal.pgen.1006997), 31 diverse tissues/cell types were sampled in each of 6 individual adults (3 females, 3 males, all unrelated virgin animals approximately 2 years of age). By taking a subset of 31 common tissues per individual, each of the 6 adults (f1, f2, f3, m1, m2, and m3) was represented by ~0.75 billion reads.
In a typical lncRNA assembly pipeline, read alignments from all individuals are merged to maximise the number of candidate gene models (using, for instance, StringTie --merge). With n = 6 adults (and ~0.75 billion reads per adult), there are 2^n - 1 = 63 possible non-empty combinations of individuals, and a merged GTF can be made with StringTie --merge for each of them. This dataset comprises those 63 GTFs.
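
The count of 63 can be checked by enumerating every non-empty combination of the six adults. The sketch below is illustrative only and is not the authors' pipeline: the per-adult input names (e.g. f1.gtf) and the output naming scheme are assumptions, and the command is built as an argument list rather than executed.

```python
from itertools import combinations

ADULTS = ["f1", "f2", "f3", "m1", "m2", "m3"]


def nonempty_subsets(items):
    """Return every non-empty combination of items, smallest subsets first."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


def merge_command(combo):
    """Build an argument list for a StringTie --merge call on one combination.

    File names are hypothetical: one assembled GTF per adult is assumed.
    """
    inputs = [f"{adult}.gtf" for adult in combo]   # assumed input names
    output = "_".join(combo) + ".merged.gtf"       # assumed output name
    return ["stringtie", "--merge", "-o", output, *inputs]


subsets = nonempty_subsets(ADULTS)
print(len(subsets))                 # 63, i.e. 2**6 - 1
print(" ".join(merge_command(subsets[-1])))
```

The combinations range from the six single-adult sets up to the one set containing all six adults, so the merged GTFs can be compared as the number of contributing individuals grows.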