Covering all your bases: incorporating intron signal from RNA-seq data
Details
Publication Year 2020-09, Volume 2, Issue #3, Page lqaa073
Journal Title
NAR Genomics and Bioinformatics
Abstract
RNA-seq datasets can contain millions of intron reads per library that are typically removed from downstream analysis. Only reads overlapping annotated exons are considered to be informative since mature mRNA is assumed to be the major component sequenced, especially for poly(A) RNA libraries. In this study, we show that intron reads are informative and, through exploratory data analysis of read coverage, that intron signal is representative of both pre-mRNAs and intron retention. We demonstrate how intron reads can be utilized in differential expression analysis using our index method, where a unique set of differentially expressed genes can be detected using intron counts. In exploring read coverage, we also developed the superintronic software, which quickly and robustly calculates user-defined summary statistics for exonic and intronic regions. Across multiple datasets, superintronic enabled us to identify several genes with distinctly retained introns that had coverage levels similar to those of neighbouring exons. The work and ideas presented in this paper are the first of their kind to consider multiple biological sources for intron reads through exploratory data analysis, minimizing bias in discovery and interpretation of results. Our findings open up possibilities for further methods development for intron reads and RNA-seq data in general.
Publisher
Oxford Academic
WEHI Research Division(s)
Epigenetics And Development; Personalised Oncology; Blood Cells And Blood Cancer
PubMed ID
33575621
Open Access at Publisher's Site
https://doi.org/10.1093/nargab/lqaa073
Rights Notice
Refer to copyright notice on published article.


Creation Date: 2021-03-04 10:21:23
Last Modified: 2021-03-08 11:25:45