A comprehensive evaluation of long-read de novo transcriptome assembly
- Author(s)
- Yan, F; Baldoni, PL; Lancaster, J; Ritchie, ME; Lewsey, MG; Gouil, Q; Davidson, NM
- Journal Title
- Genome Biology
- Publication Type
- Feb 18
- Abstract
- INTRODUCTION: Recently, de novo transcriptome assembly methods have been developed to utilise long-read data in cases where a reference genome is unavailable, such as in non-model organisms. Despite the potential of these tools, there remains a lack of benchmarking and established protocols for optimal reference-free, long-read transcriptome assembly and differential expression analysis. RESULTS: Here, we evaluate the long-read de novo transcriptome assembly tools RATTLE, RNA-Bloom2 and isONform, and compare their performance to one of the leading short-read assemblers, Trinity. We assess various metrics across a range of datasets, which include simulated data and spike-in sequin transcripts, where ground truth is known, and real data from human and pea (Pisum sativum) samples, using a reference-based approach to define truth. To represent contemporary analysis scenarios, the datasets cover depths from 6 to 60 million reads, Oxford Nanopore Technologies (ONT) cDNA, ONT direct RNA and Pacific Biosciences (PacBio) 10x single-cell sequencing. Critically, we assess the downstream impact of assembly choice on the detection of differential gene and transcript expression. CONCLUSIONS: Our results confirm that long reads generate longer assembled transcripts than short reads for reference-free analysis, though limitations remain compared to reference-guided approaches, and suggest scope for improved accuracy and reduced redundancy. Of the de novo pipelines, RNA-Bloom2, coupled with Corset for transcript clustering, was the best performing in terms of both accuracy and computational efficiency. Our findings offer guidance when selecting the most effective strategy for long-read differential expression analysis when a high-quality reference genome is unavailable.
- Publisher
- Springer Nature
- Keywords
- Differential expression; Long reads; Non-model organisms; Reference-free; Transcriptome assembly
- Research Division(s)
- Bioinformatics and Computational Biology; Genetics and Gene Regulation; Blood Cells and Blood Cancer
- PubMed ID
- 41709347
- Publisher's Version
- https://doi.org/10.1186/s13059-026-04001-5
- Open Access at Publisher's Site
- https://doi.org/10.1186/s13059-026-04001-5
- Terms of Use/Rights Notice
- Refer to copyright notice on published article.
Creation Date: 2026-03-24 02:09:50
Last Modified: 2026-03-24 02:16:41