Defining a tandem repeat catalog and variation clusters for genome-wide analyses and population databases
- Author(s)
- Weisburd, B; Dolzhenko, E; Bennett, MF; Danzi, MC; Xu, IRL; Tanudisastro, H; Gu, B; English, A; Hiatt, L; Mokveld, T; De Sena Brandine, G; Chiu, R; Kurtas, NE; Jam, HZ; Brand, H; Rajan-Babu, IS; Bahlo, M; Chaisson, MJP; Züchner, S; Gymrek, M; Dashnow, H; Eberle, MA; Rehm, HL
- Journal Title
- American Journal of Human Genetics
- Publication Date
- Apr 22
- Abstract
- Tandem repeat (TR) catalogs are important components of repeat genotyping studies because they define the genomic coordinates and expected motifs of all TR loci being analyzed. In recent years, genome-wide studies have used catalogs ranging in size from fewer than 200,000 to over 7 million loci. These catalogs differed not only in which loci were included but often also in their definitions of TR locus boundaries and motifs. Now, with multiple groups developing public databases of TR variation in large population cohorts, there is a risk that the use of divergent repeat catalogs will lead to confusion, fragmentation, and incompatibility across resources. Here, we compare existing TR catalogs and discuss desirable features of a comprehensive genome-wide catalog. We then present a new, richly annotated catalog designed for genome-wide analyses and population datasets based on short-read or long-read samples. Additionally, using an algorithm that leverages long-read HiFi sequencing data, our catalog stratifies TRs into (1) isolated repeats suitable for repeat copy-number analysis and (2) variation clusters where TRs are embedded within wider polymorphic regions best studied through sequence-level analysis. We share the TR catalog, variation clusters, and annotations through the TRExplorer portal in order to support both the initial selection of TR loci for inclusion in an analysis and the subsequent interpretation of results.
- Publisher
- Cell Press
- Keywords
- HiFi; STRs; VNTRs; long read; sequencing; short read; tandem repeat catalogs; tandem repeats
- Research Division(s)
- Genetics and Gene Regulation
- PubMed ID
- 42025159
- Publisher's Version
- https://doi.org/10.1016/j.ajhg.2026.03.020
- Terms of Use/Rights Notice
- Refer to copyright notice on published article.
Creation Date: 2026-04-27 03:52:47
Last Modified: 2026-04-27 03:52:54