Defining a tandem repeat catalog and variation clusters for genome-wide analyses and population databases
Journal Title
American Journal of Human Genetics
Publication Type
Apr 22
Abstract
Tandem repeat (TR) catalogs are important components of repeat genotyping studies because they define the genomic coordinates and expected motifs of all TR loci being analyzed. In recent years, genome-wide studies have used catalogs ranging in size from fewer than 200,000 to over 7 million loci. These catalogs differed not only in which loci were included but often also in their definitions of TR locus boundaries and motifs. Now, with multiple groups developing public databases of TR variation in large population cohorts, there is a risk that the use of divergent repeat catalogs will lead to confusion, fragmentation, and incompatibility across resources. Here, we compare existing TR catalogs and discuss desirable features of a comprehensive genome-wide catalog. We then present a new, richly annotated catalog designed for genome-wide analyses and population datasets based on short-read or long-read samples. Additionally, using an algorithm that leverages long-read HiFi sequencing data, our catalog stratifies TRs into (1) isolated repeats suitable for repeat copy-number analysis and (2) variation clusters where TRs are embedded within wider polymorphic regions best studied through sequence-level analysis. We share the TR catalog, variation clusters, and annotations through the TRExplorer portal in order to support both the initial selection of TR loci for inclusion in an analysis and the subsequent interpretation of results.
Publisher
Cell Press
Keywords
HiFi; STRs; VNTRs; long read; sequencing; short read; tandem repeat catalogs; tandem repeats
Research Division(s)
Genetics and Gene Regulation
PubMed ID
42025159
Terms of Use/Rights Notice
Refer to copyright notice on published article.


Creation Date: 2026-04-27 03:52:47
Last Modified: 2026-04-27 03:52:54