Abstract
Short tandem repeat (STR) mutations may account for more than half of the mutations in eukaryotic coding DNA, yet STR variation is rarely examined as a contributor to complex traits. We assess the scope of this contribution across a collection of 96 strains of Arabidopsis thaliana by massively parallel STR genotyping. Of the examined STRs, 95% are polymorphic, and the median STR has six alleles. Most strains carry modest STR expansions, some of which have evident functional effects; for instance, three of six intronic STR expansions are associated with intron retention. We infer selective constraint on STRs and find the strongest signatures of purifying selection on coding STRs. Lastly, we detect dozens of novel STR-phenotype associations that could not be detected with SNPs, and we validate some of them experimentally. Our results demonstrate that STRs comprise a large unascertained reservoir of functionally relevant genomic variation.