Genome Biol Evol.: auth.: group Dessimoz

Genome Biol Evol. 2021 Apr 19;evab077. doi: 10.1093/gbe/evab077. Online ahead of print.

Homoeolog inference methods requiring bidirectional best hits or synteny miss many pairs

Natasha Glover, Shaoline Sheppard, Christophe Dessimoz

Abstract

Homoeologs are pairs of genes or chromosomes in the same species that originated by speciation and were brought back together in the same genome by allopolyploidization. Bioinformatic methods for accurate homoeology inference are crucial for studying the evolutionary consequences of polyploidization, and homoeology is typically inferred on the basis of bidirectional best hit (BBH) and/or positional conservation (synteny). However, these methods neglect the fact that genes can duplicate and move, both prior to and after the allopolyploidization event. These duplications and movements can result in many-to-many and/or nonsyntenic homoeologs, which thus remain undetected and unstudied. Here, using the allotetraploid upland cotton (Gossypium hirsutum) as a case study, we show that conventional approaches indeed miss a substantial proportion of homoeologs. Additionally, we found that many of the missed pairs of homoeologs are broadly and highly expressed. A Gene Ontology (GO) analysis revealed that a high proportion of the nonsyntenic and non-BBH homoeologs are involved in protein translation and are likely to contribute to the functional repertoire of cotton. Thus, from an evolutionary and functional genomics standpoint, choosing a homoeolog inference method that does not rely solely on 1:1 relationship cardinality or synteny is crucial for not missing these potentially important homoeolog pairs.
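
To illustrate why a strict BBH criterion can miss many-to-many homoeologs, here is a minimal Python sketch. It is not the authors' pipeline. The gene names, the similarity scores, and the two helper functions are all hypothetical, and every pair in the toy table is assumed to be a true homoeolog. After a duplication in one subgenome, only one of the two copies can be the reciprocal best hit, so the other pair is dropped.

```python
# Toy similarity scores between genes of two subgenomes (A and D).
# All gene names and scores are invented for illustration only.
# Assumption: every pair listed here is a true homoeolog pair.
scores = {
    ("A1", "D1"): 950,
    ("A2", "D2a"): 900,
    ("A2", "D2b"): 880,  # D2b: hypothetical duplicate of D2a
}


def best_hits(scores, from_index):
    """Map each gene on one side to its highest-scoring partner."""
    best = {}
    for pair, score in scores.items():
        query = pair[from_index]
        target = pair[1 - from_index]
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(scores):
    """Keep only pairs where each gene is the other's best hit."""
    a_to_d = best_hits(scores, 0)
    d_to_a = best_hits(scores, 1)
    return {(a, d) for a, d in a_to_d.items() if d_to_a.get(d) == a}


bbh_pairs = bidirectional_best_hits(scores)
missed_pairs = set(scores) - bbh_pairs

print("BBH pairs:", sorted(bbh_pairs))
print("Missed by BBH:", sorted(missed_pairs))
```

In this toy case the 1:2 relationship between A2 and its two D copies is reduced to a single pair, so (A2, D2b) goes unreported even though it is assumed to be a genuine homoeolog pair.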

Keywords: Gossypium hirsutum; best bidirectional hit; comparative genomics; cotton; homoeolog; synteny.