Genome Biol Evol.; co-author: C. Dessimoz

Abstract

Taxonomically restricted genes (TRGs) are genes that are present in only one clade. Protein-coding TRGs may evolve de novo from previously non-coding sequences: functional ncRNA, introns or alternative reading frames of older protein-coding genes, or intergenic sequences. A major challenge in studying de novo genes is the need to avoid both false positives (non-functional open reading frames and/or functional genes that did not arise de novo) and false negatives. Here we search conservatively for high-confidence TRGs as the most promising candidates for experimental studies, ensuring functionality through conservation across at least two species, and ensuring de novo status through examination of homologous non-coding sequences. Our pipeline also avoids ascertainment biases associated with preconceptions of how de novo genes are born. We identify one TRG family that evolved de novo in the Drosophila melanogaster subgroup. This TRG family contains single-copy genes in D. simulans and D. sechellia. It originated in an intron of a well-established gene, sharing that intron with another well-established gene upstream. These TRGs contain an intron that pre-dates their ORF. These genes have not previously been reported as having originated de novo, and to our knowledge they are the best Drosophila candidates identified so far for experimental studies aimed at elucidating the properties of de novo genes.