Viruses; co-author: I. Xenarios

Viruses. 2020 Nov 1;12(11):E1248. doi: 10.3390/v12111248.

Virosaurus A Reference to Explore and Capture Virus Genetic Diversity

Anne Gleizes 1, Florian Laubscher 2 3, Nicolas Guex 4, Christian Iseli 4, Thomas Junier 1 5, Samuel Cordey 2 3, Jacques Fellay 6 7 8, Ioannis Xenarios 9, Laurent Kaiser 2 3 10, Philippe Le Mercier 11

Abstract

The huge genetic diversity of circulating viruses is a challenge for diagnostic assays targeting emerging or rare viral diseases. High-throughput sequencing technology offers a new opportunity to explore the global virome of patients without preconceptions about the causative pathogens. To be accurate, this approach requires a solid reference dataset. Virosaurus has been designed to offer an unbiased, automated and annotated database for clinical metagenomics studies and diagnosis. Raw viral sequences were extracted from GenBank and cleaned to remove potentially erroneous sequences. Complete sequences were identified for all genera infecting vertebrates, plants and other eukaryotes (insects, fungi, etc.). To facilitate the analysis of clinically relevant viruses, we annotated all sequences with official and common virus names, acronyms, genotypes, and genomic features (linear, circular, DNA, RNA, etc.). Sequences were clustered at 90% or 98% identity to remove redundancy. Analysis of the clustering results reveals the current state of knowledge of the virus genetic landscape. Because herpesviruses and poxviruses are under-represented among complete genomes relative to their potential diversity in nature, we used genes instead of complete genomes for these viruses in Virosaurus.
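
The redundancy-removal step can be illustrated with a small sketch. The abstract does not say which clustering tool or identity measure was used, so everything below is an assumption made for illustration: the function names `identity` and `greedy_cluster` are hypothetical, the greedy longest-first strategy is one common approach rather than the authors' confirmed method, and `difflib.SequenceMatcher.ratio()` is only a rough stand-in for alignment-based percent identity.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Rough similarity in [0, 1].

    ASSUMPTION: a stand-in for alignment-based percent identity;
    a real pipeline would use a dedicated sequence-clustering tool.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Greedy incremental clustering (hypothetical helper).

    Sequences are visited longest first. Each one joins the first cluster
    whose representative it matches at >= threshold; otherwise it becomes
    the representative of a new cluster.
    """
    clusters: dict[str, list[str]] = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


# Toy sequences, invented for this example only.
toy = {
    "seqA": "ACGTACGTACGTACGTACGT",
    "seqB": "ACGTACGTACGTACGTACGA",  # differs from seqA at the last base
    "seqC": "TTTTGGGGCCCCAAAATTTT",  # unrelated to seqA and seqB
}

clusters_90 = greedy_cluster(toy, 0.90)
clusters_98 = greedy_cluster(toy, 0.98)
print(clusters_90)
print(clusters_98)
```

With these toy inputs, the looser 90% threshold merges seqA and seqB into one cluster, while the stricter 98% threshold keeps them apart. This is the trade-off between the two thresholds: fewer representatives at 90%, finer resolution at 98%.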