Brief Bioinform; co-auth I Xenarios

Conceptual framework and pilot study to benchmark phylogenomic databases based on reference gene trees.

Boeckmann B, Robinson-Rechavi M, Xenarios I, Dessimoz C.

Phylogenomic databases provide orthology predictions for species with fully
sequenced genomes. Although the goal seems well-defined, the content of these
databases differs greatly. Seven ortholog databases (Ensembl Compara, eggNOG,
HOGENOM, InParanoid, OMA, OrthoDB, Panther) were compared on the basis of
reference trees. For three well-conserved protein families, we observed a
generally high specificity of orthology assignments for these databases. We show
that differences in the completeness of predicted gene relationships and in the
phylogenetic information are, for the most part, due not to the methods
used but to differences in the underlying database concepts. According to our
metrics, none of the databases provides a fully correct and comprehensive protein
classification. Our results provide a framework for meaningful and systematic
comparisons of phylogenomic databases. In the future, a sustainable set of 'gold
standard' phylogenetic trees could give phylogenomic databases a robust way to
assess their current quality, measure changes following new database releases
and diagnose improvements following an upgrade of the analysis procedure.

PMID: 21737420

