BMC Genomics.: co-auth.: C.Dessimoz

 BMC Genomics. 2020 Nov 18;21(Suppl 10):779. doi: 10.1186/s12864-020-07011-0.

A generalized Robinson-Foulds distance for labeled trees

Samuel Briand 1Christophe Dessimoz 2 3 4 5 6Nadia El-Mabrouk 7Manuel Lafond 8Gabriela Lobinska 9

Abstract

Background: The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc).

Results: We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting “good” edges, i.e. edges shared between the two trees.

Conclusions: We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions.Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf .

Keywords: Edit distance; Labeled trees; Robinson-Foulds; Tree metric.