Bioinformatics; co-auth.: I.Xenarios

Bioinformatics. 2013 Mar 28. [Epub ahead of print]

Density-based hierarchical clustering of pyro-sequences on a large scale – the case of fungal ITS1.


Vital-IT Group, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.



Analysis of millions of pyro-sequences is currently playing a crucial role in the advance of environmental microbiology. Taxonomy-independent, i.e. unsupervised, clustering of these sequences is essential for the definition of Operational Taxonomic Units. For this application, reproducibility and robustness should be the most sought-after qualities, but they have so far largely been overlooked.


Over one million hyper-variable ITS1 sequences of fungal origin were analyzed. The ITS1 sequences were first properly extracted from 454 reads using generalized profiles. Then otupipe, cd-hit-454, ESPRIT-Tree and DBC454, a new algorithm presented here, were used to analyze the sequences. A numerical assay was developed to measure the reproducibility and robustness of these algorithms. DBC454 was the most robust, closely followed by ESPRIT-Tree. DBC454 features density-based hierarchical clustering, which complements the other methods by providing insights into the structure of the data.

Availability and Implementation: An executable is freely available for non-commercial users at It is designed to run under MPI on a cluster of 64-bit Linux machines running Red Hat 4.x, or on a multi-core OS X system.
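The abstract does not describe how the numerical assay for reproducibility works. As a rough illustration only, one generic way to compare two clustering runs of the same sequences is the Rand index, the fraction of sequence pairs that both runs treat the same way (together in both, or apart in both). The function name `rand_index` and the toy label lists below are assumptions made for this sketch; they are not taken from DBC454 or from the paper's assay.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Return the fraction of item pairs on which two clusterings agree.

    labels_a and labels_b give a cluster label for each item, in the same
    item order. Label values themselves need not match between runs.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
        total += 1
    # With fewer than two items there are no pairs to disagree on.
    return agree / total if total else 1.0


# Hypothetical cluster assignments of five sequences from two runs,
# e.g. the same tool applied to the same input in a different order.
run_1 = [0, 0, 1, 1, 2]
run_2 = [5, 5, 7, 7, 7]
score = rand_index(run_1, run_2)
print(score)
```

A score of 1.0 means the two runs produced the same partition; lower values indicate that the result depends on something other than the data, such as input order. This pairwise loop is quadratic in the number of sequences, so it only suits small samples, not the million-sequence scale discussed here.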






[PubMed – as supplied by publisher]