Recent CIG publications Archive


Genome Biol.: co-auth.: C.Dessimoz

 2019 Nov 19;20(1):244. doi: 10.1186/s13059-019-1835-8.

The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens.

Zhou N, Jiang Y, Bergquist TR, Lee AJ, Kacsoh BZ, Crocker AW, Lewis KA, Georghiou G, Nguyen HN, Hamid MN, Davis L, Dogan T, Atalay V, Rifaioglu AS, Dalkıran A, Cetin Atalay R, Zhang C, Hurto RL, Freddolino PL, Zhang Y, Bhat P, Supek F, Fernández JM, Gemovic B, Perovic VR, Davidović RS, Sumonja N, Veljkovic N, Asgari E, Mofrad MRK, Profiti G, Savojardo C, Martelli PL, Casadio R, Boecker F, Schoof H, Kahanda I, Thurlby N, McHardy AC, Renaux A, Saidi R, Gough J, Freitas AA, Antczak M, Fabris F, Wass MN, Hou J, Cheng J, Wang Z, Romero AE, Paccanaro A, Yang H, Goldberg T, Zhao C, Holm L, Törönen P, Medlar AJ, Zosa E, Borukhov I, Novikov I, Wilkins A, Lichtarge O, Chi PH, Tseng WC, Linial M, Rose PW, Dessimoz C, Vidulin V, Dzeroski S, Sillitoe I, Das S, Lees JG, Jones DT, Wan C, Cozzetto D, Fa R, Torres M, Warwick Vesztrocy A, Rodriguez JM, Tress ML, Frasca M, Notaro M, Grossi G, Petrini A, Re M, Valentini G, Mesiti M, Roche DB, Reeb J, Ritchie DW, Aridhi S, Alborzi SZ, Devignes MD, Koo DCE, Bonneau R, Gligorijević V, Barot M, Fang H, Toppo S, Lavezzo E, Falda M, Berselli M, Tosatto SCE, Carraro M, Piovesan D, Ur Rehman H, Mao Q, Zhang S, Vucetic S, Black GS, Jo D, Suh E, Dayton JB, Larsen DJ, Omdahl AR, McGuffin LJ, Brackenridge DA, Babbitt PC, Yunes JM, Fontana P, Zhang F, Zhu S, You R, Zhang Z, Dai S, Yao S, Tian W, Cao R, Chandler C, Amezola M, Johnson D, Chang JM, Liao WH, Liu YW, Pascarelli S, Frank Y, Hoehndorf R, Kulmanov M, Boudellioua I, Politano G, Di Carlo S, Benso A, Hakala K, Ginter F, Mehryary F, Kaewphan S, Björne J, Moen H, Tolvanen MEE, Salakoski T, Kihara D, Jain A, Šmuc T, Altenhoff A, Ben-Hur A, Rost B, Brenner SE, Orengo CA, Jeffery CJ, Bosco G, Hogan DA, Martin MJ, O’Donovan C, Mooney SD, Greene CS, Radivojac P, Friedberg I.



The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function.


Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory.


We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.


Biofilm; Community challenge; Critical assessment; Long-term memory; Protein function prediction

PMID: 31744546



Bioinformatics.: co-auth.: I.Xenarios

 2019 Dec 2. pii: btz882. doi: 10.1093/bioinformatics/btz882. [Epub ahead of print]

Incorporating heterogeneous sampling probabilities in continuous phylogeographic inference – application to H5N1 spread in the Mekong region.



The potentially low precision associated with the geographic origin of sampled sequences represents an important limitation for spatially-explicit (i.e. continuous) phylogeographic inference of fast-evolving pathogens such as RNA viruses. A substantial proportion of publicly available sequences are geo-referenced at a broad spatial scale, for example the administrative unit of origin, rather than at more exact locations (e.g. GPS coordinates). Most frequently, such sequences are either discarded prior to continuous phylogeographic inference or, for lack of a better option, arbitrarily assigned to the geographic coordinates of the centroid of their administrative area of origin.


We here implement and describe a new approach that allows heterogeneous prior sampling probabilities to be incorporated over a geographic area. External data, such as outbreak locations, are used to specify these prior sampling probabilities over a collection of sub-polygons. We apply this new method to the analysis of highly pathogenic avian influenza (HPAI) H5N1 clade data in the Mekong region. Our method allows H5N1 sequences that are only associated with large administrative areas of origin to be properly included in continuous phylogeographic analyses and assigned more accurate locations. Finally, we use continuous phylogeographic reconstructions to analyse the dispersal dynamics of different H5N1 clades and investigate the impact of environmental factors on lineage dispersal velocities.
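
The idea of a heterogeneous sampling prior over sub-polygons can be illustrated with a small sketch. This is not the BEAST implementation: it assumes, for simplicity, that each sub-polygon is an axis-aligned bounding box and that the weights are hypothetical outbreak counts. The names `normalize` and `sample_location` are illustrative only.

```python
import random


def normalize(weights):
    """Turn non-negative weights (e.g. outbreak counts) into probabilities."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


def sample_location(boxes, weights, rng):
    """Draw one candidate location for a coarsely geo-referenced sequence.

    boxes   : list of (min_lon, min_lat, max_lon, max_lat) tuples, a
              simplified stand-in for the sub-polygons of an administrative area
    weights : one non-negative prior weight per box
    rng     : a random.Random instance
    """
    probs = normalize(weights)
    # Pick a sub-area with probability proportional to its prior weight.
    idx = rng.choices(range(len(boxes)), weights=probs, k=1)[0]
    min_lon, min_lat, max_lon, max_lat = boxes[idx]
    # Within the chosen sub-area, place the point uniformly at random.
    point = (rng.uniform(min_lon, max_lon), rng.uniform(min_lat, max_lat))
    return idx, point


if __name__ == "__main__":
    # Hypothetical sub-areas and outbreak counts, for illustration only.
    boxes = [(104.0, 10.0, 105.0, 11.0),
             (105.0, 10.0, 106.0, 11.0),
             (106.0, 10.0, 107.0, 11.0)]
    outbreak_counts = [3, 1, 0]
    print(sample_location(boxes, outbreak_counts, random.Random(1)))
```

With these weights, the third sub-area is never selected, whereas assigning the sequence to the centroid of the whole area would ignore the outbreak data entirely.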


Our new method allowing heterogeneous sampling priors for continuous phylogeographic inference is implemented in the open-source multi-platform software package BEAST 1.10.


Supplementary data are available at Bioinformatics online and on

PMID: 31790143

Proc Natl Acad Sci U S A.: auth.: group Franken

 2019 Nov 27. pii: 201910590. doi: 10.1073/pnas.1910590116.

Sleep-wake-driven and circadian contributions to daily rhythms in gene expression and chromatin accessibility in the murine cortex.


The timing and duration of sleep result from the interaction between a homeostatic sleep-wake-driven process and a periodic circadian process, and involve changes in gene regulation and expression. Unraveling the contributions of both processes and their interaction to transcriptional and epigenomic regulatory dynamics requires sampling over time under conditions of unperturbed and perturbed sleep. We profiled mRNA expression and chromatin accessibility in the cerebral cortex of mice over a 3-d period, including a 6-h sleep deprivation (SD) on day 2. We used mathematical modeling to integrate time series of mRNA expression data with sleep-wake history, which established that a large proportion of rhythmic genes are governed by the homeostatic process with varying degrees of interaction with the circadian process, sometimes working in opposition. Remarkably, SD caused long-term effects on gene-expression dynamics, outlasting phenotypic recovery, most strikingly illustrated by a damped oscillation of most core clock genes, including Arntl/Bmal1, suggesting that enforced wakefulness directly impacts the molecular clock machinery. Chromatin accessibility proved highly plastic and dynamically affected by SD. Dynamics in distal regions, rather than promoters, correlated with mRNA expression, implying that changes in expression result from constitutively accessible promoters under the influence of enhancers or repressors. Serum response factor (SRF) was predicted as a transcriptional regulator driving immediate response, suggesting that SRF activity mirrors the build-up and release of sleep pressure. Our results demonstrate that a single, short SD has long-term aftereffects at the genomic regulatory level and highlight the importance of the sleep-wake distribution to diurnal rhythmicity and circadian processes.


circadian; epigenetics; gene expression; long-term effects; sleep

PMID: 31776259



Genome Biol.: auth.: groups C.Dessimoz and Franken

 2019 Nov 20;20(1):246. doi: 10.1186/s13059-019-1828-7.

Structural variant calling: the long and the short of it.


Recent research into structural variants (SVs) has established their importance to medicine and molecular biology, elucidating their role in various diseases, regulation of gene expression, ethnic diversity, and large-scale chromosome evolution, giving rise to the differences within populations and among species. Nevertheless, characterizing SVs and determining the optimal approach for a given experimental design remains a computational and scientific challenge. Multiple approaches have emerged to target various SV classes, zygosities, and size ranges. Here, we review these approaches with respect to their ability to infer SVs across the full spectrum of large, complex variations and present computational methods for each approach.


De novo assembly; Gene fusion; Hybrid; Long-read; Mapping; RNA-Seq; Short-read; Structural variant (SV) detection




Nat Commun.: auth.: group Fankhauser

 2019 Nov 19;10(1):5219. doi: 10.1038/s41467-019-13045-0.

Molecular mechanisms underlying phytochrome-controlled morphogenesis in plants.


Phytochromes are bilin-binding photosensory receptors which control development over a broad range of environmental conditions and throughout the whole plant life cycle. Light-induced conformational changes enable phytochromes to interact with signaling partners, in particular transcription factors or proteins that regulate them, resulting in large-scale transcriptional reprograming. Phytochromes also regulate promoter usage, mRNA splicing and translation through less defined routes. In this review we summarize our current understanding of plant phytochrome signaling, emphasizing recent work performed in Arabidopsis. We compare and contrast phytochrome responses and signaling mechanisms among land plants and highlight open questions in phytochrome research.

PMID: 31745087

Prenat Diagn.: co-auth.: group Reymond

 2019 Nov 17. doi: 10.1002/pd.5589. [Epub ahead of print]



Our goal was to describe and illustrate prenatal cerebral imaging features of the most severe form of a new syndromic entity related to KIAA1109 pathogenic variants based on a retrospective multicentric study of seven cases. All cases demonstrated a similar complex severe cerebral malformative pattern. This pattern included, within the supratentorial space, major cerebral parenchymal thinning with a lissencephalic cortical pattern, voluminous germinal matrices, severe ventriculomegaly, and corpus callosum agenesis. Within the infratentorial space, cerebellar hypoplasia was associated with characteristic brainstem dysgenesis including elongation of the pons, as well as a variable degree of kinking of the brainstem. This cerebral pattern, which was suggestive of the more severe phenotypes related to disrupting variants of tubulin-encoding genes, was associated in all cases with clubfoot and/or arthrogryposis, and in most cases with cardiac and ophthalmologic anomalies. In all cases, exome sequencing led to the identification of KIAA1109 pathogenic variants.

PMID: 31736083