Phylogenetics Workflows


An example of covariation between the phylogenetic relationships of sequences and the distribution of their abundances across locations. Branch colors show each branch's contribution to beta diversity in the phylogeny, while stacked histograms show abundances across locations (image powered by iTOL).

© Image courtesy of Saverio Vicario

Phylogeny can be used as a basic tool to summarize biodiversity, categorize groups of organisms and study the impact of environmental change on biodiversity.

Who is it for?

Scientists interested in phylogenetic inference and phylogenetic diversity analysis. Students seeking a low-threshold solution for phylogenetic inference tasks.

What is it for?

The BioVeL phylogenetic service set supports phylogenetic inference for systematics research.

Several services offer Bayesian methods to guide the selection of the evolutionary model and to perform post hoc validation of the inference. The set also includes phylogenetic partitioning of diversity across samples, allowing the study of mutual information between phylogeny and environmental variables.

Other services provide an integrated solution for the phylogenetic inference of large time-calibrated trees (SUPERSMART services). To this end, public data resources are mined for suitable molecular sequence data, which are then processed to form a starting point for phylogenetic inference using Bayesian and Maximum-Likelihood methods. Inferred trees can then be time-calibrated using fossil data.

Furthermore, services are available for enriching data resources and converting them into standard formats for phylogenetic analysis (nexml service set). These services can be integrated into phylogenetic inference workflows.

How does it work?

Phylogenetic inference is performed with the MrBayes software. MCMC convergence of the tree parameter is then checked (powered by GeoKS), and the fit of the model is evaluated post hoc with a posterior predictive test.
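The idea behind the posterior predictive test can be illustrated with a minimal Python sketch. It assumes that replicate alignments have already been simulated from posterior samples of trees and model parameters (that step is not shown) and uses the multinomial site-pattern statistic as an example test statistic; the statistic and tail convention used by the BioVeL workflow may differ. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def site_pattern_counts(alignment: Sequence[str]) -> Counter:
    """Count identical alignment columns (site patterns); rows are taxa."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    columns = zip(*alignment)  # one tuple of characters per site
    return Counter(columns)


def multinomial_statistic(alignment: Sequence[str]) -> float:
    """Multinomial log-likelihood of the site-pattern frequencies:
    T = sum over patterns p of n_p * ln(n_p / N), N = number of sites."""
    counts = site_pattern_counts(alignment)
    n_sites = sum(counts.values())
    return sum(n * math.log(n / n_sites) for n in counts.values())


def posterior_predictive_p(observed: float, simulated: Sequence[float]) -> float:
    """Upper-tail posterior predictive p-value: the fraction of replicate
    statistics that are at least as large as the observed statistic."""
    if not simulated:
        raise ValueError("need at least one simulated replicate")
    return sum(s >= observed for s in simulated) / len(simulated)
```

In use, multinomial_statistic would be computed for the observed alignment and for each simulated replicate, and both passed to posterior_predictive_p; values close to 0 or 1 suggest that the observed data look unusual under the fitted model.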

Three variants of the Bayesian phylogenetic workflow exist, differing in how the substitution model is defined:

  • Automated definition of a partitioned model using PartitionFinder
  • A graphical user interface that guides the user in choosing a partitioned model
  • A user-specified Nexus file that includes full model descriptions
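In the third variant the model description lives in a MrBayes block of the Nexus file. The hypothetical Python helper below writes such a block for a partitioned model from character ranges and per-subset lset options. The commands it emits (charset, partition, set partition, lset applyto, unlink, prset, mcmc) are standard MrBayes commands, but the partition names, site ranges and run settings are illustrative assumptions and should be checked against the MrBayes manual before a real analysis.

```python
from typing import Dict, Tuple


def mrbayes_block(
    charsets: Dict[str, Tuple[int, int]],
    models: Dict[str, str],
    ngen: int = 1_000_000,
    samplefreq: int = 1_000,
) -> str:
    """Build a MrBayes command block for a partitioned substitution model.

    charsets: subset name -> (first site, last site), 1-based and inclusive.
    models:   subset name -> lset options, e.g. "nst=6 rates=invgamma".
    """
    names = list(charsets)
    lines = ["begin mrbayes;"]
    for name, (start, end) in charsets.items():
        lines.append(f"  charset {name} = {start}-{end};")
    # Subsets are numbered 1, 2, ... in the order they are listed here
    lines.append(f"  partition scheme = {len(names)}: {', '.join(names)};")
    lines.append("  set partition = scheme;")
    for index, name in enumerate(names, start=1):
        lines.append(f"  lset applyto=({index}) {models[name]};")
    # Give each subset its own parameters and its own relative rate
    lines.append("  unlink statefreq=(all) revmat=(all) shape=(all) pinvar=(all);")
    lines.append("  prset applyto=(all) ratepr=variable;")
    lines.append(f"  mcmc ngen={ngen} samplefreq={samplefreq} nruns=2 nchains=4;")
    lines.append("end;")
    return "\n".join(lines)
```

For example, mrbayes_block({"COI": (1, 658), "cytb": (659, 1800)}, {"COI": "nst=6 rates=invgamma", "cytb": "nst=2 rates=gamma"}) gives subset 1 (COI) a GTR+I+Γ model and subset 2 (cytb) an HKY-type model with gamma-distributed rates.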

The phylogenetic partitioning of diversity across samples uses the phylogenetic entropy proposed by Chao, Chiu and Jost (2010), with beta diversity equated to the mutual information between species and environment vectors. The method is implemented as a Python script.
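The partitioning can be sketched in Python under explicit assumptions, as an illustration rather than the BioVeL script itself. Phylogenetic entropy is taken as −Σ L_i a_i ln a_i, where L_i is the length of branch i and a_i the relative abundance of reads descending from it (Chao, Chiu and Jost (2010) work with branch lengths scaled by the tree depth T); samples are weighted by their share of total reads; and beta entropy is gamma minus alpha, which with these weights is the mutual information between lineage and sample. The tree is passed as a hypothetical list of (branch length, descendant tips) pairs.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical tree encoding: (branch length, tip labels below the branch)
Branch = Tuple[float, Set[str]]


def _plogp(p: float) -> float:
    """p * ln(p), using the convention 0 * ln(0) = 0."""
    return p * math.log(p) if p > 0 else 0.0


def phylo_entropy_partition(
    branches: Sequence[Branch],
    samples: Dict[str, Dict[str, float]],
) -> Dict[str, object]:
    """Gamma, alpha and beta phylogenetic entropy plus per-branch beta shares.

    samples: sample name -> {tip label: abundance}.
    """
    totals = {s: sum(counts.values()) for s, counts in samples.items()}
    if any(t <= 0 for t in totals.values()):
        raise ValueError("every sample needs a positive total abundance")
    grand_total = sum(totals.values())
    weights = {s: totals[s] / grand_total for s in samples}  # w_j = N_j / N

    gamma = alpha = 0.0
    branch_beta: List[float] = []
    for length, tips in branches:
        # a_ij: share of sample j's abundance that descends from this branch
        a_ij = {s: sum(samples[s].get(t, 0.0) for t in tips) / totals[s] for s in samples}
        # pooled share a_i is the weighted mean of the a_ij
        a_i = sum(weights[s] * a_ij[s] for s in samples)
        g = -length * _plogp(a_i)
        a = -length * sum(weights[s] * _plogp(a_ij[s]) for s in samples)
        gamma += g
        alpha += a
        branch_beta.append(g - a)  # never negative because p*ln(p) is convex
    return {"gamma": gamma, "alpha": alpha, "beta": gamma - alpha, "branch_beta": branch_beta}
```

With only tip branches of unit length, the result reduces to the ordinary Shannon mutual information between species and samples; branch_beta holds per-branch contributions analogous to those shown graphically in the HTML report.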

The SUPERSMART services infer phylogenies from a given list of taxa of interest. A set of DNA sequences for phylogenetic inference is assembled by querying the GenBank database. To tackle the computational challenges of inferring trees with hundreds or thousands of taxa, phylogenetic inference is carried out in a multi-step procedure using the Maximum-Likelihood estimation implemented in ExaML and the multi-species, multi-locus coalescent approach implemented in *BEAST. Computations are performed in parallel on the Naturalis BioVeL server.
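The GenBank step can be pictured as building NCBI E-utilities ESearch queries for each taxon of interest. The Python sketch below only constructs the request URL and downloads nothing. The esearch.fcgi endpoint and the [Organism] and [Gene] field tags are standard Entrez features, but the helper names, the optional marker and the retmax value are assumptions; SUPERSMART's actual query and filtering logic is not described here.

```python
from typing import Optional
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (ESearch)
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def genbank_query(taxon: str, marker: Optional[str] = None) -> str:
    """Entrez search term for nucleotide records of one taxon (and marker)."""
    term = f'"{taxon}"[Organism]'
    if marker:
        term += f' AND "{marker}"[Gene]'
    return term


def esearch_url(taxon: str, marker: Optional[str] = None, retmax: int = 100) -> str:
    """URL of an ESearch request against the nucleotide database."""
    params = {"db": "nucleotide", "term": genbank_query(taxon, marker), "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

Fetching such a URL (for example with urllib.request.urlopen) returns an XML list of matching record IDs, which could then be retrieved with the EFetch utility.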

The nexml services allow users 1) to enrich datasets encoded in standard phylogenetic data formats (e.g. Newick, NEXUS) and combine them into an integrated NeXML representation, and 2) to extract subsets of the data from NeXML documents.
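The conversion step can be illustrated with a stdlib-only Python sketch that parses a simple Newick string and writes it as a minimal NeXML-like document (an OTU block plus one FloatTree with nodes and edges). The element layout is modelled on NeXML 0.9 but has not been validated against the NeXML schema; the parser ignores quoted labels and comments, the function names are hypothetical, and the enrichment performed by the BioVeL services is not covered.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

NEX = "http://www.nexml.org/2009"
XSI = "http://www.w3.org/2001/XMLSchema-instance"


class Node:
    """A tree node with an optional label and branch length to its parent."""

    def __init__(self) -> None:
        self.label = ""
        self.length: Optional[float] = None
        self.children: List["Node"] = []


def parse_newick(text: str) -> Node:
    """Parse simple Newick: unquoted labels, optional lengths, no comments."""
    text = text.strip().rstrip(";")
    pos = 0

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos].strip()

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "(" and read comma-separated children
            node.children.append(parse_node())
            while pos < len(text) and text[pos] == ",":
                pos += 1
                node.children.append(parse_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        node.label = read_until(",():")
        if pos < len(text) and text[pos] == ":":
            pos += 1
            node.length = float(read_until(",()"))
        return node

    root = parse_node()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return root


def to_nexml(root: Node) -> str:
    """Serialize a parsed tree as a minimal NeXML-like XML string."""
    doc = ET.Element("nex:nexml", {"version": "0.9", "xmlns:nex": NEX, "xmlns": NEX, "xmlns:xsi": XSI})
    otus = ET.SubElement(doc, "otus", {"id": "otus1"})
    trees = ET.SubElement(doc, "trees", {"id": "trees1", "otus": "otus1"})
    tree = ET.SubElement(trees, "tree", {"id": "tree1", "xsi:type": "nex:FloatTree"})
    nodes: List[Dict[str, str]] = []
    edges: List[Dict[str, str]] = []

    def visit(node: Node, parent_id: Optional[str]) -> None:
        node_id = f"n{len(nodes) + 1}"
        attrs = {"id": node_id}
        if parent_id is None:
            attrs["root"] = "true"
        if node.label:
            attrs["label"] = node.label
        if not node.children:  # tips become OTUs
            otu_id = f"otu{len(otus) + 1}"
            ET.SubElement(otus, "otu", {"id": otu_id, "label": node.label})
            attrs["otu"] = otu_id
        nodes.append(attrs)
        if parent_id is not None:
            edge = {"id": f"e{len(edges) + 1}", "source": parent_id, "target": node_id}
            if node.length is not None:
                edge["length"] = repr(node.length)
            edges.append(edge)
        for child in node.children:
            visit(child, node_id)

    visit(root, None)
    # All <node> elements are written before the <edge> elements
    for attrs in nodes:
        ET.SubElement(tree, "node", attrs)
    for attrs in edges:
        ET.SubElement(tree, "edge", attrs)
    return ET.tostring(doc, encoding="unicode")
```

For example, to_nexml(parse_newick("((A:1,B:2):0.5,C:3);")) yields three OTUs, five nodes and four edges.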

Expected results

Phylogenetic inference:

  • Phylogenetic tree diagrams and species-level chronograms
  • Posterior probability distribution of the evolutionary model
  • Posterior predictive probability assessing the fit of the model
  • Sets of aligned DNA marker sequences for given taxa of interest

Phylogenetic diversity partitioning:

  • HTML report including a table of gamma, alpha and beta phylogenetic diversity and entropy, and a graphical overview of each branch's contribution to the overall phylogenetic beta entropy
  • XML representation of the table
  • PhyloXML or NeXML representation of the tree and the branch beta contribution.

Data enrichment and extraction services:

  • Documents encoded in standard (phylogenetic) file formats: NeXML, NEXUS, FASTA, PHYLIP, Newick, Stockholm

Links to workflow and user documentation

Bayesian Phylogenetic inference:

Submit workflows:

Evaluate and retrieve workflow:


Alignment with MSAPAD:

NeXML parser and coder:


NeXML services are also described here:
Vos et al. Enriched biodiversity data as a resource and service. Biodiversity Data Journal 2: e1125 (16 Jun 2014). DOI: 10.3897/BDJ.2.e1125

Example of use

In collaboration with ZooplantLab, Università di Milano "Bicocca", BioVeL partners studied the relationship between the gut microbiomes of 22 host-parasite pairs (Apis mellifera, the honey bee, and Varroa destructor, a parasitic mite) across seven beehives in Northern Italy. Strong similarities were found within each pair and within beehives, with few bacterial lineages remaining unique to either bee guts or mite guts.


Sandionigi A, Vicario S, Prosdocimi E, Galimberti A, Ferri E, Bruno A, Balech B, Mezzasalma V, Casiraghi M. "Toward a better understanding of Apis mellifera and Varroa destructor microbiomes: introducing PhyloH as a novel phylogenetic diversity analysis tool". Molecular Ecology Resources. (In press)



Schematic of one of the variants of the phylogenetic inference workflow




19 February 2015

At the final review of the project by the EC, one of the reviewers said: “Incredible work done with a community that is not unified. Remarkable work. It opens for new development in a near future. Hope for success. Good project. Happy that you have been financed three plus years ago.”

Read all about the project and its results in the Project Final Report or read the Executive Summary only.