Data Refinement Workflow


The Taxonomic Data Refinement Workflow (DRW) helps you efficiently aggregate, integrate, and clean observational and specimen datasets from many different sources. It works across large geo-temporal, taxonomic, and environmental scales and prepares your data for scientific analyses such as species distribution analysis, species richness and diversity studies, species distribution modeling, historical analysis, taxonomic revisions, and conservation assessments.

Who is it for?

Scientists and biodiversity managers who need to integrate their own collection, observational, and taxonomic name data with data from distributed biodiversity services.

What is it for?

Retrieve, integrate, clean, and refine species occurrence records as well as associated quantitative information (e.g. biomass).

How does it work?

The workflow consists of three sub-workflows for:

  1. data integration,
  2. data cleaning, refinement, and filtering, and
  3. geographic and/or temporal selection.

The sub-workflows can be run and repeated in any order. Because the workflow uses standard data formats for input, output, and internal processing, you can leave it at any point of execution and resume later.
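The following minimal Python sketch shows why a shared data format makes the sub-workflows re-orderable and resumable. It is not the DRW's implementation. The functions `integrate`, `clean`, `select`, `read_csv`, and `write_csv`, their cleaning rules, and the sample rows are all hypothetical. The only idea taken from the text is that every stage reads and writes the same tabular format. Here that format is CSV using a few Darwin Core column names (`scientificName`, `decimalLatitude`, `decimalLongitude`, `eventDate`).

```python
import csv
import io

# Hypothetical sketch, not the DRW's actual code. Column names are real
# Darwin Core terms; the stage functions and rules are illustrative only.
FIELDS = ["scientificName", "decimalLatitude", "decimalLongitude", "eventDate"]


def read_csv(text):
    """Load records from CSV text (stands in for a saved checkpoint file)."""
    return list(csv.DictReader(io.StringIO(text)))


def write_csv(records):
    """Serialize records to CSV so the work can be paused and resumed."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def integrate(*sources):
    """Stage 1 (hypothetical): merge record lists from several sources."""
    return [rec for source in sources for rec in source]


def clean(records):
    """Stage 2 (hypothetical): normalize name spelling, drop records
    without coordinates, and remove exact duplicates."""
    seen, out = set(), []
    for rec in records:
        name = rec["scientificName"].strip().capitalize()
        rec = dict(rec, scientificName=name)
        if not rec["decimalLatitude"] or not rec["decimalLongitude"]:
            continue  # no coordinates: unusable for mapping
        key = tuple(rec[f] for f in FIELDS)
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out


def select(records, first_year, last_year):
    """Stage 3 (hypothetical): keep records whose eventDate year is in range."""
    return [r for r in records
            if first_year <= int(r["eventDate"][:4]) <= last_year]


# Invented toy rows, not data from any real survey.
historical = read_csv(
    "scientificName,decimalLatitude,decimalLongitude,eventDate\n"
    "Idotea balthica,58.25,11.43,1925-07-14\n"
    " idotea balthica,58.25,11.43,1925-07-14\n"
)
recent = read_csv(
    "scientificName,decimalLatitude,decimalLongitude,eventDate\n"
    "Gammarus locusta,,,2005-06-02\n"
    "Gammarus locusta,58.87,11.12,2005-06-02\n"
)

merged = integrate(historical, recent)
checkpoint = write_csv(clean(merged))  # leave the workflow here...
resumed = read_csv(checkpoint)         # ...and pick it up again later
print(len(resumed), len(select(resumed, 2003, 2009)))  # prints: 2 1
```

Every stage takes and returns the same list of Darwin Core-style records. As a result, the stages compose in either order: in this toy case, `clean(select(merged, 2003, 2009))` gives the same result as `select(clean(merged), 2003, 2009)`. The CSV text written between stages also serves as a checkpoint.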

Expected results

Enriched, harmonized, and/or filtered datasets in CSV and Darwin Core formats.

Links to workflow and user documentation

Workflow on myExperiment

In addition to a regular training manual, we also offer joint GBIF/BioVeL user documentation designed specifically for working with GBIF data.

This workflow can be combined with the Ecological Niche Modelling (ENM) workflows.


Leidenberger et al. (2013) Mapping present and future potential distribution patterns for a meso-grazer guild in the Baltic Sea (

Leidenberger et al. Evaluating the potential of ecological niche modelling as a component in invasive species risk assessments. Ecological Applications. Submitted.

Example of use: a benchmark study

A benchmark comparison was conducted on two datasets:

  1. a historical dataset of 7,400 species observations from the Swedish west coast, collected between 1921 and 1938, and
  2. a recent dataset of 4,100 species observations from the Swedish west coast, collected between 2003 and 2009.

With a “traditional” Excel-based data refinement technique, it took 38.5 hours to clean these datasets. With the BioVeL DRW, it took 4.25 hours to obtain the same results.
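Expressed as a ratio, the two reported times correspond to roughly a ninefold speed-up, or about 89% less time spent on cleaning:

$$
\frac{38.5\ \text{h}}{4.25\ \text{h}} \approx 9.1,
\qquad
\frac{38.5\ \text{h} - 4.25\ \text{h}}{38.5\ \text{h}} = \frac{34.25}{38.5} \approx 0.89
$$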




19 February 2015

At the final review of the project by the EC, one of the reviewers said: “Incredible work done with a community that is not unified. Remarkable work. It opens for new development in a near future. Hope for success. Good project. Happy that you have been financed three plus years ago.”

Read about the project and its results in the Project Final Report, or read only the Executive Summary.