Working with Intercalibration Datasets

September 20, 2011

tags: Communication, Data portal, datasets, Intercalibration

Wastwater, one of England’s most pristine lakes and typical of those assessed by the Northern Lakes Geographical Intercalibration Group. Image courtesy of Wikimedia.

Continuing in our weekly series of posts on the construction of the BioFresh metadatabase, this week Sian Davies takes us through the complexity, importance and challenges of incorporating Intercalibration datasets into the BioFresh project.

Why are Intercalibration datasets important for BioFresh?

Another source of high-quality freshwater biodiversity data is the set of datasets compiled as part of the Intercalibration process, which is used to compare Water Framework Directive biological classification tools. The datasets hold carefully selected taxonomic and environmental data from many European countries and thus form a valuable contribution to freshwater biodiversity data.

What are Intercalibration data?

These datasets have been compiled as part of the process by which the biological classification tools used to implement the Water Framework Directive (WFD) are compared and adjusted, to ensure that all Member States show the same level of ambition in their ecological assessments. Most Member States have different WFD classification tools, and one way to compare them is to apply them all to a common dataset comprising some data from each Member State. The datasets have been compiled using data contributions from most participating countries. They contain quantitative taxonomic data, usually with supporting environmental data and physical information on the water bodies, and they cover water bodies of specific types (based on WFD typologies). Considerable time and effort have been put into their compilation, data cleaning and taxonomic harmonisation by members of the Geographical Intercalibration Groups (GIGs). Data have been provided variously by state monitoring programmes, university research groups and other institutes, and may or may not be publicly owned or funded.

Organisation of Intercalibration data

Classification tools have been developed for the following Biological Quality Elements (BQEs): phytoplankton, macrophytes, phytobenthos, benthic fauna and fish, variously for lakes and rivers. Because freshwaters vary so much in character across Europe, from Atlantic coastal to northern Scandinavian, Alpine and Mediterranean, it is necessary to make sure the comparisons are like for like. To achieve this, Member States have been assigned to Geographical Intercalibration Groups (GIGs) based on ecoregions. For example, the Northern GIG covers the UK, Norway, Sweden and Finland. Other GIGs include the Central Baltic GIG, the Alpine GIG, the Mediterranean GIG and the Eastern Continental GIG. Each GIG is split by BQE and surface water category (lakes, rivers and also transitional and coastal waters, although these are not relevant to BioFresh). So, for example, within the Northern GIG there are lakes groups working on phytoplankton, macrophytes and benthic fauna. These combinations of GIG, water category and biological quality element mean that, in theory, a large number of datasets could exist.

To complicate the issue further, some BQEs are dealt with at the cross-GIG level, i.e. all GIGs of a similar water category working as one; this applies to fish, phytobenthos, large rivers and to some extent river macrophytes. Also, whilst some GIGs have collected raw biological data against which to compare the classification tools, others have compiled datasets only of metrics (the outputs from the classification tools) because they have used a different method of intercalibration. Furthermore, some GIGs are well advanced in their work and others less so: some have large, complex datasets, others have small, simple datasets, and others have yet to compile their data. Some GIGs have merged with the WISER project, and their data and that of WISER have become more or less one and the same. Many of the GIG datasets are still in use for Intercalibration purposes.

So there are, in theory, a great many datasets. Some contain data from perhaps 3-4 Member States, others from well over 10. Unfortunately there is no central catalogue of these datasets, so working out what any of them contains is not easy.
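To give a feel for the scale, the combinations of GIG, water category and BQE described above can be enumerated in a few lines of Python. This is only an illustrative sketch: it uses just the five GIGs named in this post (the list may not be complete), it ignores the cross-GIG groups, and the function name is made up for the example. It counts theoretical combinations, not datasets that actually exist.

```python
from itertools import product

# GIGs named in this post; the post says "other GIGs include",
# so this list is an assumption and may be incomplete.
GIGS = ["Northern", "Central Baltic", "Alpine", "Mediterranean", "Eastern Continental"]

# Transitional and coastal waters are left out because they are not relevant to BioFresh.
WATER_CATEGORIES = ["lakes", "rivers"]

# Biological Quality Elements listed in this post.
BQES = ["phytoplankton", "macrophytes", "phytobenthos", "benthic fauna", "fish"]


def theoretical_datasets(gigs=GIGS, categories=WATER_CATEGORIES, bqes=BQES):
    """Return every (GIG, water category, BQE) combination as a tuple.

    Hypothetical helper for illustration only: cross-GIG groups,
    metrics-only datasets and GIGs without data are not modelled.
    """
    return list(product(gigs, categories, bqes))


combos = theoretical_datasets()
print(len(combos))  # 5 GIGs x 2 water categories x 5 BQEs = 50

# Theoretical combinations for Northern GIG lakes
northern_lakes = [c for c in combos if c[0] == "Northern" and c[1] == "lakes"]
for gig, category, bqe in northern_lakes:
    print(gig, category, bqe)
```

Even this simplified count gives 50 possible combinations, whereas in practice only some exist: the Northern GIG lakes groups mentioned above, for instance, cover three of the five BQEs.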

Compiling information on Intercalibration datasets

Each GIG is run by a lead person, who may or may not be the person who deals with the data compilation. Requests were sent to each GIG lead asking that a) they complete the metadata on BioFresh for their GIG’s dataset and b) they pass the information to the GIG members as a first step in seeking permission for use of their data. As there is no central documentation of these datasets, this was the only way to find out which datasets existed and what they held (apart from the ones that I helped to compile during my own involvement in intercalibration). Unfortunately the level of cooperation from the GIG leads was disappointingly low, so this first crucial step in the process didn’t produce the information hoped for. Since then I have found other ways of working out what some of the datasets hold and have contacted most Member States’ data contacts at least once requesting access to the data. This in itself is no easy task, as quite often a Member State has a different representative in each GIG, and possibly different data owners contributing to each GIG. Some detective work is required to work out who is who, who has contributed what and who actually has the authority to decide whether the data can be used by BioFresh. So far almost 150 GIG representatives have been contacted.

Seeking permission to use the Intercalibration datasets

Explaining the Intercalibration datasets is fairly complex, and seeking permission from each data owner is more difficult still. However, some progress is being made, and the intellectual property rights (IPR) are more or less understood for at least part, and often most, of each of the bigger datasets, which are generally in the Northern or Central GIGs and include the cross-GIG fish and large rivers datasets. Metadata has been partially completed for the datasets I have been given in trust, but there are still datasets, such as the cross-GIG fish data, which I haven’t been able to obtain and for which I haven’t been able to persuade the data custodians to complete the metadata. This is despite the fact that some contributors to each of these datasets will allow their data to be used by BioFresh.

Given the complex data ownership associated with all the Intercalibration datasets and the difficulties of identifying datasets, dataset contents, data contributors and data owners, it has taken a long time to get to the point where we have most of this information for even some of the datasets. In view of this, progress so far should be considered promising, and already some of the larger, data-rich datasets are well understood with regard to data content and permissions.

In next week’s blog post we focus on the process of requesting data and the issues that arise when dealing with intellectual property rights.