Requesting data and dealing with complex intellectual property rights issues

September 29, 2011

tags: Data portal, intellectual property rights

Lilla Fargen, Sweden. Image: Wikipedia

Continuing our special feature on how BioFresh scientists are collecting and collating a global metadatabase of freshwater biodiversity data, we turn to the often tricky issues of data sharing and intellectual property rights. Two papers on data sharing that may be of interest are Tenopir et al. (2011) “Data Sharing by Scientists: Practice and Perceptions” in PLoS One (free, open access) and, if you have access to academic journals, Roberts (2009) “Biodiversity Databases Spread, Prompting Unification Call” in Science.

Authors: Aaike De Wever, Sian Davies, Astrid Schmidt-Kloiber

Evaluating the entries in the BioFresh metadatabase allows us to select the most relevant databases to request, both for scientific work within BioFresh and for integration into the data portal. One of the main issues to consider is intellectual property rights (IPR).

In the metadatabase, data holders are presented with the following IPR options:

Database available for: BioFresh scientific purposes/BioFresh data portal (public)

Data can be used without restrictions, but must be publicly acknowledged and cited correctly.
Data provider must be informed of publication 45 days in advance and can object to the use of the dataset within 30 days. Data must be publicly acknowledged and cited correctly.
Data provider must be offered co-authorship for publications using this dataset. Data must be publicly acknowledged and cited correctly.
Data cannot be used in publication.
Other/additional criteria (please specify).

Unsurprisingly, the entries in the metadatabase covered the whole range of conditions, from no restrictions (mostly for data that are already available online) to co-authorship for scientists’ personal data. However, when I first did this exercise, I was quite surprised to see that many of the larger datasets that seemed promising for integration in the BioFresh data portal had selected “other” and included statements such as:

– The data were compiled from different institutes. All requests for access must be directed to …

– Other partners/institutions must ask the data providers of the database if the data can be used.

– The data are available for use at the discretion of the data provider.

– Data and permission of usage can be obtained from the author as long as the user accepts the external copyrights enforced by previous sources (authors/institutions), which protects some of the occurrence data included in the database.

It thus seemed that most of these datasets had been compiled from the contributions of several data holders. Because such datasets are often built within a collaborative project, little attention is given to their potential use after the project ends, and at that stage the institute or person who compiled the dataset is understandably reluctant to go back and ask every contributor for permission all over again…

Buttermere, one of England's most pristine lakes. Photo courtesy of Jo-Anne Pitt

Gaining permission for intercalibration datasets

At a later stage, we added an option to the metadatabase to specify IPR details for multiple data holders independently. Even so, requesting permission to use data from these datasets can become fairly complex. This is especially the case for the intercalibration datasets for the Water Framework Directive (WFD), where several EU Member States contribute data to the same dataset. Each may have different views on data availability or an existing national data availability policy, and some of the data have been made available by academics who are also involved in the intercalibration exercise. As this is a major job, we were lucky to get help from Sian Davies, who had been involved in compiling these datasets while working at the Environment Agency for England and Wales. Apart from the difficulty of working out the different sources of the data (especially without access to the compiled dataset itself, and for countries divided into federal states or provinces), she encountered a number of obstacles.

Overall, there was quite a high percentage of non-respondents, especially for datasets with complex IPR issues. We can only guess at the reasons: the email went to a spam folder, got lost in a mountain of other emails, or the contact person had moved on… It is quite likely, though, that people often don’t have time to deal with the request immediately and later forget about it. Others may not see the point of contacting the original data owners, which is a big job for them, and eventually forget to reply. Still others may in fact ask the data owners for permission but never get a response, and then lack the time or inclination to chase the request.

Lake Starnberg, Germany. Image: Wikipedia

Dealing with responses

Once responses came in, it became clear that in some cases the respondents had not fully understood the data request. It is difficult to convey a large amount of explanatory information with a request while keeping it concise and written in English that most people can understand. Moreover, it cannot be taken for granted that everyone involved reads or speaks English fluently; this in itself may have caused some difficulties and might explain some of the non-responses.

Another major difficulty is that no overview of the existing compiled datasets is available. Combined with the fact that we generally have no access to the datasets, this means that we have to work out step by step what data exist (and sometimes discover that there is even more than anticipated, e.g. raw data rather than only metrics). Our inability to look at the databases (even confidentially) is unfortunate: for the few where we were able to do so, it clearly improved our understanding of the database and its organisation, and above all helped us formulate the data request.

Lac d'Allos, France. Image: Wikipedia

Reluctance to share data

Most of the responses received have been positive, although the exact terms of data availability may still be amended in some cases. In a few cases the response has been negative, but at least the data owner or representative took the time to make it clear that the data are not available. The reasons vary: original data owners not responding to requests (e.g. public bodies whose bureaucratic processes make it hard to grant permission for data access), complex ownership within a single Member State’s dataset, a desire to publish on the dataset before making it public, political reasons, or a requirement to be paid for the data. It is also important to recognise that compiling these datasets has been a long and arduous task for a few individuals, and that these people have not always been sufficiently remunerated or recognised for their work. This makes the data all the more valuable to them, and makes them reluctant to pass it on.

Other data owners have expressed some reluctance to make the data available, without refusing outright, on the grounds that the data were collected and compiled for a specific purpose and may therefore not be well suited to other uses. This, at least, is their perception. Others have suggested that in the “wrong hands” the data could be used against Member States, especially where national WFD classifications will drive significant investment in improvements that some parties may not welcome.

Experiences and conclusions

The IPR specifications and the experiences with requesting data described here mostly apply to entire datasets, or at least to entire database entries covering both biological and environmental data. For the data portal, we are primarily interested in basic biodiversity data (taxon occurrences). Although we do not yet have much experience in requesting only this type of data, the option is generally well received and often preferred to making the complete dataset available. This is especially the case when the issue is discussed face-to-face.

In conclusion, we clearly need to keep investing in communication on this issue and in convincing data holders to share their data. We would also strongly recommend that any project involving data compilation think about the future use of its database, include detailed metadata on data holders and rights, and consider making its data (or at least the metadata) publicly available.