
Science made easier : Darwin Core explained

July 3, 2012

A key barrier to data publishing – making data available for others to use – is the simple reality that most people have devised their own terms and labels to order their data sets. Wouldn’t life be easy for those creating, managing and using data portals if we all used the same set of terms to describe our data? This is the purpose of the Darwin Core Standard, a seriously useful and authoritative output of the Taxonomic Databases Working Group (TDWG) of the International Union of Biological Sciences.

The Darwin Core is set to become the ‘industry standard’ for the field of biodiversity informatics. It comprises a list of terms and technical descriptions relating to attributes of species and distributional data. The latest version is comprehensive and arises from an iterative process started in 2009 and guided by the principle of “keeping the standard as simple and open as possible and to develop terms only when there is shared demand”. Adopting the standard not only means that data can be uploaded into important biodiversity data portals such as GBIF and BioFresh, but also provides an invaluable prompt when designing new databases. John Wieczorek and colleagues have published an excellent overview of both the standard and its applications in PLoS ONE, and full details of the technical aspects are available from the TDWG web-site.
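To make the idea of a shared vocabulary concrete, here is a minimal Python sketch that renames the columns of a home-grown record to Darwin Core terms. The term names on the right-hand side of the mapping (scientificName, decimalLatitude, decimalLongitude, eventDate, recordedBy) belong to the standard; the local column names, the sample record and the helper `to_darwin_core` are hypothetical and used only for illustration.

```python
# Hypothetical mapping from one lab's own column labels to Darwin Core terms.
# Only the right-hand values are real Darwin Core term names.
LOCAL_TO_DWC = {
    "species": "scientificName",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "date": "eventDate",
    "collector": "recordedBy",
}


def to_darwin_core(record, mapping=LOCAL_TO_DWC):
    """Rename known local keys to Darwin Core terms.

    Returns the renamed record plus a list of keys that had no mapping,
    so that nothing is silently dropped.
    """
    renamed = {}
    unmapped = []
    for key, value in record.items():
        if key in mapping:
            renamed[mapping[key]] = value
        else:
            unmapped.append(key)
    return renamed, unmapped


# Made-up example record, not real data.
local_record = {
    "species": "Salmo trutta",
    "lat": 43.9,
    "lon": 2.1,
    "date": "2011-06-14",
    "collector": "A. Example",
    "notes": "caught near the bridge",
}

dwc_record, leftovers = to_darwin_core(local_record)
print(dwc_record)
print("Columns still needing a Darwin Core term:", leftovers)
```

Reporting the unmapped columns, rather than discarding them, mirrors the point made above: the term list doubles as a prompt for what a well-designed database should record.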

However, if you are just looking for a quick introduction we suggest you check out the two videos below. In the first, two robots in a bar talk about the principles of the Darwin Core (like robots would!),

and in the second, David Remsen helpfully walks viewers through the Darwin Core Archive Assistant, an on-line tool that assists in the publication of biodiversity data.

Citation: Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, et al. (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715

What does a Data paper look like?

June 29, 2012

Making datasets discoverable through the metadatabase and publishing them on-line is one of the main aims of the BioFresh project. Pensoft Publishers recently started calling for data papers based on primary biodiversity datasets published to GBIF (Penev et al. 2011). BioFresh partners have several in preparation, and publishing data papers seems set to become normal practice.

So what is a ‘metadata paper’ or ‘database paper’? Well, as the terms suggest, it is a paper that focuses on the description of a database. Such papers could be conceived either as a pure description of the dataset for publication in a specialized journal, or as a more extensive scientific article giving a broader insight into the database, which might be targeted at a regular scientific journal. A “pure” data paper might be limited to an abstract published in a scientific journal together with descriptive and technical metadata. In such cases the actual data files would be made available on-line, as is the case for the papers in the Ecological Society of America’s Ecological Archives, or via data portals such as GBIF. However, it is expected that in addition to describing the data content, data papers will include sections summarizing the history of the data set (e.g. original purpose, mode and time of generation, funding body etc.) and its perceived value and usefulness to scientific research, fundamental and/or applied (see the two examples below).

If you want to get into data publishing, and we encourage you to do so, good examples are the data papers by Jones et al. (2009) on mammals and Brose et al. (2005) on body sizes. Pensoft Publishers has produced useful data publishing policies and guidelines, and GBIF’s Integrated Publishing Toolkit (IPT) offers a facility to generate a draft paper outline containing the metadata information of the dataset. BioFresh is currently adding similar export functions to the BioFresh metadatabase.

Example Data Papers
Jones et al. (2009). PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals. (W. K. Michener, Ed.) Ecology, Ecological Archives E090-184, 90(9), 2648–2648. Ecological Society of America.
Brose et al. (2005). Body sizes of consumers and their resources. Ecology 86:2545.
Penev, L., Mietchen, D., Chavan, V., & Hagedorn, G. (2011). Pensoft Data Publishing Policies

Freshwater Journals Unite to Boost Primary Biodiversity Data Publication

June 26, 2012

The latest issue of BioScience carries a viewpoint letter from the BioFresh data management ‘team’ announcing an exciting new initiative to encourage the publication of the data analyzed and reported in scientific papers. As a result of BioFresh efforts, 17 journals publishing on freshwater biodiversity have agreed to include in their guidelines for authors the statement that “Authors are encouraged to place all species distribution records in a publicly accessible database such as the national Global Biodiversity Information Facility (GBIF) nodes or data centers endorsed by GBIF, including BioFresh”.

Lead author Aaike De Wever, data manager of the BioFresh project, explains the significance and origins of this initiative:

“One of the major goals of BioFresh is to make freshwater biodiversity data open and freely available. This will allow broader-scale analyses that will open new frontiers in freshwater science to support enhanced freshwater biodiversity policy and management. As data publishing practices are not yet well established in the freshwater community, we are trying to convince scientists to make their data available through various means. This includes the promotion of data papers, and engaging with funding agencies and scientific journals to encourage data publication.

Last year, during the Seventh Symposium for European Freshwater Science (SEFS7) in Girona, we had the opportunity to convene editors from 12 scientific freshwater journals to explore their role in biodiversity data mobilization. During this meeting we stressed the need to make primary biodiversity data on where, when, how and by whom species have been observed or collected available to other scientists, and discussed the role of journals in encouraging data publication or submission. Subsequently, editors and publishers of the represented journals, as well as a number of additional journals, approved inclusion of a statement in their author guidelines (above) encouraging data publication.”

The participating journals are Aquatic Botany, Aquatic Conservation: Marine and Freshwater Ecosystems, Aquatic Ecology, Aquatic Sciences, Ecology of Freshwater Fish, Freshwater Biology, Freshwater Reviews, Fundamental and Applied Limnology, Hydrobiologia, Inland Waters, International Review of Hydrobiology, Freshwater Science (formerly, Journal of the North American Benthological Society), Journal of Fish Biology, Journal of Limnology, Journal of Plankton Research, Limnetica, Limnologica, Marine and Freshwater Research, and River Systems.

In contrast to other initiatives on making data publicly available, BioFresh is specifically targeting primary biodiversity data, which is limited to a standard set of fields, much like the format of GenBank, allowing direct integration in large-scale datasets. Detailed instructions on how to submit data can be found at this link.

Climate change induced changes in fish populations due to differences in species composition among nested assemblages: new BioFresh study

June 21, 2012

Since the early 1990s scientists have been working to enhance the impact and efficiency of site-based conservation approaches. The field of systematic conservation planning (1) is guided by the so-called ‘representation principle’, an influential policy goal formulated by IUCN ecologist Raymond Dasmann in 1972 (2) and simply stated as “The creation of world-wide network of natural reserves that encompass within their boundaries the variety of species and habitats found on earth”. Initially the scientific focus was on developing principles and tools to optimize reserve network design assuming a largely static biota (e.g. the MARXAN conservation planning software). The new scientific frontier in conservation planning is about taking into account the changes in species distribution and occurrence in response to climate and other environmental change, so that ‘long-term persistence’ can be incorporated into reserve system design. Needless to say, conservation planning for freshwater biodiversity under changing environmental conditions is particularly challenging given the fluid and dynamic nature of freshwater systems!

Writing in the May issue of Global Ecology and Biogeography, BioFresh team member Clément Tisseuil and colleagues add a significant new dimension to our ability to predict how differences in species composition among fish assemblages (termed beta diversity) will change in time and space in response to climate change. They applied two concepts in biogeography to explore and project the future distributions of 18 fish species for the 2010-2100 period, based on data from 50 sites in the Adour-Garonne River Basin in France.


The Tarn River, France, by Thomas Rosenau [CC-BY-SA-2.5 (http://creativecommons.org/licenses/by-sa/2.5)]

The first concept, species turnover, seeks to understand how some species may be replaced by others under different scenarios. The second and slightly more difficult concept, nestedness, refers to how an ecological system is organized. For instance, we might prioritize a site for conservation based on the richness of its species assemblage (alpha diversity). However, it is vital to know the extent to which other, less species-rich sites contain sub-sets of the species found at the rich sites (the degree of nestedness). The concept of nestedness thus enables scientists to identify the processes that lead to species loss or gain in sites.

The significance of this new study is that it is the first to take nestedness fully into account when projecting changes (differences) in fish assemblages at different places and at different times along a river gradient.

Commenting on the significance of the research, Clément Tisseuil notes: “We showed that the composition of local fish assemblages will greatly change over the 21st century, but this is consistent with previous studies of fish faunas. Our contribution is to distinguish between the turnover and nested components of fish diversity and how these shape the processes that lead to changes in fish species assemblies over time and space. Our key finding was that changes in species composition projected in upstream and downstream sites were mainly caused by differences in species richness among nested fish assemblages, whereas those projected in midstream sites were almost entirely caused by a process of species turnover.”
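The distinction between turnover and nestedness can be sketched numerically. The Python example below uses one widely used partition of pairwise Sørensen dissimilarity into a turnover part (Simpson dissimilarity) and a nestedness-resultant part, often attributed to Baselga (2010). This post does not spell out the formulas used in the study, so treat this as an illustrative assumption rather than the authors’ exact method. The function name `beta_partition` and the species lists are hypothetical.

```python
def beta_partition(site_1, site_2):
    """Split Sorensen dissimilarity between two sites into turnover and nestedness.

    a = species shared by both sites
    b = species found only at site 1
    c = species found only at site 2
    """
    s1, s2 = set(site_1), set(site_2)
    a = len(s1 & s2)
    b = len(s1 - s2)
    c = len(s2 - s1)
    if a + b + c == 0:
        raise ValueError("both sites are empty")

    # Total dissimilarity (Sorensen).
    total = (b + c) / (2 * a + b + c)

    # Turnover component (Simpson): replacement of species, independent of richness.
    # If the denominator is zero (one site is empty) we set turnover to 0.0;
    # that is a convention chosen for this sketch.
    denom = a + min(b, c)
    turnover = min(b, c) / denom if denom else 0.0

    # Nestedness-resultant component: whatever dissimilarity is left over,
    # i.e. the part driven by differences in species richness.
    nestedness = total - turnover
    return total, turnover, nestedness


# Made-up assemblages, not data from the study.
rich_site = {"trout", "sculpin", "minnow", "loach", "chub"}
poor_site = {"trout", "sculpin"}           # a strict subset of rich_site
mid_site_1 = {"chub", "barbel", "gudgeon"}
mid_site_2 = {"chub", "barbel", "roach"}   # same richness, one species swapped

print("nested pair:  ", beta_partition(rich_site, poor_site))
print("turnover pair:", beta_partition(mid_site_1, mid_site_2))
```

In the first pair the poorer site is a pure subset of the richer one, so the turnover component is zero and all of the dissimilarity is nestedness. In the second pair the two sites are equally rich and differ by one swapped species, so all of the dissimilarity is turnover.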

Literature:
Tisseuil, C., Leprieur, F., Grenouillet, G., Vrac, M. & S. Lek (2012) Projected impacts of climate change on spatio-temporal patterns of freshwater fish beta diversity: a deconstructing approach. Global Ecology and Biogeography: 773

(1) Margules, C.R. & R.L. Pressey (2000) Systematic Conservation Planning. Nature, 405, 243-253.

(2) Dasmann, R.F. (1972) Towards a system for classifying natural regions of the world and their representation by national parks and reserves. Biological Conservation, 4, 247-255.

The Arrival of Data Journals: an interview with Lyubomir Penev of Pensoft Publishers

June 19, 2012

A reality of 21st century science is that publications and citations are key metrics used to evaluate the performance and impact of scientists. Until recently, there has been no incentive for work-pressured scientists (other than good will) to invest time in preparing their data sets so they can be shared and used by others. With the launch of journals specializing in the publication of data papers this looks set to change. Data papers can perhaps be compared to those reporting a new taxon: they will have a standard format, and users of the data set will cite the data paper in a similar manner to how scientists cite the authority (descriptor paper) when using a scientific name. In this way, scientists contributing data to the common scientific endeavor will gain a publication credit and the number of citations will provide a measure of the scientific value of the data.

Lyubomir Penev of Pensoft Publishers has launched several innovative journals in biodiversity science, including an established infrastructure for publishing and dissemination of biodiversity data, and he kindly provided the following perspectives on the value and importance of data journals.

Biofresh Blog: What motivated you to launch a family of innovative journals for the publication and dissemination of biodiversity information?

Lyubomir Penev: The main motivation is perhaps that, as a biodiversity scientist, I have often been disappointed with the speed and manner with which conventional journals handle manuscripts and data. I was even more disappointed with the dissemination of published results, which are often hidden behind a pay-wall barrier with restrictions for copyright and use. Our journals build on three important pillars, namely open access, high-tech XML-based editorial workflow, and active dissemination of the results we publish for our authors.

BB: Why do you think scientists should make the effort to submit data papers: what’s in it for them?
LP: There are many benefits here and they are certainly not restricted to the authors of data papers alone. First, data collectors, managers and authors will be properly credited through a permanent scientific record, priority registration and citation of the data paper. Second, the extended metadata associated with a data set will be properly described and published in order to make data easy to share, use and re-use for other scientists. Sharing data will open new perspectives for collaboration with other scientific groups and institutions. Last but not least, re-use of original and collated data sets will tremendously increase the efficiency of public funds invested in gathering all these data!

BB: To what extent do you think data journals will change the way we do Science?
LP: The change will be dramatic and extremely useful in my opinion. The appearance of new data visualization and analysing tools will lead to an ever increasing interest in inter-operability and collation of data with compatible data gathered by other groups. This should provide exciting new views and produce better proven scientific results.

The titles of journals in the Pensoft family include ZooKeys (systematic zoology, phylogeny and biogeography), PhytoKeys (systematic botany), NeoBiota (alien species), and Nature Conservation. A similar initiative is Dataset Papers in Ecology.

Saving Biodiversity Data

June 14, 2012

With the strap-line ‘Biodiversity Needs Data’, reBIND is a fantastic new initiative of the Botanic Gardens and Museum in Berlin funded by the German Research Foundation (DFG) that responds to the reality that many scientists collate valuable biodiversity datasets which are then stored on their personal hard-drives or archived on media which become out of date (remember Zip drives!). As a result, important ‘legacy’ data sets are being lost, or at risk of being lost, as scientists retire, change office or clear out their lofts.

The aim of reBIND is simple: to provide the tools to integrate isolated databases into an institutional data curation strategy. To this end, the reBIND team is developing workflows that combine software tools for transforming data stored in outdated database systems into well-documented, standardized, and commonly understood XML-formats with a system for storing, documenting, and publishing the information as a web service.

It’s a neat idea and wonderfully explained in this innovative and engaging video.

Special Feature: Developments in Biodiversity Data Publishing

June 14, 2012

Knowledge of the status and distribution of biodiversity is fundamental both to the delivery of key conservation conventions and to the development of effective policy planning and management. Unfortunately, the temporal and spatial resolution of available biodiversity data currently falls well short of what is needed. This data shortfall is constraining the ability of conservation science and management to embrace important concepts such as adaptive management, subsidiarity and participation, as well as big new policy frames such as the ecosystem approach.

BioFresh is part of a wider scientific endeavor working to improve geographic databases on the distribution of biodiversity on Earth. This involves creating a new digital architecture of data platforms and portals to pull together and make accessible biodiversity data-sets languishing on servers in research institutes and on the hard-drives of people’s personal computers. Needless to say, achieving this vision is not exactly a walk in the park! In an earlier Special Feature on Assembling the Freshwater Database, BioFresh scientist Aaike De Wever introduced the challenges involved in setting up a system of interoperable databases and explained some key terminology: metadata, inter-calibration and so forth. In this special feature we continue this theme by presenting an overview of some important, and we think exciting, advances in data publication.

Over the next 3 weeks we will run the following series of posts reporting on important new developments, new projects, and explaining terms. If you think there are other important aspects of data publishing that we should cover please let us know. We also invite you to add your comments to amplify and extend each post so as to make this special feature as useful as possible.

We hope that you will find this special feature interesting and would be grateful if you could let your colleagues and students know of its existence.

 
Paul Jepson & Aaike De Wever

Posts:

Saving Biodiversity Data, an introduction to the reBIND project.

The Arrival of Data Journals, including an interview with Lyubomir Penev of Pensoft Publishers.

Freshwater Journals Unite to Boost Primary Biodiversity Data Publication reports an important new agreement on data publishing.

What does a Data paper look like? outlines the structure and content of a typical data paper.

Science made easier : Darwin Core explained introduces this important standard which is helping overcome key barriers to data publishing.

What is a digital object identifier? explores how this can be applied to data sets.

The effect of dams on fish biodiversity: A global view.

June 11, 2012

A salmon jumping up a waterfall in Canada. Diadromous fish species often face mortality or reproductive failure when their migratory route is obstructed by dams. Image: Jerome Charaoui.

The world is currently facing a freshwater biodiversity crisis, and the key to preventing further extinction lies in understanding all the threats facing aquatic habitats. Global freshwater habitats are losing biodiversity faster than terrestrial or marine areas, but so far they are the least well understood. Amongst the threats to freshwater species, which include climate change and pollution, the most difficult to quantify are man-made obstructions to water flow. Dams can be found in every major biome, but very little is known about the effect of river obstruction on freshwater biodiversity, especially on a global scale.

Damming a river has a variety of effects on the freshwater ecosystem, more than just altering the flow from A to B. Dams create calm bodies of water, changing overall temperature regimes and sediment transport, leading to conditions which tend to favour generalist species. Loss of specialist species, particularly endemics, changes the community structure and leads to biotic homogenization. A dam will withhold sediment in the reservoir, not just decreasing the amount of substrate available to local freshwater species, but even impacting diadromous, estuarine and marine species much further downstream. Competition between resident species for food and breeding sites will increase as damming isolates populations and, perhaps more importantly, damming can completely block the movements of migratory fish species. Isolation may lead to decreases in genetic diversity and therefore put species at greater risk from disease. All of these effects may be exacerbated by changes in the surrounding land use. Overall, damming a river will lead to both a loss of native species and an increase in exotic species, which are more likely to become established in degraded habitats. For this reason, dams are one of the greatest global threats to freshwater biodiversity.

A lack of data on global freshwater fish distributions has restricted a thorough investigation of the dam-related threats to fish species. However, a recent publication by Liermann et al. maps global dam obstruction, identifying areas and taxa at risk of species loss. This is the first paper to quantify this in a manner which could be useful for future planning and management. Liermann et al. quantify and map dam obstruction in all of the world’s main freshwater ecoregions (397). These are areas that contain geographically distinct groups of freshwater communities. These data were integrated with fish distribution data, particularly focussing on the numbers of obligate diadromous species (species that migrate between freshwater and marine habitats, such as the salmonids) and ecoregional endemics. These species were used as a measure of potential species loss in relation to dam obstruction. The model also included an assessment of the difference between dam impact in an undisturbed landscape and the compounded effect of dams and land-use change. The distribution of areas predicted to be most affected by climate change was also compared with the distribution of the most heavily obstructed ecoregions.

Liermann et al. produce the first comprehensive set of maps highlighting habitat fragmentation by dam obstruction and the corresponding taxa and ecoregions most at risk of species loss. These maps are vital for identifying the freshwater systems most in need of protection. Eighteen ecoregions were identified in which less than 50% of the freshwater systems were free flowing. These areas include central and southern Iberia, the Mississippi, the Indus Basin, and the Murray Darling. Some of these areas also face high levels of landscape alteration. Liermann et al. identify 8 ecoregions which may be facing the highest risk of species loss, and which would therefore benefit the most from conservation and restoration projects. These ecoregions also have a high number of endemic and diadromous species, the loss of which would affect even global biodiversity! The study provides a gradient score for each ecoregion, running from freshwater habitat restoration to conservation (at the least affected end). This will allow the global community to prioritize both management and research efforts to prevent further freshwater biodiversity loss more efficiently.
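The “less than 50% free flowing” threshold can be illustrated with a short Python sketch. The study’s actual obstruction metric is not described in this post, so the example simply assumes that each ecoregion has a total river length and a free-flowing river length. The ecoregion names, the figures and the function names are hypothetical.

```python
# Hypothetical per-ecoregion figures (river length in km); not data from the study.
ECOREGIONS = {
    "ecoregion_A": {"total_km": 1200.0, "free_flowing_km": 420.0},
    "ecoregion_B": {"total_km": 800.0, "free_flowing_km": 760.0},
    "ecoregion_C": {"total_km": 500.0, "free_flowing_km": 250.0},
}


def percent_free_flowing(total_km, free_flowing_km):
    """Share of river length that is free flowing, as a percentage."""
    if total_km <= 0:
        raise ValueError("total_km must be positive")
    return 100.0 * free_flowing_km / total_km


def heavily_obstructed(ecoregions, threshold=50.0):
    """Names of ecoregions whose free-flowing share is strictly below the threshold."""
    return sorted(
        name
        for name, lengths in ecoregions.items()
        if percent_free_flowing(lengths["total_km"], lengths["free_flowing_km"]) < threshold
    )


for name, lengths in ECOREGIONS.items():
    share = percent_free_flowing(lengths["total_km"], lengths["free_flowing_km"])
    print(f"{name}: {share:.0f}% free flowing")

print("Below the 50% threshold:", heavily_obstructed(ECOREGIONS))
```

Note that an ecoregion sitting exactly at 50% is not flagged here, because the comparison is strictly “less than”, matching the wording above.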

Arapaima – another reason to be concerned about the Brazilian forestry bill

May 22, 2012

This afternoon WWF, Greenpeace and key Brazilian organizations will be mounting a Twitter campaign to urge Brazil’s President Dilma Rousseff to veto the Forest Code approved by Congress in April. The bill has been condemned by WWF on three key grounds: a) millions of acres of forest illegally cleared prior to 2008 will be legalized through amnesty, b) landowners could be allowed to reduce the required forest cover from 80% to 50%, and c) large areas of floodplains and other sensitive areas will be opened to cattle ranching and farming. Writing in the Guardian, John Vidal reports that critics of the bill say it could lead to the loss of 220,000 square kilometres of Amazonian rainforest, an area close to the combined size of the UK and France.

As our small contribution to the debate, today we added the fantastic Arapaimas to the BioFresh Cabinet of Freshwater Curiosities. Guest curator Daniel Gurdak profiles these ancient, armored freshwater giants that ply the rivers and floodplains of the Amazon. The Arapaima’s riverine habitats are susceptible to forest clearance, and this group of river giants reminds us that future generations may judge today’s decision makers harshly if incredible life forms like this are lost from the Earth.

Arapaima sp. from Guyana. Image: D.J. Stewart

More than 1.5 million people have already petitioned President Dilma and the number is expected to rise with this afternoon’s Twitter campaign (#vetatudodilma #SOSBrazil), which is encouraging others to sign up to the Avaaz petition.

Latest research underlines the impact of three major threats to all Amphibian species

May 14, 2012

An adult male Ecnomiohyla rabborum, a species ravaged by chytridiomycosis in its native habitat. Image: Brian Gratwicke

A series of recent papers in Nature frame the key threats to amphibian species on the global scale. IUCN classifies 30% of all amphibian species as threatened, and establishing the cause of this trend is a pressing priority for conservation science.

Widespread declines in amphibian populations were first noticed in the 1980s. Habitat degradation through pollution, human land-use and climate change were initially identified as causal factors, but recent papers give more attention to the fungal disease chytridiomycosis. This is caused by Batrachochytrium dendrobatidis and occurs mainly in cooler regions, with varying virulence in different species. The disease was first discovered in amphibians in 1998 and is widespread: it is known to have caused local extinction in some frog species. A global chytridiomycosis pandemic is currently underway and may be responsible for many species becoming critically endangered. However, little is known about the overall effects of these major threats (chytridiomycosis, climate change and habitat degradation) and how they could interact to further endanger the global amphibian population.

Hof et al. have published a letter in Nature examining how the fungal disease pandemic, climate change and land-use change are affecting amphibians worldwide. Using a model which takes into account the spatial distribution of these three threats, the interactions between them and the global distribution of all amphibian species, they predict that species in different regions will face varying levels of each threat, often not simultaneously. All three orders of amphibian (frogs, salamanders and caecilians) were included in this model, and the outlook for all is fairly poor.

Hof et al predict that by 2080 over half of the species in tropical regions (with the greatest amphibian diversity) will be facing drastic declines due to both climate change and habitat degradation.  The occurrence of chytridiomycosis will become more concentrated in temperate and mountainous areas.  What is most worrying is that the spatial distribution of these declines is very widespread and not particularly overlapping.  For example, amphibians in tropical areas such as Africa and South America will be most negatively impacted by climate change, but not so threatened by the disease pandemic.  Overall, more than half of the total geographic distribution for frogs, salamanders and caecilians will be highly affected by the three main threats.

Our beleaguered amphibian species are facing accelerating rates of decline over the next few years, with Hof et al. predicting that the interaction of climate change, disease and habitat degradation is far more damaging than each threat alone. Amphibians can be found in almost every terrestrial habitat (apart from polar regions) and in some ecosystems are important apex predators. They also play an important role in linking terrestrial and freshwater habitats in both tropical and temperate zones. It is important that future conservation takes into account all threats to amphibian species before making decisions on how best to ameliorate population decline.