What is a Digital Object Identifier?
A guiding principle of science is that we never use another’s work without giving appropriate citation. The developments in data publishing profiled in this blog series are a consequence of web and computing technologies: effective data-sharing therefore requires a means to cite objects that exist in the digital medium. Digital objects (documents, images, data files, etc.) are in effect ‘housed’ at an internet address (URL). However, their location on a web-site may be moved, and internet addresses are constantly being closed down, changed, or created. To address this challenge, and the need for a citation format that aligns with the ‘cut and paste’ and ‘click and link’ practices of the internet, the Association of American Publishers and the Corporation for National Research Initiatives conceived the DOI system.
The idea is elegantly simple. An organization (e.g. the International DOI Foundation) manages a central repository (or dorepository). Owners register their digital objects in this directory using a globally available system of character strings (the object ‘handle’ or “digital identifier”). This DOI is permanent, whereas the object’s internet address and other data associated with it may change. The Handle System (basically a software protocol of the kind that underlies the operation of the internet) is used to resolve ‘handles’ into the “information necessary to locate, access, contact, authenticate, or otherwise make use of digital resources”. Lastly, a Digital Object Registry (or doregistry) is used to define collections of digital objects that exist in multiple repositories to support browsing and searching.
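To make the principle concrete, here is a minimal Python sketch. It is a toy model of the idea only, not the actual Handle System protocol or any real registry software: the class name ToyDirectory, its methods, the identifier and the example.org addresses are all hypothetical and chosen purely for illustration. It shows the one property that matters for citation: the identifier stays fixed while the address behind it can be updated.

```python
class ToyDirectory:
    """Toy stand-in for a central directory of digital objects (illustrative only)."""

    def __init__(self):
        # Maps a permanent identifier to the object's current internet address.
        self._records = {}

    def register(self, identifier, url):
        """Register a new object; an identifier can only be registered once."""
        if identifier in self._records:
            raise ValueError(f"identifier already registered: {identifier}")
        self._records[identifier] = url

    def move(self, identifier, new_url):
        """Record a new address for an object; the identifier itself is unchanged."""
        if identifier not in self._records:
            raise KeyError(identifier)
        self._records[identifier] = new_url

    def resolve(self, identifier):
        """Return the current address for an identifier."""
        return self._records[identifier]


# Hypothetical identifier and addresses, for illustration only.
directory = ToyDirectory()
directory.register("10.9999/example.0001", "http://old.example.org/data/set1")
first_address = directory.resolve("10.9999/example.0001")

# The owner moves the object; anyone citing the identifier is unaffected.
directory.move("10.9999/example.0001", "http://new.example.org/archive/set1")
second_address = directory.resolve("10.9999/example.0001")
```

A citation that records only the identifier keeps working after the move, because the lookup happens at the moment someone follows the citation.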
A more detailed overview of this architecture is available on the International DOI Foundation web-site and in a paper by Robert Kahn & Robert Wilensky that describes the conceptual and technical design in detail.
When you start paying attention to DOIs you will see them everywhere. The one attached to this figure, reproduced from an important paper by Carol Tenopir on scientific practices and perceptions regarding data-sharing, illustrates how the character string is constructed. This DOI could be read as “doi: prefix identifying the directory and registrant where the object is logged / journal. PLoS ONE. article number. the specific part of the article where the object is located (here, figure 1)”. Pasting the DOI into the web-service dx.doi.org, or prefixing a DOI with this URL, e.g. http://dx.doi.org/10.1371/journal.pone.0021101.g001, will take you to the document. Alternatively, pasting a DOI into most web browsers will take you to the document via a search engine.
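The reading of the character string described above can be sketched in a few lines of Python. This is a rough illustration rather than a full validator: the function names split_doi and doi_to_url are our own, and the only rules assumed are that a DOI has a prefix beginning with "10." and a suffix, separated by the first "/". The resolver address is the dx.doi.org service mentioned above.

```python
from urllib.parse import quote

# Resolver address as used in this post.
RESOLVER = "http://dx.doi.org/"


def split_doi(doi):
    """Split a DOI into directory indicator, registrant code and suffix."""
    doi = doi.strip()
    # Drop an optional leading "doi:" label, as used in reference lists.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # The first "/" separates the prefix from the suffix.
    prefix, slash, suffix = doi.partition("/")
    if not slash or not suffix or not prefix.startswith("10."):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    directory, _, registrant = prefix.partition(".")
    return {"directory": directory, "registrant": registrant, "suffix": suffix}


def doi_to_url(doi, resolver=RESOLVER):
    """Build a clickable link by prefixing the DOI with the resolver address."""
    parts = split_doi(doi)
    name = f"{parts['directory']}.{parts['registrant']}/{parts['suffix']}"
    # Percent-encode any characters that are not safe in a URL path.
    return resolver + quote(name, safe="/")


figure_doi = "doi:10.1371/journal.pone.0021101.g001"
parts = split_doi(figure_doi)
link = doi_to_url(figure_doi)
```

For the figure DOI, the suffix comes back as journal.pone.0021101.g001 and the link is the same URL given in the paragraph above. How a suffix is laid out is up to whoever registers the object, so the dot-separated reading given earlier applies to this publisher's DOIs and not necessarily to others.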
Increasingly, scientists publishing data via portals such as GBIF and BioFresh are attaching a DOI to their data sets. Organizations such as DataCite, formed to support data publishing, sharing and archiving, provide services to ‘mint’ DOIs for data. In addition, many larger universities and research institutes are establishing digital research & data archives, and the ability to ‘mint’ DOIs for their scientists is an integral part of such initiatives.
We hope this explanation captures the essence of Digital Object Identifiers and we would welcome comments to help clarify or expand upon key points.
Paul Jepson & Aaike De Wever
Kahn, R. & R. Wilensky (2006) A framework for distributed digital object services. International Journal on Digital Libraries 6: 115–123. doi:10.1007/s00799-005-0128-x
Tenopir, C. et al. (2011) Data Sharing by Scientists: Practices and Perceptions. PLoS ONE 6(6): e21101. doi:10.1371/journal.pone.0021101