Resources for Data Management and Sharing

Where should data be stored?

At present there is no major repository dedicated to clinical data, but Dryad has declared its willingness to accept medical datasets. You can start the deposition process while submitting to BMJ Open, and both supplementary data and the data underlying your paper are accepted. Dryad will assign a DOI to your dataset, which aids citation and provides a permanent link to the data. (Note that Dryad hosts data under a CC0 licence; check that this is suitable for the data you are depositing.) The DataCite organisation maintains a growing list of repositories for research data.

Why share data?

Faster progress in improving health, better value for money and higher-quality science were the three key benefits stressed by the UK's Medical Research Council, the Wellcome Trust, the US National Institutes of Health and others, who set out their commitment to sharing data in a joint statement in 2011 and have developed policies and tools to help their researchers do so. The UK Data Archive's comprehensive 'Managing and Sharing Data' document describes how sharing data can encourage enquiry and debate, promote innovation and collaboration, maximise transparency and accountability, improve research methods, reduce the cost of unnecessary duplication of research, increase the impact of research and the credit given to researchers, and provide resources for education and training. Beyond these 'public good' arguments, some researchers argue that sharing data also brings a citation advantage. FAIRsharing includes a catalogue of data sharing policies and standards (reporting requirements, terminologies and exchange formats).

How to share data

First, make sure people know that your data exist and are available. This is one reason why all BMJ Open articles include a data sharing statement, which helps publicise the existence of datasets. For data to be used to their full potential, they must be not only accessible but also intelligible and searchable, and this is where standards for data preservation are needed. Standards cover what should be included in a dataset, 'ontologies' or controlled vocabularies for annotating datasets, and exchange formats that facilitate sharing. Researchers in other fields of science have been sharing data for years, but standards for preserving and sharing medical data are still emerging. Practical and technical guidance on preparing your data for sharing is available from various sources, a few of which are listed below.

UK Data Archive

Managing and Sharing Data (2011) is 'designed to help researchers and data managers...produce highest quality research data with the greatest potential for long-term use'.

Digital Curation Centre

The DCC provides advice on how to store, manage and protect digital data. Its site includes tools and applications, FAQs on MRC data plans, information on data management plans, a list of funders' policies, legal information and a developing series of 'how-to' guides.

FAIRsharing

The site includes a growing catalogue of standards, databases and policies to help ensure that 'experiments are reported with enough information to be comprehensible and (in principle) reproducible, compared or integrated'.

Wellcome Trust

Provides 'Guidance for researchers: developing a data management and sharing plan'.

UK Medical Research Council (MRC, part of UKRI)

Provides tools and resources for researchers, including their Data and tissues toolkit and their Cohort dataset directory, plus a short glossary of common data-sharing terms.

National Cancer Research Institute (NCRI)

The NCRI Informatics Initiative 'supports the development of data standards and promotes a culture of data-sharing to facilitate storage and dissemination of research data'.

US National Institutes of Health

Resources include examples of data sharing plans alongside more general policy documents. In 2010 the BMJ published a paper on preparing raw clinical data for publication, which specifically addresses the issue of de-identifying datasets.

How is data cited?

There is not yet a standard for citing data or datasets, but consensus is building around the use of persistent identifiers such as the DOI (digital object identifier), already familiar from journal publishing, alongside more conventional bibliographic information (the authors or creators of the dataset, year of 'publication' and title).
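As an illustration only, the elements listed above can be assembled into a single citation string. The Python sketch below assumes a simple ordering (creators, year, title, DOI link) that is one plausible convention rather than an agreed standard; the `DatasetRecord` class, the `format_data_citation` function and the example record are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical container for the core elements of a data citation."""
    creators: list[str]  # authors/creators of the dataset
    year: int            # year of 'publication' of the dataset
    title: str           # title of the dataset
    doi: str             # bare DOI, e.g. "10.1234/example"


def format_data_citation(rec: DatasetRecord) -> str:
    """Build a citation string: creators (year). title. DOI link.

    The ordering is an assumption for illustration, not a formal standard.
    """
    # All DOIs begin with the "10." directory indicator.
    if not rec.doi.startswith("10."):
        raise ValueError(f"not a bare DOI: {rec.doi!r}")
    authors = ", ".join(rec.creators)
    # Resolving through doi.org gives a persistent link to the data.
    return f"{authors} ({rec.year}). {rec.title}. https://doi.org/{rec.doi}"


if __name__ == "__main__":
    # Invented example record, for illustration only.
    example = DatasetRecord(
        creators=["Smith J", "Jones A"],
        year=2014,
        title="Example trial dataset",
        doi="10.1234/example",
    )
    print(format_data_citation(example))
```

Linking through https://doi.org/ rather than a repository's own web address means the citation keeps working even if the repository reorganises its pages, which is the main reason persistent identifiers are favoured.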