Gates Open Research

How can researchers share sensitive data openly?

Open data is central to open science practices, helping to improve reproducibility, transparency, and trust in research. However, some researchers find open data practices challenging, particularly when dealing with sensitive or commercial data. In this blog post, we explore how researchers can share sensitive data openly while maintaining appropriate levels of security and privacy.

What can sensitive data look like?  

Sensitive data can take many forms in different types of research.  

One of the main types of sensitive data is human data, which can include:  

  • Images, videos, audio files, or qualitative data related to attitudes, opinions, or experiences  
  • Clinical trial results  
  • Datasets from social media sites  
  • Personally identifying information, such as age, ethnicity, location, and sexuality
  • Sensitive health status information, such as alcohol dependency  

These typically arise in health and social science research, but can also be found in other fields. Some of these data types could allow individuals to be identified, especially when used in combination.
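To see why combinations matter, here is a minimal Python sketch that counts how many records share each combination of indirect identifiers, a check commonly known as k-anonymity. The records, field names, and the `smallest_group` helper are hypothetical and for illustration only. A group of size 1 means that combination points to exactly one person.

```python
from collections import Counter

# Hypothetical records: names are already removed, but indirect
# identifiers (age, postcode area, gender) remain.
records = [
    {"age": 34, "postcode": "LS1", "gender": "F"},
    {"age": 34, "postcode": "YO1", "gender": "M"},
    {"age": 51, "postcode": "LS1", "gender": "M"},
    {"age": 51, "postcode": "YO1", "gender": "F"},
]

def smallest_group(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same combination of
    the given fields (the dataset's k). A value of 1 means at least one
    combination matches exactly one person."""
    counts = Counter(tuple(row[f] for f in quasi_identifiers) for row in rows)
    return min(counts.values())

# Each field on its own groups people in pairs (k = 2)...
for field in ["age", "postcode", "gender"]:
    print(field, smallest_group(records, [field]))

# ...but together they single out every individual (k = 1).
print(smallest_group(records, ["age", "postcode", "gender"]))
```

No single field here identifies anyone, yet the three together make every record unique, which is why indirect identifiers need attention as well as names and addresses.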

As a result, it’s vital for these types of data to be handled appropriately, and only shared openly with sufficient safeguarding measures.  

Other types of sensitive data that can arise in any type of research include:

  • Intellectual property, such as new inventions  
  • National security data, such as classified information from governmental bodies  
  • Third-party data, such as proprietary commercial information  

What measures can I take to share sensitive data openly?  

Authors can take a number of safeguarding measures to participate in open data practices safely and securely when sharing sensitive data.

Identify ethical and legal requirements for data  

Before being able to share any data, you must understand all applicable legal and ethical requirements for your research.  

These differ depending on where you as an author are based, where your participants might be based, where your research is conducted, or where any third parties are based.  

For example, in Europe, data protection is governed by the General Data Protection Regulation (GDPR), whilst the US has multiple federal and state requirements, including the California Consumer Privacy Act (CCPA) and the federal Privacy Act. According to the UN, 137 countries around the world have some form of data protection legislation, so it's important to identify the legislation relevant to your work.

In addition, ethical considerations must underpin any sharing of sensitive data, and you must respect the rights and dignity of the individuals involved.

Your institution, funder, or organization can usually provide guidance on ethical and legal considerations for both human and non-human sensitive data.  

Create a Data Management Plan  

A Data Management Plan (DMP), or Output Management Plan (OMP), is one of the best ways to ensure you share sensitive data safely.

A DMP or OMP allows you to identify, before you even begin your research, the types of data you may collect, create, or reuse throughout the project.

In turn, this can help you to identify what type of data might be sensitive, any measures you might need to put in place to ensure you can share data at the end of the project, and what data must not be shared at any point. 

Creating a Data Management Plan and using the information within it will help to inform how you carry out your research, and how you deal with data at each step of a project. This can help to:  

  • Identify any data sharing issues in advance  
  • Provide adequate time to implement alternative data sharing measures  
  • Allow you to obtain guidance from institutions, ethics boards, or funders before starting the time-sensitive publication process

It’s important to think about both human and non-human data in a DMP, including any commercial or third-party data that might be reused or developed.  

Obtain consent

Whether you're working with human participants or other third parties, it's important to obtain the appropriate consent.

You need to clearly communicate what data you propose to share, how the data will be shared, the level of open access to be granted, and any limitations on this.  

All participants and third-party organizations must have the option to opt out of having their data or IP shared and, in the case of human data, the option to have their data anonymized before it is shared.

These consents should be recorded clearly in the DMP, ideally with written evidence, for the avoidance of doubt. It's also important to note that once consent is given, you can share data only in the exact way that participants have agreed to.
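One way to make "share only as consented" concrete is to apply recorded consent to the dataset before release. The Python sketch below is a minimal illustration, assuming consent is logged per participant as a share/opt-out flag plus the set of fields each person agreed to share; the consent structure, field names, and `shareable` helper are all hypothetical.

```python
# Hypothetical consent log, keyed by participant ID, as it might be
# recorded alongside a DMP. Field names are illustrative only.
consents = {
    "P001": {"share": True, "fields": {"age_band", "region", "response"}},
    "P002": {"share": False, "fields": set()},          # opted out
    "P003": {"share": True, "fields": {"response"}},   # narrower consent
}

data = [
    {"id": "P001", "age_band": "30-39", "region": "North", "response": "Yes"},
    {"id": "P002", "age_band": "40-49", "region": "South", "response": "No"},
    {"id": "P003", "age_band": "20-29", "region": "East", "response": "Yes"},
    {"id": "P004", "age_band": "50-59", "region": "West", "response": "No"},  # no consent on file
]

def shareable(rows, consents):
    """Keep only participants who consented to sharing, and only the
    fields each one consented to. Rows with no consent record are
    excluded rather than assumed to be shareable."""
    out = []
    for row in rows:
        consent = consents.get(row["id"])
        if consent is None or not consent["share"]:
            continue
        out.append({k: v for k, v in row.items()
                    if k == "id" or k in consent["fields"]})
    return out

for row in shareable(data, consents):
    print(row)
```

Treating a missing consent record as "do not share" is a deliberately conservative default; it means a gap in the records never results in data being released.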

Anonymize human data  

When working with human data, one of the safest ways to share sensitive data openly while maintaining confidentiality is anonymization.  

This removes any identifying information from a dataset to reduce the likelihood of re-identification. However, this is not a replacement for consent and should only be done with data for which you have already received informed consent to share.  

Anonymization can be applied to direct identifiers, such as full name, date of birth, and address, and to indirect identifiers, such as ethnicity, gender, sexuality, or place of birth.

Key data anonymization techniques include:  

  • Removing any variables that are not necessary for analysis or relevant to the research  
  • Making a data point less specific, such as replacing an address with a city
  • Referring to research participants by aliases rather than their real names
  • Taking specific information like age and putting it into a banded range  

[Image: Examples of anonymization of quantitative and qualitative data]
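The techniques listed above can be combined in a single pass over a dataset. The following Python sketch is illustrative only: it assumes a hypothetical survey table whose addresses end in ", City", and the field names and `band_age`/`anonymize` helpers are invented for the example. It drops variables not needed for analysis, generalizes addresses to cities, replaces names with aliases, and bands ages into ranges.

```python
# Hypothetical raw survey rows containing direct and indirect identifiers.
raw = [
    {"name": "Amina Okafor", "age": 34, "address": "12 High St, Leeds",
     "email": "amina@example.org", "response": "Agree"},
    {"name": "Tom Reyes", "age": 58, "address": "4 Mill Lane, York",
     "email": "tom@example.org", "response": "Disagree"},
]

def band_age(age, width=10):
    """Replace an exact age with a banded range, e.g. 34 -> '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

def anonymize(rows):
    out = []
    for i, row in enumerate(rows, start=1):
        out.append({
            "participant": f"P{i:03d}",            # alias replaces the real name
            "age_band": band_age(row["age"]),      # exact age -> banded range
            # Generalize: assumes the city is the last comma-separated part.
            "city": row["address"].rsplit(",", 1)[-1].strip(),
            "response": row["response"],
            # Name, exact age, full address, and email are not copied over.
        })
    return out

for row in anonymize(raw):
    print(row)
```

Aliases here come from row order rather than from the names themselves, so they cannot be reversed by guessing a name. If you keep a key linking aliases to real people, store it separately and never share it; data that can still be linked back in this way is generally treated as pseudonymized rather than anonymized.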

Control access to data  

In some cases, human data cannot be anonymized without losing its value, or you may be working with different sensitive data such as intellectual property or commercial data.  

In these cases, an alternative data sharing method is to use a controlled access data repository (again, only when consent is already granted).  

These repositories store the data without publishing it publicly. Instead, a metadata record, such as a Data Availability Statement on Gates Open Research, is shared openly, describing where the data is held and the conditions of access.

The repository then requires users to meet certain conditions before they can access the data, ensuring that the data is shared only in ways the researchers control.

Publish data-related information  

Regardless of the precautions taken by the authors, there are some cases where data cannot be shared openly. 

As a result, most publishers and funders, including Gates Open Research, allow exceptions to their open data policies.

In these cases, authors can publish some of their data-related information instead, such as:  

  • Methods sections that provide a detailed description of how the study and subsequent data were created.  
  • Metadata, such as Data Availability Statements, providing a description of the final data, discussion of any variables assessed, and a data sharing disclaimer.  
  • Any intermediary data that can be shared without concern.  
  • Detailed information about where third-party data was sourced and how users can source it themselves.  

This helps other researchers to reproduce the research for themselves, even when the original data cannot be shared.

Next steps  

If you’d like to find out more about sharing data, visit our data sharing guidelines.  

And if you're ready to join the 3,000+ Gates-funded researchers already publishing their work with Gates Open Research, submit your research for publication today.
