Calling All Researchers to Fight COVID-19 with Open Science
Lifebit
In light of recent developments surrounding the novel coronavirus (COVID-19) global outbreak, researchers around the world are generating clinical, epidemiological and sequencing data at an unprecedented pace.
The scientific community is mobilising to help push research forward at a critical moment in history, and is doing so by embracing an open data sharing approach.
At Lifebit, we are closely involved in the scientific community and follow the COVID-19 developments as they evolve almost daily. As such, we are collecting a growing number of valuable resources for researchers and organisations in the areas of epidemiology, vaccine development, diagnosis, viral sequencing and genomics, and everything in between.
Here, we are publishing the first instalment of our curated COVID-19 resources to keep our readers, clients, and partners up-to-date on the latest tools and developments from the frontlines, so that together we can all contribute our time and knowledge to combat this global crisis.
Data Resources
Global Initiative on Sharing All Influenza Data (GISAID)
The hCoV-19 genome is an RNA molecule of approximately 30,000 bases containing 15 genes. For comparison, the human genome is 3,000,000,000 bases in size, and contains around 30,000 genes. The genome sequences of hCoV-19 are crucial for designing and evaluating diagnostic tests, tracking and tracing the ongoing outbreak, and identifying potential intervention options.
The GISAID Initiative promotes the international sharing of hCoV-19 virus sequences and the related clinical, epidemiological and geographical data, to help researchers understand how the virus is evolving and spreading.
Interested researchers can sign up by providing basic information. Once GISAID has reviewed and approved that information, it sends out access credentials.
Nextstrain: Real-time tracking of pathogen evolution
Nextstrain is an online resource that uses genome data to monitor the evolution of disease-causing organisms such as viruses in real time, and has already tracked several past outbreaks (e.g. Zika, Ebola and Dengue).
The resource already contains over 700 hCoV-19 genomes (enabled by GISAID data), which can be used to trace the outbreak by detecting new mutations in the virus. Mutations do not necessarily affect how the virus behaves, but they serve as genetic signatures that link related cases, even across geographies (e.g. a virus sequenced in London could carry mutations that suggest it originated from an Italian outbreak). Nextstrain publishes a weekly situation report that analyses these important trends.
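To make the idea of mutations as genetic signatures more concrete, here is a minimal Python sketch. It is not how Nextstrain works internally (Nextstrain builds phylogenetic trees from aligned genomes). It only illustrates the underlying intuition: samples that share the same differences from a reference sequence are likely to be related. The reference, the sample names and the sequences are all invented for illustration, and the sketch assumes the sequences are already aligned to the same length.

```python
from itertools import combinations

# Toy data, invented for illustration only. Real hCoV-19 genomes are
# about 30,000 bases long and must be aligned to the reference first.
REFERENCE = "ATGGCTAGCTAGGATCCGTA"

SAMPLES = {
    "Italy-1": "ATGGCTAGTTAGGATCCGTA",
    "London-1": "ATGGCTAGTTAGGATCAGTA",
    "Seattle-1": "ATGGCAAGCTAGGATCCGTA",
}


def mutations(reference, sample):
    """Return substitutions relative to the reference, e.g. {'C9T'}.

    Positions are 1-based. Only substitutions are handled, so both
    sequences must have the same length (no insertions or deletions).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return {
        f"{ref_base}{position}{sample_base}"
        for position, (ref_base, sample_base)
        in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    }


def shared_mutations(samples, reference):
    """Map each pair of sample names to the mutations they have in common."""
    profiles = {name: mutations(reference, seq) for name, seq in samples.items()}
    return {
        (first, second): profiles[first] & profiles[second]
        for first, second in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    for pair, shared in shared_mutations(SAMPLES, REFERENCE).items():
        print(pair, sorted(shared))
```

In this toy data, the London and Italy samples share one substitution that the Seattle sample lacks, which is the kind of signal that hints at a common origin. A real analysis would rely on many genomes, proper alignment and sampling dates before drawing any such conclusion.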
COVID-19 Research Communities & Challenges
COVID-19 Open Research Dataset Literature Corpus Challenge (CORD-19)
The COVID-19 Open Research Dataset (CORD-19) is a free resource of over 44,000 scholarly articles about COVID-19 and the coronavirus family of viruses for use by the global research community. The dataset is updated on a weekly basis as new research is published in peer-reviewed publications and archival services (e.g. bioRxiv, medRxiv).
CORD-19 is intended to mobilise researchers to apply recent advances in natural language processing to generate new insights in support of the fight against COVID-19.
A list of key questions can be found under the Tasks section of the CORD-19 dataset. The questions have been drawn from the research topics of the NASEM’s SCIED (National Academies of Sciences, Engineering, and Medicine’s Standing Committee on Emerging Infectious Diseases and 21st Century Health Threats) and from the World Health Organisation’s R&D Blueprint for COVID-19.
Kaggle is now sponsoring an award of $1,000 per task. Winners may elect to receive their award as a monetary payment or as a charitable donation to COVID-19 relief efforts.
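For readers who want a feel for what a first pass at such a text-mining task might look like, here is a minimal Python sketch that ranks articles against a question by simple keyword overlap. This is a deliberately basic baseline, not one of the recent natural language processing methods the challenge has in mind. The three records, their `title` and `abstract` fields, and the `rank` helper are hypothetical stand-ins and are not taken from the actual CORD-19 files.

```python
import re
from collections import Counter

# Hypothetical stand-ins for article records. The real dataset is far
# larger and its file layout may differ from this simplified structure.
ARTICLES = [
    {
        "title": "Incubation period estimates for a novel coronavirus",
        "abstract": "We estimate the incubation period from reported "
                    "exposure and symptom onset dates.",
    },
    {
        "title": "Spike protein structure",
        "abstract": "Cryo-EM structure of the coronavirus spike protein "
                    "and its receptor binding domain.",
    },
    {
        "title": "Transmission in households",
        "abstract": "Household transmission and incubation of respiratory "
                    "viruses among close contacts.",
    },
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(question, articles):
    """Return (score, title) pairs for matching articles, best first.

    The score is how often the question's distinct terms occur in the
    article's title and abstract. Articles scoring zero are dropped.
    """
    query_terms = set(tokenize(question))
    scored = []
    for article in articles:
        counts = Counter(tokenize(article["title"] + " " + article["abstract"]))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, article["title"]))
    # Sort on the score only, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    for score, title in rank("incubation period", ARTICLES):
        print(score, title)
```

A baseline like this is mainly useful as a sanity check. More capable approaches would weight rare terms more heavily or compare texts by meaning, not by exact word matches.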
[COVID-19-BH20] COVID-19 Biohackathon
All researchers are called upon to participate in the COVID-19-BH20 Biohackathon to work on tooling for COVID-19 analysis from April 5th – 11th, 2020. The ultimate goal of this Biohackathon is to generate more readily accessible data, protocols and detection kits.
COVID-19-BH20 revolves around a range of topics, including FAIR data, ontology, pan-genome, workflows, machine learning, knowledge graphs and tracking, among many others.
Let’s keep the conversation going – please share with us any COVID-19 resources that you have found valuable. We will continue to update our readers and followers regularly. We are in this together!