

CGPS is pleased to be a partner in the UK consortium on COVID-19 genomics (COG), which aims to link local sequencing centres with large-scale facilities across the UK and to apply real-time genomic epidemiology to our understanding of, and ability to respond to, the pandemic.

Pathogen surveillance is a crucial part of early-stage and ongoing responses to COVID-19. The increased use of genomic methods and technology platforms has enabled real-time research and collaboration, has informed country policy commitments and action plans, and will create a strong framework for the use of genomic methods in future real-time epidemiology.

Main Article

Proactive genomic surveillance is an effective and proven method of analysing outbreaks in real time [i], enabling surveillance systems to achieve levels of speed and accuracy that were not feasible a decade ago [ii].

Genomic methods have been a critical part of the early-stage and ongoing global scientific response to SARS-CoV-2. The early release of the genome sequence [iii] provided crucial early understanding of the virus's source, host, transmission and properties [iv], established its similarities to and differences from other coronaviruses, and helped in the design of diagnostic and potential treatment approaches [v].

The proactive sharing of information amongst scientists is enabled by new global tools and platforms created to house and share data about virulent pathogens [vi]. These make real-time information widely available, including for use by public health agencies [vii].

Some countries are now building new collaborations from their existing genomic capacity to track the spread of the virus and identify emergent strains [viii].

A UK research and collaboration ecosystem, supported by country policy commitments and action plans, and regional and international capacity and guidance, has created a strong framework for the use of genomic methods for pathogen surveillance during the COVID-19 pandemic.

The UK Government and the UK’s Chief Scientific Adviser have backed the UK’s leading clinicians and scientists to map how COVID-19 spreads and evolves using whole-genome sequencing. Through a £20 million investment, the consortium will look for breakthroughs that help the UK respond to this and future pandemics, and save lives.

CGPS is pleased to be a partner in this Consortium, delivering large-scale and rapid whole-genome virus sequencing to inform the UK response.

Data analysis, phylogenetics and the delivery of insight for decision making are key components and deliverables of the project. The CGPS team bring expertise in data handling, processing, linking and visualisation to enhance the delivery and interpretation of data, using tools such as Microreact, EpiCollect and Data-flo.

An important aspect of the approach is the establishment of a network of sequencing centres. Samples from confirmed cases have been sent to centres around the country, including Cardiff, Edinburgh, Belfast, Birmingham, Exeter, Glasgow, Liverpool, London, Norwich, Nottingham, Oxford and Sheffield, with the Wellcome Trust Sanger Institute providing large-scale capacity and additional support.

The utility of Microreact and Data-flo in public health agencies

CGPS has worked with colleagues in Public Health Wales and Public Health Scotland to install local instances of Data-flo and Microreact. Post-installation support in using these applications has helped teams adopt them, streamlining local processing of COVID-19 metadata and allowing multidisciplinary teams to visualise genomic data alongside descriptive data.

Although the public versions of Microreact and Data-flo are free for anybody to use, governance rules mean that confidential personal or commercial data (e.g. patient records or care home names) may not be uploaded to these sites. Local installations, however, allow:

· Sensitive data to be kept within private networks such as NHSnet or public health agency computing infrastructure

· Authorisation of access to these data controlled via local authentication services such as Active Directory or LDAP

Through local use of these applications, data can be collected from multiple sources, then manipulated and aggregated, allowing COVID-19 metadata to be created automatically before upload to the consortium servers on CLIMB. The ability to combine private metadata with phylogenies derived from SARS-CoV-2 genome sequences within a local installation of Microreact has allowed public health officials to communicate highly visual messages to personnel throughout the public health hierarchy. This has provided a level of detail and geographic resolution that would not have been possible with the public instances because of valid privacy concerns.
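The aggregation step described above can be illustrated with a minimal Python sketch. It is not how Data-flo works internally. The column names (`sample_id`, `patient_id`, `care_home_name`, `region`, `collection_date`) and the helper functions are hypothetical, chosen only to show records from two sources being merged per sample and sensitive fields being removed before anything leaves the private network.

```python
import csv
import io

# Hypothetical sensitive columns; the real COG-UK fields are not listed in this article.
SENSITIVE_FIELDS = {"patient_id", "care_home_name"}


def merge_sources(*sources, key="sample_id"):
    """Combine rows from several sources into one record per sample."""
    merged = {}
    for rows in sources:
        for row in rows:
            merged.setdefault(row[key], {}).update(row)
    return merged


def shareable_view(merged):
    """Drop sensitive fields so only non-confidential columns are exported."""
    return [
        {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
        for record in merged.values()
    ]


def to_csv(rows):
    """Serialise rows to CSV text with a sorted, stable header."""
    fieldnames = sorted({k for row in rows for k in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative data only.
lab_rows = [{"sample_id": "S1", "collection_date": "2020-04-01"}]
clinical_rows = [
    {
        "sample_id": "S1",
        "patient_id": "P-001",
        "care_home_name": "Example House",
        "region": "Wales",
    }
]

merged = merge_sources(lab_rows, clinical_rows)
public_rows = shareable_view(merged)
print(to_csv(public_rows))
```

The full merged record stays available locally for private visualisation, while only the filtered rows would be candidates for upload.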

Other contributions to the COG-UK consortium

The CGPS team has also contributed to the UK SARS-CoV-2 sequencing effort by building two completely new web applications:

Metadata Uploader

As part of the weekly collection of genomes and associated metadata, a login-protected website allows sequencing sites without staff who have programming skills to upload metadata easily. Developed with CLIMB staff at the University of Birmingham, the website synchronises with the upload API hosted at CLIMB to ensure that the data is valid.

The website documentation describes the fields that are collected [ix].
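As a rough illustration of the kind of check that can be run before metadata is submitted, the following Python sketch validates individual rows. It does not reproduce the CLIMB upload API or the real field list, which is described in the website documentation [ix]. The field names and the `validate_row` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical required fields, for illustration only.
REQUIRED_FIELDS = ("sample_id", "collection_date", "region")


def validate_row(row):
    """Return a list of human-readable problems found in one metadata row."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not (row.get(field) or "").strip():
            problems.append(f"missing {field}")
    raw_date = (row.get("collection_date") or "").strip()
    if raw_date:
        try:
            # Expect a year-month-day date such as 2020-04-01.
            datetime.strptime(raw_date, "%Y-%m-%d")
        except ValueError:
            problems.append(f"collection_date is not a valid date: {raw_date}")
    return problems


good = {"sample_id": "S1", "collection_date": "2020-04-01", "region": "Wales"}
bad = {"sample_id": "", "collection_date": "01/04/2020", "region": "Wales"}

print(validate_row(good))
print(validate_row(bad))
```

Returning a list of problems rather than raising on the first error lets a site correct every issue in a row in one pass.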


Pangolin Web Application

Colleagues in the COG-UK consortium developed Pangolin (Phylogenetic Assignment of Named Global Outbreak LINeages), software used to assign lineages to SARS-CoV-2 sequences. For those familiar with the UNIX command line, installing this software is straightforward. For those who are unfamiliar with the command line or who lack access to a UNIX computer, CGPS developed an open-source, web-based application that allows users to:

· Assign lineages to genome sequences of SARS-CoV-2

· View descriptive characteristics of the assigned lineage(s)

· View the placement of the lineage in a phylogeny of global samples

· View the temporal and geographic distribution of the assigned lineage(s)

A full description of the application is given in the release article [x].
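The temporal and geographic views listed above amount to counting assigned lineages by time period and by place. The Python sketch below shows that idea under stated assumptions: it is not the web application's implementation, and the record layout, the sample values and the function names are hypothetical. It assumes each sample already has a lineage assigned and has been joined to a collection date and a region.

```python
from collections import Counter
from datetime import date

# Hypothetical records: one assigned lineage per sample, with date and region.
samples = [
    {"lineage": "B.1", "collection_date": "2020-03-30", "region": "Wales"},
    {"lineage": "B.1", "collection_date": "2020-04-02", "region": "Scotland"},
    {"lineage": "B.1.1", "collection_date": "2020-04-08", "region": "Wales"},
]


def iso_week(day_text):
    """Convert a YYYY-MM-DD string to an ISO year-week label."""
    year, week, _ = date.fromisoformat(day_text).isocalendar()
    return f"{year}-W{week:02d}"


def temporal_distribution(records):
    """Count samples per (lineage, ISO week)."""
    return Counter((r["lineage"], iso_week(r["collection_date"])) for r in records)


def geographic_distribution(records):
    """Count samples per (lineage, region)."""
    return Counter((r["lineage"], r["region"]) for r in records)


by_week = temporal_distribution(samples)
by_region = geographic_distribution(samples)

for (lineage, week), count in sorted(by_week.items()):
    print(lineage, week, count)
```

Counts keyed by lineage and week, or lineage and region, are the kind of table that a timeline or map view can be drawn from.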

Pangolin front page

Pangolin results


[i] Happi, A., Ugwu, C., Happi, C., 2019. Preparing for the next Ebola outbreak: in-country genomic capacity in Africa. The Lancet, April 15, 2019

[ii] Dolley, S., 2018. Big Data’s Role in Precision Public Health. Front. Public Health, 07 March 2018

[iii] European Centre for Disease Prevention and Control, 2020. Event Background COVID-19. Sourced 29 March 2020 at

[iv] Lu, R., Zhao, X., Li, J., Niu, P., Yang, B., Wu, H., et al., 2020. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet, Vol 395, February 22, 2020

[v] Fuk-Woo Chan, J., Kok, K., Zhu, Z., Chu, H., To, K., Yuan, S., and Yuen, K., 2020. Genomic characterization of the 2019 novel human pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerging Microbes & Infections, 9:1

[vi] See Microreact; Pathogenwatch; and NextStrain

[vii] European Centre for Disease Prevention and Control, 2020. Event Background COVID-19. Sourced 29 March 2020 at

[viii] UK COVID Genomics Consortium UK 2020. Public data and analysis. Sourced 28 April 2020 at

[ix] Metadata Uploader Documentation

[x] Pangolin web application release article