Novel bioinformatics pipeline for fast and scalable analysis of large viral phylogenies


By Susha Cheriyedath, M.Sc. Reviewed by Danielle Ellis, B.Sc. Dec 15 2021

A team of researchers recently developed a bioinformatics pipeline to analyze clusters in viral phylogenies and posted their findings to the bioRxiv preprint server.

Study: ClusTRace, a bioinformatic pipeline for analyzing clusters in virus phylogenies. Image Credit: M. PATTHAWEE/Shutterstock

Background

This news article was a review of a preliminary scientific report that had not undergone peer review at the time of publication. The report has since been peer reviewed and accepted for publication in a scientific journal. Links to the preliminary and peer-reviewed reports are available in the Sources section at the bottom of this article.

Coronavirus disease 2019 (COVID-19) has become a global public health concern, and the emergence of several new severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants is alarming. The variants reported so far have been categorized as either variants of interest (VOIs) or variants of concern (VOCs). VOCs pose greater health risks because of their higher transmissibility, immune-escape properties, and reduced susceptibility to existing vaccines. So far, five VOCs have been designated: Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), Delta (B.1.617.2), and Omicron (B.1.1.529).

Health agencies and scientists are therefore under growing pressure to develop methods for the early detection and in-depth analysis of emerging variants. Such methods could give early warning and help inform better COVID-19 management policies.

About the study

In the present study, researchers developed a novel bioinformatics pipeline named ClusTRace for fast and scalable analysis of sequence clusters, or clades, in large viral phylogenies. ClusTRace performs several high-level functions, including outlier filtering, alignment, phylogenetic tree reconstruction, cluster or clade extraction, variant calling, visualization, and reporting.
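The article does not say how ClusTRace decides which sequences count as outliers. As a rough illustration only, the minimal Python sketch below assumes a common quality-control heuristic: discard genomes that are too short or that contain too large a fraction of ambiguous `N` bases. The function name `filter_outliers` and both default thresholds are hypothetical and are not taken from ClusTRace.

```python
from typing import Dict


def filter_outliers(
    sequences: Dict[str, str],
    max_n_fraction: float = 0.05,  # hypothetical cut-off for ambiguous bases
    min_length: int = 29000,       # hypothetical minimum genome length
) -> Dict[str, str]:
    """Keep sequences that are long enough and not dominated by 'N' bases.

    This is an illustrative QC heuristic, not ClusTRace's actual filter.
    """
    kept: Dict[str, str] = {}
    for seq_id, seq in sequences.items():
        seq = seq.upper()
        # Skip empty or partial genomes.
        if not seq or len(seq) < min_length:
            continue
        # Fraction of positions that could not be resolved.
        n_fraction = seq.count("N") / len(seq)
        if n_fraction > max_n_fraction:
            continue
        kept[seq_id] = seq
    return kept
```

For toy inputs, `min_length` can be lowered. For real SARS-CoV-2 genomes, which are roughly 29,900 nucleotides long, both thresholds would need tuning to the sequencing protocol.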

It was developed to trace COVID-19 transmission, with an emphasis on fast, unsupervised screening of phylogenies for markers of super-spreading events, high rates of cluster growth, and the accumulation of novel mutations. ClusTRace can complement existing toolkits such as Nextstrain, Pangolin, Nextclade, and Lazypipe by adding unsupervised clade/cluster analysis with intuitive visualizations and reporting. The team applied it to SARS-CoV-2 genomic sequences from COVID-19 patients in Finland collected between January and May 2021. In this dataset, the Alpha and Beta variants were dominant, with 5,379 and 1,051 sequences, respectively.

Findings

The researchers found that many high-frequency amino acid mutations in the Alpha variant were already covered by the GISAID reference, and only five amino acid mutations with a frequency of 10% or higher were specific to the Finnish data. In contrast, as many as half of the Beta variant mutations with a frequency of 10% or higher were not covered by the GISAID reference. The team also reported non-GISAID mutations, but only the Beta variant carried non-GISAID mutations in the spike protein, some of which could potentially affect receptor binding.
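The 10% frequency cut-off and the comparison against a reference mutation set can be illustrated with a short Python sketch. It assumes that each sample has already been reduced to a list of amino acid mutation labels (for example, by an upstream variant-calling step) and that the GISAID reference is available as a plain set of labels. The function name and data layout are hypothetical and do not reflect ClusTRace's actual file formats.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def non_reference_mutations(
    sample_mutations: Dict[str, Iterable[str]],
    reference: Set[str],
    min_freq: float = 0.10,  # the 10% threshold mentioned in the article
) -> Dict[str, float]:
    """Return mutations seen in at least `min_freq` of samples and absent from `reference`.

    Hypothetical helper: keys of `sample_mutations` are sample IDs, values are
    mutation labels such as "S:N501Y".
    """
    n_samples = len(sample_mutations)
    if n_samples == 0:
        return {}
    counts: Counter = Counter()
    for muts in sample_mutations.values():
        # Count each mutation at most once per sample.
        counts.update(set(muts))
    result: Dict[str, float] = {}
    for mut, count in counts.items():
        freq = count / n_samples
        if freq >= min_freq and mut not in reference:
            result[mut] = freq
    return result
```

Counting each mutation once per sample keeps the frequency interpretable as "fraction of sequences carrying the mutation," which is how the article's 10% threshold reads.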

Cluster analysis yielded 110 clusters for the Alpha variant and 19 for the Beta variant. For each variant, the researchers analyzed the 10 clusters with the highest monthly growth-rate peaks during the study period. For Alpha, these ten clusters contained around 58.5% of all sequences.
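Selecting the clusters with the highest growth-rate peaks, and measuring how much of a variant's data they account for, comes down to a sort and a sum. The Python sketch below assumes that per-cluster peak growth values and cluster sizes have already been computed. `top_clusters_by_peak_growth` is a hypothetical helper, not part of ClusTRace.

```python
from typing import Dict, List, Tuple


def top_clusters_by_peak_growth(
    peak_growth: Dict[str, int],
    cluster_sizes: Dict[str, int],
    total_sequences: int,
    k: int = 10,
) -> Tuple[List[str], float]:
    """Pick the k clusters with the highest monthly growth peak.

    Returns the selected cluster IDs and the fraction of all sequences of the
    variant that fall inside those clusters.
    """
    # Sort cluster IDs by their peak growth, highest first (ties keep input order).
    ranked = sorted(peak_growth, key=lambda cid: peak_growth[cid], reverse=True)
    top = ranked[:k]
    covered = sum(cluster_sizes[cid] for cid in top)
    return top, covered / total_sequences
```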

For the Beta variant, the ten largest clusters contained 94.5% of all sequences. The number of non-GISAID mutations in these clusters ranged from one to six for Alpha and from three to eight for Beta. The maximal absolute growth rate, defined as the number of sequences added to a cluster per month, ranged from 74 to 310 for Alpha clusters, peaking in February and March, and from 11 to 148 for Beta clusters, with peaks in February, March, and April. Cluster sizes ranged from 100 to 479 sequences for Alpha and from 14 to 259 for Beta.
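Because the maximal absolute growth rate is described as the number of sequences added to a cluster per month, its peak can be found by counting a cluster's sequences per calendar month and taking the largest count. The minimal Python sketch below assumes that sequences are binned by sample collection date. ClusTRace may use a different date field or time window, and `max_absolute_growth` is a hypothetical name.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple


def max_absolute_growth(collection_dates: Iterable[date]) -> Tuple[Tuple[int, int], int]:
    """Return ((year, month), count) for the month in which most sequences joined the cluster.

    Assumption: one date per sequence; ties go to the month seen first.
    """
    # Bin every sequence into its (year, month) bucket.
    monthly = Counter((d.year, d.month) for d in collection_dates)
    if not monthly:
        raise ValueError("cluster has no dated sequences")
    month, count = max(monthly.items(), key=lambda item: item[1])
    return month, count
```

Applied to every cluster, this yields the per-month peaks that can then be used to rank clusters by growth.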

Conclusions

The team demonstrated the use of ClusTRace for lineage assignment, generation of multi-FASTA collections, outlier filtering, alignment, and phylogenetic tree construction. They reported that ClusTRace can perform automated clustering combined with cluster growth-rate analysis and variant calling to scan through a phylogeny, which amounts to unsupervised, phylogeny-based cluster analysis. Clusters with high growth rates and non-reference mutations in specific genomic regions could be easily highlighted for further downstream analysis. ClusTRace also produces outputs such as Excel summaries and g3viz plots for clades with high growth or mutation rates.
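Generating one multi-FASTA collection per lineage is essentially a group-by operation. The Python sketch below assumes that lineage labels have already been assigned by an upstream tool such as Pangolin and are supplied as `(seq_id, lineage, sequence)` tuples. The function `multifasta_by_lineage` and its 60-character line width are illustrative choices, not ClusTRace's implementation.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple


def multifasta_by_lineage(
    records: Iterable[Tuple[str, str, str]], line_width: int = 60
) -> Dict[str, str]:
    """Group (seq_id, lineage, sequence) records into one multi-FASTA string per lineage."""
    grouped: DefaultDict[str, List[str]] = defaultdict(list)
    for seq_id, lineage, seq in records:
        # Wrap the sequence to a fixed width, as is conventional for FASTA files.
        wrapped = "\n".join(seq[i:i + line_width] for i in range(0, len(seq), line_width))
        grouped[lineage].append(f">{seq_id}\n{wrapped}\n")
    # Concatenate the entries of each lineage into a single multi-FASTA text.
    return {lineage: "".join(entries) for lineage, entries in grouped.items()}
```

Each returned string could be written to its own file and then passed to alignment and tree-building steps lineage by lineage.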

In conclusion, ClusTRace could act as a bridge between the massive inflow of sequence data and the organization of these sequences (into lineages, alignments, etc.) needed to better understand how the pandemic is evolving. SARS-CoV-2 is likely to keep mutating and evolving into new variants, so the global response will require timely interventions and new, more advanced strategies. The increased capacity for genome sequencing worldwide could be further strengthened by novel bioinformatics tools for efficient and scalable genomic surveillance of viruses.

Journal references:

Preliminary scientific report.
Plyusnin, I. et al. (2021) "ClusTRace, a bioinformatic pipeline for analyzing clusters in virus phylogenies". bioRxiv. doi: 10.1101/2021.12.09.471941. https://www.biorxiv.org/content/10.1101/2021.12.09.471941v1
Peer reviewed and published scientific report.
Plyusnin, I. et al. (2022) "ClusTRace, a bioinformatic pipeline for analyzing clusters in virus phylogenies". BMC Bioinformatics, 23(1). doi: 10.1186/s12859-022-04709-8. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04709-8

Article Revisions

May 9 2023 - The preprint on which this article was based was accepted for publication in a peer-reviewed scientific journal. This article was edited accordingly to include a link to the final peer-reviewed paper, now shown in the Sources section.


Written by

Susha Cheriyedath

Susha is a scientific communication professional holding a Master's degree in Biochemistry, with expertise in Microbiology, Physiology, Biotechnology, and Nutrition. After a two-year tenure as a lecturer from 2000 to 2002, where she mentored undergraduates studying Biochemistry, she transitioned into editorial roles within scientific publishing. She has accumulated nearly two decades of experience in medical communication, assuming diverse roles in research, writing, editing, and editorial management.


Citations

Please use one of the following formats to cite this article in your essay, paper or report:

APA
Cheriyedath, Susha. (2023, May 09). Novel bioinformatics pipeline for fast and scalable analysis of large viral phylogenies. News-Medical. Retrieved on February 10, 2026 from https://www.news-medical.net/news/20211215/Novel-bioinformatics-pipeline-for-fast-and-scalable-analysis-of-large-viral-phylogenies.aspx.
MLA
Cheriyedath, Susha. "Novel bioinformatics pipeline for fast and scalable analysis of large viral phylogenies". News-Medical. 10 February 2026. <https://www.news-medical.net/news/20211215/Novel-bioinformatics-pipeline-for-fast-and-scalable-analysis-of-large-viral-phylogenies.aspx>.
Chicago
Cheriyedath, Susha. "Novel bioinformatics pipeline for fast and scalable analysis of large viral phylogenies". News-Medical. https://www.news-medical.net/news/20211215/Novel-bioinformatics-pipeline-for-fast-and-scalable-analysis-of-large-viral-phylogenies.aspx. (accessed February 10, 2026).
Harvard
Cheriyedath, Susha. 2023. Novel bioinformatics pipeline for fast and scalable analysis of large viral phylogenies. News-Medical, viewed 10 February 2026, https://www.news-medical.net/news/20211215/Novel-bioinformatics-pipeline-for-fast-and-scalable-analysis-of-large-viral-phylogenies.aspx.
