An automated bioinformatics pipeline to monitor HIV data in real-time


By Neha Mathur. Reviewed by Benedette Cuffari, M.Sc. Mar 15 2023

In a recent study published in Viruses, researchers discuss an open-source and automated bioinformatics pipeline to prospectively and routinely analyze and integrate heterogeneous human immunodeficiency virus (HIV)-1 sequence data. This approach was applied to 18 monthly datasets generated between January 2020 and June 2022 in Rhode Island (RI) in the United States.

The proposed pipeline facilitated routine collaboration between researchers and the RI Department of Health (RIDOH) in near real-time. This approach also allowed researchers to compare the effects of distinct phylogenetic methods and distance-only algorithms on cluster analyses of HIV-1 sequence datasets.

Study: An Automated Bioinformatics Pipeline Informing Near-Real-Time Public Health Responses to New HIV Diagnoses in a Statewide HIV Epidemic. Image Credit: CI Photos / Shutterstock.com

Background

Challenges associated with real-time data integration, analysis, and interpretation delay public health responses, particularly for HIV. Analyzing genomic HIV data, such as HIV-1 sequences, could inform these responses, provided that the associated data management, computational, and analytical challenges are overcome.

Public health agencies routinely collect HIV-1 sequences during clinical care for drug resistance testing. The same samples could also help estimate viral evolution across individuals.

Just as contact tracing establishes social networks and serves as a proxy for the actual HIV transmission network, phylogenetic relationships among sequences could provide relevant information to guide public health responses. In fact, contact tracing is an independent source of information about social networks, which, in turn, could help detect undiagnosed or diagnosed out-of-care HIV cases.

About the study

In the present study, researchers source and integrate statewide molecular HIV data from clinical, sequence, and public health databases.

SQUAT principles were subsequently used to assess the quality of these data, flagging sequences with more than 5% stop codons, guanosine-to-adenosine hypermutation, or atypical mutations, and computing exact pairwise nucleotide edit distances among new sequences. The new sequences were then compared with historical HIV-1 sequences.
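To make the stop-codon flag concrete, the following is a minimal sketch in Python and not the pipeline's or SQUAT's actual implementation. It assumes sequences are already in reading frame 0, and the function names and example sequences are hypothetical; only the 5% cutoff comes from the study description.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_fraction(seq: str) -> float:
    """Fraction of complete frame-0 codons that are stop codons."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if not codons:
        return 0.0
    return sum(codon in STOP_CODONS for codon in codons) / len(codons)


def flag_low_quality(seqs: dict[str, str], max_stop_fraction: float = 0.05) -> list[str]:
    """Names of sequences whose stop-codon fraction exceeds the cutoff."""
    return sorted(
        name for name, seq in seqs.items()
        if stop_codon_fraction(seq) > max_stop_fraction
    )


# Toy sequences, invented for illustration only.
example = {
    "clean": "ATGAAATTT",    # ATG AAA TTT: no stop codons
    "suspect": "ATGTAATGA",  # ATG TAA TGA: two of three codons are stops
}
flagged = flag_low_quality(example)
```

A real quality check would also need to handle reading-frame detection, ambiguous bases, and the hypermutation and atypical-mutation flags, which this sketch leaves out.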

Following quality analyses, the pipeline was used to detect molecular clusters in sequences recently added from new index cases. To this end, the pipeline used MAFFT v. 7.313 to perform a multiple sequence alignment of the first available HIV-1 sequence for each patient.
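The selection of one sequence per patient before alignment could look roughly like the sketch below. The record layout (patient ID, sample date, sequence), the function names, and the file names in the closing comment are assumptions made for illustration; the options the pipeline actually passes to MAFFT are not described in this article.

```python
from datetime import date


def first_sequence_per_patient(records):
    """Keep the earliest-dated sequence for each patient.

    `records` is an iterable of (patient_id, sample_date, sequence) tuples,
    a hypothetical layout chosen for this example.
    """
    earliest = {}
    for patient_id, sample_date, sequence in records:
        if patient_id not in earliest or sample_date < earliest[patient_id][0]:
            earliest[patient_id] = (sample_date, sequence)
    return {pid: seq for pid, (_, seq) in earliest.items()}


def to_fasta(sequences):
    """Serialize a {name: sequence} mapping as FASTA text, sorted by name."""
    return "".join(f">{name}\n{seq}\n" for name, seq in sorted(sequences.items()))


# Toy records, invented for illustration only.
records = [
    ("P1", date(2021, 3, 1), "ATGCCC"),
    ("P1", date(2020, 1, 15), "ATGAAA"),
    ("P2", date(2022, 6, 30), "ATGTTT"),
]
fasta_text = to_fasta(first_sequence_per_patient(records))

# The FASTA text could then be written to a file and aligned with, for example:
#   mafft --auto first_sequences.fasta > aligned.fasta
```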

The pipeline implemented five phylogenetic methods and cluster-defining parameters that favored false positive clusters and maximized available information. Likewise, the novel approach used HIV-TRACE v. 0.4.4 to perform distance-only sequence clustering.

At a 1.5% distance threshold, HIV-TRACE detected a similar number of clusters to the phylogenetic methods. Furthermore, this pipeline compared clustering in RI’s statewide dataset with clustering in a subset obtained from a single large clinic in RI to evaluate the effect of an augmented sampling density.
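The idea behind distance-only clustering at a 1.5% threshold can be sketched as follows. This is a simplified illustration rather than HIV-TRACE itself: it swaps in a plain proportion of mismatched sites (p-distance) on pre-aligned sequences in place of the genetic distance model that HIV-TRACE uses, and all function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_clusters(aligned: dict[str, str], threshold: float = 0.015) -> list[set[str]]:
    """Link sequences whose distance is <= threshold; return components of size >= 2."""
    parent = {name: name for name in aligned}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(aligned, 2):
        if p_distance(aligned[a], aligned[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in aligned:
        groups.setdefault(find(name), set()).add(name)
    return [members for members in groups.values() if len(members) >= 2]


# Toy alignment of 100 sites, invented for illustration only.
aligned = {
    "s1": "A" * 100,
    "s2": "C" + "A" * 99,   # 1% from s1
    "s3": "CC" + "A" * 98,  # 2% from s1, 1% from s2
    "s4": "G" * 100,        # unrelated
}
clusters = distance_clusters(aligned)
```

In this toy example s1 and s3 differ by 2%, above the threshold, yet they fall in the same cluster because each is within 1.5% of s2. Clusters are connected components of the linkage graph, not groups in which every pair is close.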

After data integration, each pipeline component automatically generated reports. While individual-level reports summarized clustering, demographics, and clinical information of newly added sequences, a population-level report provided statewide clustering summaries. This data identified cluster growth over time, thereby depicting cluster membership of new and previous index cases.
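A population-level summary of cluster growth over time could be derived along the lines of the sketch below. The input layout (one month label and cluster ID per newly added sequence) and the function name are assumptions for illustration, not the format of the pipeline's actual reports.

```python
from collections import Counter, defaultdict


def cluster_growth(assignments):
    """Cumulative cluster size by month.

    `assignments` is an iterable of (month, cluster_id) pairs, one per newly
    added sequence, with months as 'YYYY-MM' strings (a hypothetical layout).
    Returns {cluster_id: [(month, cumulative_size), ...]} in month order.
    """
    per_cluster = defaultdict(Counter)
    for month, cluster_id in assignments:
        per_cluster[cluster_id][month] += 1

    growth = {}
    for cluster_id, counts in per_cluster.items():
        total = 0
        series = []
        for month in sorted(counts):
            total += counts[month]
            series.append((month, total))
        growth[cluster_id] = series
    return growth


# Toy assignments, invented for illustration only.
assignments = [
    ("2020-01", "C1"),
    ("2020-01", "C1"),
    ("2020-03", "C1"),
    ("2020-02", "C2"),
    ("2020-03", "C2"),
]
growth = cluster_growth(assignments)
```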

Results

The pipeline developed in the current study incorporated four new features unavailable in prior HIV cluster analysis automated approaches. First, it had a flagging step that explored sequence quality. Second, it implemented several phylogenetic and distance-only clustering methods.

The novel approach also detected clustered individuals using a combination of the five phylogenetic methods. Finally, this pipeline summarized clustering results using visual representations.
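How the five phylogenetic methods are combined is not spelled out in this article. One simple possibility, shown purely as an assumption, is a voting rule in which an individual counts as clustered when at least `min_methods` methods place them in a cluster. The default of one corresponds to a union, in keeping with the stated preference for false positives. The method labels and individual IDs below are placeholders.

```python
from collections import Counter


def combine_methods(clustered_by_method: dict[str, set[str]], min_methods: int = 1) -> set[str]:
    """Individuals placed in a cluster by at least `min_methods` methods."""
    votes = Counter()
    for members in clustered_by_method.values():
        votes.update(members)
    return {person for person, count in votes.items() if count >= min_methods}


# Placeholder method labels and individual IDs, invented for illustration only.
clustered_by_method = {
    "method_A": {"P1", "P2"},
    "method_B": {"P2", "P3"},
    "method_C": {"P2"},
}
any_method = combine_methods(clustered_by_method)                 # union
majority = combine_methods(clustered_by_method, min_methods=2)    # stricter rule
```

Raising `min_methods` trades sensitivity for specificity, which is the kind of comparison the pipeline's multi-method design makes possible.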

While cluster analyses employing distance-only methods also identified large viral transmission networks, this pipeline helped public health officials manage HIV cases in real time. In addition, the pipeline seamlessly removed obstacles to phylogenetic analysis while facilitating replicability.

As compared to distance-only methods, the proposed pipeline detected 76% more clustered HIV cases. More specifically, it identified 37 new HIV cases for case management discussions.

The pipeline also helped researchers examine the differences in cluster identification between a clinic-based and statewide dataset, thus indicating the importance of good sampling. The authors noted that RI’s high statewide sequence sampling density was beneficial.

Careful interpretation and longitudinal accumulation of cluster data are also imperative to obtain more robust findings and to compensate for the reduction in clusters that can occur as sequences are added.

Conclusions

The management of the ongoing HIV epidemic is a priority of the U.S. Department of Health and Human Services. The multi-disciplinary approach adopted in this study facilitated case management to disrupt HIV transmission in near-real-time in RI. Furthermore, the approach could allow prospective evaluation of the benefits of phylogenetic data and evidence-based discussions to guide public health intervention strategies.

Optimal integration of genomic and clinical data, including bioinformatics, analytical, and wet laboratory data from healthcare and public health organizations, could improve health outcomes. The authors released this pipeline for automated HIV cluster analysis as an open-source package that has been made available at https://github.com/kantorlab/hiv-real-time-phylogeny

Journal reference:

Howison, M., Gillani, F. S., Novitsky, V., et al. (2023). An Automated Bioinformatics Pipeline Informing Near-Real-Time Public Health Responses to New HIV Diagnoses in a Statewide HIV Epidemic. Viruses 15(3); 737. doi:10.3390/v15030737




Citations

Please use one of the following formats to cite this article in your essay, paper or report:

APA
Mathur, Neha. (2023, March 15). An automated bioinformatics pipeline to monitor HIV data in real-time. News-Medical. Retrieved on February 10, 2026 from https://www.news-medical.net/news/20230315/An-automated-bioinformatics-pipeline-to-monitor-HIV-data-in-real-time.aspx.
