In a recent article published in the journal Nature Biotechnology, researchers presented Multiscale potential of heat-diffusion for affinity-based trajectory embedding (PHATE), an approach that identified multimodal signatures of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection.
High-throughput methods that assess dozens of parameters in millions of cells from large patient populations now generate vast amounts of biomedical data. However, current data exploration and dimensionality reduction tools, such as principal component analysis (PCA), reveal only a single level of data granularity.
A recent study on SARS-CoV-2 used existing data characterization methods to determine cellular responses at a single resolution, and it failed to distinguish effective from ineffective immune responses. As the biomedical field creates more complex and high-dimensional datasets, more advanced computational techniques are needed to extract biological insights.
About the study
In the present study, the researchers presented a technique that learns abstracted biological features directly predictive of disease outcome by sweeping across all levels of data granularity. The method, named Multiscale PHATE, builds on a coarse-graining method known as diffusion condensation.
The team applied Multiscale PHATE to a dataset of 54 million cells from 168 individuals hospitalized with SARS-CoV-2 infection at Yale New Haven Hospital, United States (US). Further, they analyzed the generalizability of Multiscale PHATE across various data types, including clinical variables, single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq), single-cell ribonucleic acid sequencing (scRNA-seq), and flow cytometry.
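The coarse-graining idea behind diffusion condensation can be illustrated with a minimal sketch: points are repeatedly moved toward the kernel-weighted average of their neighbors, and points that come close enough are merged, which yields clusters at successively coarser resolutions. The code below is a simplified illustration, not the authors' implementation. The function names (`diffusion_step`, `merge_close`, `condense`), the parameters (`bandwidth`, `merge_radius`, `growth`), and the rule of widening the kernel whenever no merge occurs are assumptions made for this example, and the denoising and PHATE embedding steps of Multiscale PHATE are omitted.

```python
import math


def diffusion_step(centers, bandwidth):
    """Replace each point with the Gaussian-weighted average of all points."""
    moved = []
    for p in centers:
        weights = [math.exp(-math.dist(p, q) ** 2 / bandwidth ** 2) for q in centers]
        total = sum(weights)  # at least 1.0, because the self-weight is exp(0)
        moved.append(tuple(
            sum(w * q[d] for w, q in zip(weights, centers)) / total
            for d in range(len(p))
        ))
    return moved


def merge_close(centers, members, merge_radius):
    """Greedily merge points that lie within merge_radius of a kept point."""
    kept_centers, kept_members = [], []
    for center, group in zip(centers, members):
        for j, kept in enumerate(kept_centers):
            if math.dist(center, kept) < merge_radius:
                kept_members[j].extend(group)  # absorb into the existing cluster
                break
        else:
            kept_centers.append(center)
            kept_members.append(list(group))
    return kept_centers, kept_members


def condense(points, bandwidth=0.5, merge_radius=0.01, growth=1.2, max_iter=200):
    """Return a list of levels; each level lists the original indices per cluster."""
    centers = [tuple(p) for p in points]
    members = [[i] for i in range(len(centers))]
    levels = [[list(m) for m in members]]  # finest level: every point alone
    for _ in range(max_iter):
        if len(centers) == 1:
            break
        centers = diffusion_step(centers, bandwidth)
        merged_centers, members = merge_close(centers, members, merge_radius)
        if len(merged_centers) < len(centers):
            levels.append([list(m) for m in members])  # record a coarser level
        else:
            bandwidth *= growth  # assumption: widen the kernel when nothing merges
        centers = merged_centers
    return levels


# Toy data: two tight groups of three points each.
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
for level in condense(toy_points):
    print(len(level), "clusters:", level)
```

Each recorded level corresponds to one resolution of the data, from individual points down to a single cluster. The sketch uses only the Python standard library so that it runs without dependencies; a practical implementation on millions of cells would need sparse, vectorized affinity computations.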
Findings
The results showed that Multiscale PHATE overcomes two drawbacks of the original diffusion condensation by applying a denoising strategy over the diffusion potential. In its original form, diffusion condensation was ineffective for visualizing or understanding the nonlinear geometry of biological datasets and was susceptible to condensing points off the data manifold. Multiscale PHATE captured the nonlinear geometry of complex datasets and offered a method for rapid visualization and interpretation of clusters at any resolution.
In the first ablation study, Multiscale PHATE outperformed the other visualization methods, including t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), across nearly all levels of dropout and biological variation noise. Although Homology-UMAP provided good visualizations, its denoised embedding manifold preservation (DeMAP) scores were lower than those of Multiscale PHATE.
Similarly, in the second ablation study, PHATE proved to be the most successful visualization strategy for embedding the multiscale clusters formed by the coarse-graining method. In addition, Multiscale PHATE performed better than other clustering methods, such as single-linkage hierarchical clustering, Leiden, and Louvain, on datasets with differing types and degrees of noise.
Using 251 blood samples from hospitalized SARS-CoV-2-infected patients, Multiscale PHATE identified subsets of peripheral blood mononuclear cells (PBMCs) linked with coronavirus disease 2019 (COVID-19)-associated survival and mortality. It revealed that the levels of monocytes (CD14+), B cells (CD19+), and granulocytes (CD16+SSChi) were high in deceased COVID-19 patients. Further, it suggested that T cells (CD3+) enhanced the likelihood of survival in SARS-CoV-2 infection.
In detail, Multiscale PHATE identified that CD14−CD16hi and CD14+CD16int monocytes were elevated during severe COVID-19, and that CD16 correlated positively with SARS-CoV-2-related death. In contrast, the human leukocyte antigen–DR isotype (HLA-DR) and CD14 correlated with survival. PHATE identified a unique monocyte population associated with mortality, named CD14−CD16hiHLA-DRlo. PHATE also indicated that, although CD66b and CD14 in neutrophils correlated negatively with COVID-19-related mortality, elevated side scatter (SSC) and forward scatter (FSC) in neutrophils were linked to SARS-CoV-2-associated death. This inference suggests that CD16hiCD66blo neutrophils were elevated in people who died of SARS-CoV-2 infection.
PHATE revealed that antibody-secreting B cells called plasmablasts, or CD86loHLADR−/CXCR3+ cells, were elevated in people with adverse COVID-19 outcomes, whereas late-activated mature B cells identified by CD86 were associated with survival in SARS-CoV-2 infection. CD4+ interferon (IFN)-γ+ granzyme B+ Th17 cells were enriched in individuals who died of COVID-19. A CD8+ T cell subset, T effector memory cells re-expressing CD45RA (TEMRA), was elevated in those with severe SARS-CoV-2 infection. Additionally, in conditional density resampled estimate of mutual information (DREMI) analysis, activation state markers such as CD45RA and HLA-DR across all CD8+ T cells were associated with COVID-19-related death.
The DREMI analysis indicated that younger age and female sex were associated with an increased likelihood of mounting a robust T cell response in SARS-CoV-2 infection. In five-fold cross-validation, Multiscale PHATE achieved an overall prediction accuracy of 83.7±0.6%, with an accuracy of 74.2±0.8% for death cases and 85.5±0.7% for survival cases. Using the myeloid-focused flow cytometry panel, Multiscale PHATE identified T cells, CD16hi neutrophils, and monocytes as three of the top predictors of eventual disease outcome in COVID-19. By comparison, predictions based on Louvain-computed and flow cytometry-gated populations reached lower accuracies of 64.7±1.1% and 73.8±0.8%, respectively.
DREMI and conditional-density rescaled visualization (DREVI) analysis between the PHATE-derived likelihood score for COVID-19 outcomes and clinical features indicated that systemic inflammatory markers, organ dysfunction, and markers of physiologic instability were linked to a high risk of COVID-19-related mortality. Further, DREMI analysis suggested that a prolonged COVID-19 recovery period correlated strongly with kidney dysfunction and age.
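Accuracy figures of this kind are typically reported as a mean ± standard deviation over cross-validation splits, both overall and separately for each outcome class. The sketch below shows how such numbers can be computed. It is only an illustration: the data are synthetic, the midpoint-threshold classifier is a toy stand-in rather than the study's predictive model, and the helper names (`stratified_folds`, `fit_midpoint`, `cross_validate`) are hypothetical.

```python
import random
import statistics


def stratified_folds(labels, k):
    """Deal the indices of each class round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        cls_idx = [i for i, y in enumerate(labels) if y == cls]
        for pos, i in enumerate(cls_idx):
            folds[pos % k].append(i)
    return folds


def fit_midpoint(xs, ys):
    """Toy classifier: a threshold halfway between the two class means."""
    mean0 = statistics.mean([x for x, y in zip(xs, ys) if y == 0])
    mean1 = statistics.mean([x for x, y in zip(xs, ys) if y == 1])
    return (mean0 + mean1) / 2, mean1 > mean0


def predict(x, threshold, one_is_high):
    """Return 1 if x falls on the class-1 side of the threshold, else 0."""
    return int((x > threshold) == one_is_high)


def accuracy(hits):
    """Fraction of correct predictions in a list of booleans."""
    hits = list(hits)
    return sum(hits) / len(hits)


def cross_validate(xs, ys, k=5):
    """Return (mean, std) of overall and per-class accuracy across k folds."""
    scores = {"overall": [], 0: [], 1: []}
    for test_idx in stratified_folds(ys, k):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        threshold, one_is_high = fit_midpoint(
            [xs[i] for i in train], [ys[i] for i in train]
        )
        hits = {i: predict(xs[i], threshold, one_is_high) == ys[i] for i in test_idx}
        scores["overall"].append(accuracy(hits.values()))
        for cls in (0, 1):
            scores[cls].append(accuracy(hits[i] for i in test_idx if ys[i] == cls))
    return {
        name: (statistics.mean(vals), statistics.pstdev(vals))
        for name, vals in scores.items()
    }


# Synthetic, imbalanced example: 80 samples of class 0 and 20 of class 1.
rng = random.Random(0)
labels = [0] * 80 + [1] * 20
feature = [rng.gauss(3.0 * y, 1.0) for y in labels]
for name, (mean, std) in cross_validate(feature, labels).items():
    print(name, f"{100 * mean:.1f} ± {100 * std:.1f}%")
```

Reporting per-class accuracy alongside the overall figure matters when classes are imbalanced, because a model can score well overall while performing noticeably worse on the smaller class.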
Conclusion
The study findings offer a multiscale biological data exploration strategy to visualize, group, and analyze large-scale datasets. The Multiscale PHATE approach addresses a crucial gap in biological data exploration, as it discovered groupings of data at different scales that predicted clinical outcomes. In contrast to existing clustering or dimensionality reduction approaches, Multiscale PHATE provides a rapid manifold learning-based strategy for unveiling a continuum of structure and feature resolutions by capturing data topology. Moreover, Multiscale PHATE can be used in conjunction with DREMI and manifold enhancement of latent dimensions (MELD) to derive a detailed understanding of biological processes.
While T cells were associated with a protective effect, Multiscale PHATE combined with DREMI and MELD analysis identified a pathogenic CD4+ IFN-γ+ granzyme B+ Th17 cell subpopulation associated with negative outcomes in SARS-CoV-2 infection. Further, the scalability of Multiscale PHATE becomes more crucial as datasets grow in size and the number of samples increases.