SCimilarity revolutionizes single-cell data analysis with rapid cross-tissue comparisons


By Vijay Kumar Malesu. Reviewed by Susha Cheriyedath, M.Sc. Nov 21 2024

Unlocking the secrets of cellular similarity: how SCimilarity transforms single-cell data into insights on disease, development, and tissue biology.

SCimilarity search engine. Image Credit: SCimilarity

In a recent study published in the journal Nature, researchers in Canada and the United States developed Single-Cell Similarity (SCimilarity), a framework for rapid, interpretable searches of single-cell and single-nucleus ribonucleic acid sequencing (sc/snRNA-seq) data. The framework enables the discovery of similar cell states across the Human Cell Atlas.

Background

Over 100 million cells have been profiled using sc/snRNA-seq across various conditions, providing unprecedented opportunities to link cell states across development, tissues, and diseases. However, large-scale analyses remain limited due to challenges in dataset harmonization, defining shared representations, and lack of robust similarity metrics or scalable search methods.

Current approaches often fail to generalize across datasets and cannot efficiently query massive atlases for similar cell profiles. Further research is needed to develop foundational models that enable accurate, scalable, and interpretable searches, unlocking the full potential of single-cell atlases to advance biological discovery.

About the study

scRNA-seq has profiled millions of individual cells across various tissues, conditions, and diseases, offering transformative opportunities to link cellular states across contexts.

Effective comparisons between datasets, however, remain limited due to challenges in harmonizing diverse data, defining common representations, and developing accurate metrics to quantify cellular similarity.

Existing models often preserve dataset-specific information but fail to generalize to new data or to search large atlases efficiently for comparable cell states.

Metric learning, a technique successfully applied in fields like image processing, offers a promising solution. By embedding cell profiles into a shared low-dimensional space, metric learning makes it possible to identify biologically similar cells across vast datasets. Such representations could enable scalable, interpretable searches for cells in diverse contexts, facilitating cross-dataset comparisons and biological discovery.
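As a rough illustration of the metric-learning idea, the sketch below computes a triplet loss, one common metric-learning objective. The article does not state which loss SCimilarity uses, so the function name `triplet_loss`, the Euclidean distance, the margin value, and the toy two-dimensional embeddings are all assumptions made for illustration only.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on Euclidean distances between embeddings.

    The loss is zero once the anchor is closer to the positive (same cell
    type) than to the negative (different cell type) by at least `margin`.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0)

# Toy 2-D embeddings (hypothetical values, not real expression data).
anchor = np.array([[0.0, 0.0]])
positive = np.array([[0.0, 1.0]])   # distance 1 from the anchor
negative = np.array([[3.0, 4.0]])   # distance 5 from the anchor

# Well-separated triplet: 1 - 5 + 1 is negative, so the loss is clipped to 0.
loss_separated = triplet_loss(anchor, positive, negative)

# Swapping the roles violates the margin: 5 - 1 + 1 = 5.
loss_violating = triplet_loss(anchor, negative, positive)
```

Minimizing a loss of this kind over many triplets pulls profiles of the same cell type together and pushes different types apart, which is what makes distances in the embedding space usable as a similarity measure.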

Study results

SCimilarity demonstrated generalization across diverse single-cell profiling platforms. Although trained primarily on 10x Genomics Chromium data, it effectively embedded and annotated cell profiles from multiple platforms, including scRNA-seq and snRNA-seq datasets.

For example, human peripheral blood mononuclear cell (PBMC) samples profiled across seven platforms exhibited consistent cross-platform annotation precision, except for rare cell types like conventional dendritic cells (cDCs) and plasmacytoid dendritic cells (pDCs).

While minor differences in embedding distances were observed, particularly for non-10x platforms such as Switching Mechanism At 5' End of RNA Template sequencing (SMART-Seq2), SCimilarity maintained high performance, showcasing its adaptability to diverse data sources.

A key advantage of SCimilarity is its ability to integrate datasets without explicit batch correction. By quantifying representation confidence for individual cells, the model identifies outliers and assesses its generalization to new data. For example, low-confidence annotations were associated with poorly represented tissues in training data, such as the stomach and bladder. This capability enabled the construction of an atlas spanning 30 human tissues and facilitated pan-tissue comparisons.
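The article does not describe how representation confidence is computed. One plausible proxy, assumed here purely for illustration, is the mean distance from a query embedding to its nearest neighbors among the training embeddings: cells far from anything seen in training receive a high score and can be flagged as outliers. The function name `mean_knn_distance`, the value of `k`, the threshold, and the toy coordinates are hypothetical.

```python
import numpy as np

def mean_knn_distance(query, reference, k=2):
    """Mean Euclidean distance from `query` to its k nearest reference points."""
    dists = np.linalg.norm(reference - query, axis=1)
    return np.sort(dists)[:k].mean()

# Toy "training" embeddings at the corners of a unit square (hypothetical).
reference = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

# A query inside the training region versus one far outside it.
in_dist = mean_knn_distance(np.array([0.5, 0.5]), reference)
out_dist = mean_knn_distance(np.array([5.0, 5.0]), reference)

# Assumed cutoff: scores above it are treated as low-confidence.
threshold = 2.0
low_confidence = [d > threshold for d in (in_dist, out_dist)]
```

In practice a cutoff like this would need to be calibrated against held-out data rather than chosen by hand.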

The model also excelled in annotating cell types through its embedding-based similarity measure. SCimilarity annotated individual cells independently, circumventing the need for clustering and retrieving the most similar cells efficiently. It achieved competitive accuracy with existing methods like single-cell ANnotation using Variational Inference (scANVI) and CellTypist, even matching fine-grained annotations supported by protein markers. For example, SCimilarity annotated 86.5% of cells in healthy kidney samples correctly when compared to author-provided labels, performing on par with tissue-specific models.
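Annotating each cell independently through embedding similarity can be sketched as a nearest-neighbor vote. The example below is a minimal illustration under assumptions: it uses cosine similarity, a majority vote over `k` neighbors, and made-up two-dimensional embeddings, none of which are taken from the article or from the SCimilarity code base.

```python
import numpy as np
from collections import Counter

def annotate_cell(query, ref_embeddings, ref_labels, k=3):
    """Label one query cell by majority vote among its k most similar
    reference cells, ranked by cosine similarity. No clustering is needed."""
    q = query / np.linalg.norm(query)
    r = ref_embeddings / np.linalg.norm(ref_embeddings, axis=1, keepdims=True)
    sims = r @ q                       # cosine similarity to every reference cell
    top = np.argsort(-sims)[:k]        # indices of the k most similar cells
    votes = Counter(ref_labels[i] for i in top)
    label = votes.most_common(1)[0][0]
    return label, sims[top]

# Hypothetical reference embeddings with author-style labels.
ref_embeddings = np.array([
    [1.0, 0.0], [0.9, 0.2], [0.8, 0.1],   # T cells
    [0.0, 1.0], [0.1, 0.9], [0.2, 1.0],   # B cells
])
ref_labels = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell"]

label, top_sims = annotate_cell(np.array([0.9, 0.1]), ref_embeddings, ref_labels)
```

Because every query cell is handled on its own, the same routine works for a single cell or for millions; at atlas scale the exhaustive similarity computation would typically be replaced by an approximate nearest-neighbor index.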

SCimilarity’s interpretability was validated using Integrated Gradients, which identified critical gene contributions to cell type annotations. These gene attributions aligned well with known markers for major cell types, such as surfactant genes distinguishing lung alveolar type 2 (AT2) cells. This demonstrates SCimilarity's capacity to capture biologically meaningful features without prior knowledge of cell type-specific signatures.
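Integrated Gradients attributes a model's output to its input features by averaging gradients along a straight path from a baseline input to the actual input, then scaling by the difference between the two. The sketch below applies the method to a toy logistic model with an analytic gradient; the model, its weights, the all-zero baseline, and the three unnamed "genes" are assumptions for illustration and do not reflect SCimilarity's network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w):
    """Toy classifier score: sigmoid of a weighted sum of gene expression."""
    return sigmoid(x @ w)

def model_grad(x, w):
    """Analytic gradient of the toy score with respect to the input x."""
    s = sigmoid(x @ w)
    return s * (1.0 - s) * w

def integrated_gradients(x, baseline, w, steps=200):
    """Approximate the path integral with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.array([model_grad(p, w) for p in path])
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical weights for three genes; the first dominates the score.
w = np.array([2.0, 0.1, -0.5])
x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)

attributions = integrated_gradients(x, baseline, w)

# Completeness check: attributions should sum to the change in model output.
output_change = model(x, w) - model(baseline, w)
```

The completeness property checked at the end is what makes the attributions interpretable: each gene's share accounts for part of the difference between the score for the cell and the score for the baseline.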

The model’s query capabilities were tested using fibrosis-associated macrophages (FMΦs) and myofibroblasts in interstitial lung disease (ILD). SCimilarity identified FMΦ-like cells across ILD datasets, cancers, and other fibrotic diseases, revealing shared cellular states. Notably, it uncovered FMΦs in rare contexts, such as pancreatic ductal adenocarcinoma (PDAC), suggesting their broader relevance in fibrosis.

To explore its utility further, the researchers used SCimilarity to search for FMΦ-like cells among in vitro samples. Surprisingly, it identified cells cultured in a 3D hydrogel system as transcriptionally similar to FMΦs. Experimental validation confirmed SCimilarity’s prediction, demonstrating its potential to identify novel experimental conditions and model disease-relevant cell states in vitro.

Conclusions

To summarize, SCimilarity advances single-cell analysis by enabling scalable and efficient searches across diverse scRNA-seq and snRNA-seq datasets.

Built on metric learning, it provides annotation and querying of cell profiles, leveraging full expression profiles to reduce biases from curated gene signatures. SCimilarity excels in identifying transcriptionally similar cells, facilitating discoveries of novel states like FMΦs and myofibroblasts across diseases.

Its ability to generalize to unseen datasets and its open-source availability make it a foundational tool for exploring the Human Cell Atlas, supporting diverse biological investigations, and uncovering insights into human biology and disease mechanisms.

Source:

SCimilarity - https://github.com/Genentech/scimilarity

Journal reference:

Heimberg, G., Kuo, T., DePianto, D.J. et al. A cell atlas foundation model for scalable search of similar human cells. Nature (2024). DOI: 10.1038/s41586-024-08411-y, https://www.nature.com/articles/s41586-024-08411-y


