Novel computational tool helps scientists detect hidden cell types behind disease

Cells throughout the body work together like singers in a choir to keep us healthy, as long as they work in perfect harmony. If any cells are off key, that harmony can be disrupted, with far-reaching effects across the body. By pinpointing the discordant cells, scientists may be able to learn how to get them back in tune and restore health.

Taking inspiration from that musical metaphor, a team of researchers at Gladstone Institutes has released a novel computational tool called CHOIR that can improve the detection of off-key cells. CHOIR, short for "cluster hierarchy optimization by iterative random forests," categorizes thousands or even millions of cells into separate, biologically distinct groups, helping home in on specific cell types or conditions that may underlie disease.

"What's exciting about CHOIR is that it solves some of the key limitations of existing tools," says Gladstone Investigator Ryan Corces, PhD, senior author of a new study published in Nature Genetics that introduces CHOIR. "It can more accurately identify rare cell types, while also avoiding the tendency of other tools to 'hallucinate' cell types that aren't actually biologically distinct from each other."

"Using this new tool, we can pinpoint cells that promote health or disease and that may not have been revealed otherwise. This deep insight allows us to focus investigations and therapeutic interventions on the most promising targets."

Lennart Mucke, MD, director of the Gladstone Institute of Neurological Disease and co-author of the study

Getting to the biological truth

CHOIR arose from necessity. Cathrine Sant, PhD, now a postdoctoral scholar at Gladstone, began work on this project as a graduate student in Mucke's lab.

At the time, she was studying Alzheimer's disease and learning how to analyze data generated by single-cell sequencing technologies. Such methods capture the distinct biological identities or states of cells in any given tissue sample, revealing, for instance, which genes are turned on or off, or which proteins are present on cell surfaces.

Sant wanted to explore the different cell types or states potentially involved in Alzheimer's. To do so, she needed a statistical method to help her sort through her single-cell data by grouping the cells into biologically distinct clusters, just as singers in a choir might be grouped into sopranos, altos, or baritones.

She considered a variety of existing tools designed for projects like hers. But none seemed quite right.

"I was struck by the number of arbitrary decisions some of the tools require scientists to make, and by how these decisions can introduce personal bias or limit you to existing biological knowledge, reducing the potential for novel discovery," says Sant, who led the development of CHOIR and is first author of the new study. "It felt more like a choose-your-own-adventure than actually getting to the biological truth in the dataset."

So, Sant set out to find a better way of revealing that truth. She turned to Corces, who had just started his lab at Gladstone, to leverage his expertise in computational methods, while also tapping into Mucke's extensive knowledge of neurodegenerative diseases.

Together, the scientists developed a user-friendly method that relies on an unbiased statistical framework rather than intuition. The result is CHOIR, a freely available tool that can be applied across different tissue types from humans and experimental models to identify biologically meaningful groups of cells or cell conditions.

"Hundreds of people have downloaded CHOIR since we first made it available online in a preliminary format about a year ago," Sant says. "It has been gratifying to see the many creative ways in which scientists are already using the tool across diverse fields, including neuroscience and immunology, as well as cardiovascular and cancer research."

A tool with important guardrails

As a key component of its design, CHOIR incorporates a machine learning method that lets scientists use it for data produced by any single-cell analysis method, including those focused on RNA, DNA, or proteins.

CHOIR also has built-in guardrails to avoid pitfalls of other tools. For example, it safeguards against underclustering, in which biologically distinct cell types are mistakenly grouped together, and also prevents overclustering, which could send a researcher on a wild goose chase by identifying cell types as distinct when they're not.
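The article does not describe CHOIR's internals, so the following is only a minimal Python sketch of what a significance-based guardrail against overclustering could look like, not CHOIR's actual algorithm or API. It asks whether a classifier can tell two candidate clusters apart better than it can after the cell labels are shuffled. A simple nearest-centroid classifier stands in for the random forests named in CHOIR's acronym, and the function names, the 0.05 threshold, and the 200 permutations are all illustrative assumptions.

```python
import random
from statistics import mean


def centroid(cells):
    """Average feature profile of a group of cells (one list of floats per cell)."""
    return [mean(column) for column in zip(*cells)]


def dist2(cell, center):
    """Squared Euclidean distance between a cell and a centroid."""
    return sum((x - c) ** 2 for x, c in zip(cell, center))


def holdout_accuracy(a, b):
    """Fit centroids on the first half of each group, score on the second half."""
    half_a, half_b = len(a) // 2, len(b) // 2
    center_a, center_b = centroid(a[:half_a]), centroid(b[:half_b])
    correct = sum(dist2(x, center_a) < dist2(x, center_b) for x in a[half_a:])
    correct += sum(dist2(x, center_b) <= dist2(x, center_a) for x in b[half_b:])
    return correct / ((len(a) - half_a) + (len(b) - half_b))


def split_p_value(a, b, n_perm=200, seed=0):
    """Permutation p-value: how often do shuffled labels separate as well as the real ones?"""
    rng = random.Random(seed)
    observed = holdout_accuracy(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any link between cells and cluster labels
        if holdout_accuracy(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def should_merge(a, b, alpha=0.05):
    """Merge two candidate clusters unless they are significantly distinguishable."""
    return split_p_value(a, b) >= alpha
```

Under these assumptions, two candidate clusters with clearly different profiles yield a small p-value and stay separate, while two clusters that share the same profile yield a large one and are merged rather than reported as distinct cell types.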

Additionally, unlike other tools that assume different cell types occur in similarly sized clusters, CHOIR takes into account what actually happens in the body, which is made up of cell populations whose sizes range from abundant to rare.

"CHOIR excels at grouping common cell types into large, cohesive clusters while simultaneously pinpointing rare cell populations, the needles in a haystack," Sant says.

Together, these features allow CHOIR to reliably detect and discover cell types or states that might be important for diagnosing, treating, and preventing disease.

CHOIR takes the stage

To confirm CHOIR's prowess, Sant and her colleagues tested it across a variety of single-cell data types (including combinations of multiple data types) and a variety of biological samples, including brain, blood, and cancer cells. When pitted against other tools for analyzing single-cell data, CHOIR outperformed 15 of the most popular ones, identifying distinct cell types that other tools missed.

"Regardless of the type of tissue we tested, CHOIR performed better than other methods, even without any tweaks to its default settings," Corces says. "Being able to rely on those defaults avoids potential biases that can be introduced when researchers are required to tweak settings based on their personal intuition. That's really important for standardization and ensuring research findings are rigorous and reproducible across labs."

Now, equipped with CHOIR, Sant is taking a fresh approach to Alzheimer's research. She and her colleagues are using it to zoom in on specific types of brain cells after reducing levels of the protein tau, a strategy being explored as a potential treatment for the disease. They're also using CHOIR to analyze an Alzheimer's dataset involving single-cell data from millions of cells from human tissue samples.

Meanwhile, other labs at Gladstone are already applying CHOIR to study the brain, the heart, and the immune system. "Many researchers are using single-cell data these days and CHOIR is applicable across many studies," Mucke says. "We hope this powerful new research tool will advance diverse areas of science and biomedicine."

Journal reference:

Sant, C., et al. (2025). CHOIR improves significance-based detection of cell types and states from single-cell data. Nature Genetics. doi.org/10.1038/s41588-025-02148-8.
