Guidelines for using interpretable machine learning methods in computational biology

Machine learning is a powerful tool in computational biology, enabling the analysis of a wide range of biomedical data such as genomic sequences and biological imaging. But when researchers use machine learning in computational biology, understanding model behavior remains crucial for uncovering the underlying biological mechanisms in health and disease.

In a recent article in Nature Methods, researchers at Carnegie Mellon University's School of Computer Science propose guidelines that outline pitfalls and opportunities for using interpretable machine learning methods to tackle computational biology problems. The Perspectives article, "Applying Interpretable Machine Learning in Computational Biology: Pitfalls, Recommendations and Opportunities for New Developments," is featured in the journal's August special issue on AI.

"Interpretable machine learning has generated significant excitement as machine learning and artificial intelligence tools are being applied to increasingly important problems. As these models grow in complexity, there is great promise not only in developing highly predictive models but also in creating tools that help end users understand how and why these models make certain predictions. However, it is crucial to acknowledge that interpretable machine learning has yet to deliver turnkey solutions to this interpretability problem."

Ameet Talwalkar, associate professor in CMU's Machine Learning Department (MLD)

The paper is a collaboration between doctoral students Valerie Chen in MLD and Muyu (Wendy) Yang in the Ray and Stephanie Lane Computational Biology Department. Chen's earlier work critiquing the interpretable machine learning community's lack of grounding in downstream use cases inspired the article, and the idea was developed through discussions with Yang and Jian Ma, the Ray and Stephanie Lane Professor of Computational Biology. 

"Our collaboration began with a deep dive into computational biology papers to survey the application of interpretable machine learning methods," Yang said. "We noticed that many applications used these methods in a somewhat ad hoc manner. Our goal with this paper was to provide guidelines for more robust and consistent use of interpretable machine learning methods in computational biology."

One major pitfall the paper addresses is the reliance on a single interpretable machine learning method. Instead, the researchers recommend using multiple interpretable machine learning methods with diverse sets of hyperparameters and comparing their results to obtain a more comprehensive understanding of the model behavior and its underlying interpretations.
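
The recommendation above can be illustrated with a minimal sketch. It is not the authors' code: the toy model, the two attribution methods (occlusion and finite-difference gradients), the hyperparameter values, and all function names are assumptions chosen for illustration. The idea is simply to run more than one method under more than one setting and check how well the resulting feature rankings agree before drawing conclusions.

```python
import numpy as np

def model(X, w):
    """Toy nonlinear predictor standing in for a trained model (hypothetical)."""
    return np.tanh(X @ w)

def occlusion_attribution(X, w, baseline):
    """Mean absolute change in prediction when one feature is set to `baseline`."""
    base_pred = model(X, w)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        X_occ = X.copy()
        X_occ[:, j] = baseline
        scores[j] = np.mean(np.abs(base_pred - model(X_occ, w)))
    return scores

def gradient_attribution(X, w, eps):
    """Mean absolute central finite-difference gradient with step size `eps`."""
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        X_hi = X.copy()
        X_lo = X.copy()
        X_hi[:, j] += eps
        X_lo[:, j] -= eps
        scores[j] = np.mean(np.abs(model(X_hi, w) - model(X_lo, w)) / (2 * eps))
    return scores

def rank_correlation(a, b):
    """Spearman-style agreement: Pearson correlation of ranks (assumes no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

# Synthetic data and weights; the last feature has no effect on the model.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w = np.array([2.0, -1.0, 0.5, 0.1, 0.0])

# Two methods, each under two hyperparameter settings.
runs = {
    "occlusion(baseline=0.0)": occlusion_attribution(X, w, 0.0),
    "occlusion(baseline=1.0)": occlusion_attribution(X, w, 1.0),
    "gradient(eps=0.001)": gradient_attribution(X, w, 1e-3),
    "gradient(eps=0.1)": gradient_attribution(X, w, 1e-1),
}

# Pairwise agreement between the feature rankings of every run.
names = list(runs)
agreement = np.array(
    [[rank_correlation(runs[a], runs[b]) for b in names] for a in names]
)

for name, row in zip(names, agreement):
    print(f"{name:26s}", np.round(row, 2))
```

Low agreement between rows would suggest that an interpretation depends on the method or its settings rather than on the model, which is the situation the guidelines caution against reading too much into.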

"While some machine learning models seem to work surprisingly well, we often do not fully understand why," Ma said. "In scientific domains like biomedicine, understanding why models work is crucial for discovering fundamental biological mechanisms."

The paper also warns against cherry-picking results when evaluating interpretable machine learning methods, as this can lead to incomplete or biased interpretations of scientific findings.

Chen emphasized that the guidelines may also be relevant to a wider audience of researchers interested in applying interpretable machine learning methods to their work.

"We hope that machine learning researchers developing new interpretable machine learning methods and tools, particularly those working on explaining large language models, will carefully consider the human-centric aspects of interpretable machine learning," Chen said. "This includes understanding who their target user is and how the method will be used and evaluated."

Understanding model behavior remains crucial for scientific discovery and is still a fundamentally unsolved machine learning problem. The authors hope these challenges will spur further interdisciplinary collaborations to facilitate the broader use of AI for scientific impact.

Journal reference:

Chen, V., et al. (2024). Applying interpretable machine learning in computational biology—pitfalls, recommendations and opportunities for new developments. Nature Methods. doi.org/10.1038/s41592-024-02359-7.
