MolCompass: A breakthrough in identifying weak areas of chemical prediction models

In recent years, machine learning models have become increasingly popular for risk assessment of chemical compounds. However, they are often considered 'black boxes' due to their lack of transparency, leading to skepticism among toxicologists and regulatory authorities. To increase confidence in these models, researchers at the University of Vienna proposed to carefully identify the areas of chemical space where these models are weak. They developed an innovative software tool ('MolCompass') for this purpose and the results of this research approach have just been published in the prestigious Journal of Cheminformatics.

Over the years, new pharmaceuticals and cosmetics have been tested on animals. These tests are expensive, raise ethical concerns, and often fail to accurately predict human reactions. Recently, the European Union supported the RISK-HUNT3R project to develop the next generation of non-animal risk assessment methods. The University of Vienna is a member of the project consortium. Computational methods now allow the toxicological and environmental risks of new chemicals to be assessed entirely by computer, without the need to synthesize the chemical compounds. But one question remains: How confident are these computer models?

It's all about reliable prediction

To address this issue, Sergey Sosnin, a senior scientist of the Pharmacoinformatics Research Group at the University of Vienna, focused on binary classification. In this context, a machine learning model provides a probability score from 0% to 100%, indicating how likely it is that a chemical compound is active (e.g., toxic rather than non-toxic, bioaccumulative rather than non-bioaccumulative, a binder rather than a non-binder to a specific human protein). This probability reflects the confidence of the model in its prediction. Ideally, the model should be confident only in its correct predictions. If the model is uncertain, giving a confidence score around 51%, these predictions can be disregarded in favor of alternative methods. A challenge arises, however, when the model is fully confident in incorrect predictions.

"This is the real nightmare scenario for a computational toxicologist. If a model predicts that a compound is non-toxic with 99% confidence, but the compound is actually toxic, there is no way to know that something was wrong."

Sergey Sosnin, senior scientist of the Pharmacoinformatics Research Group, University of Vienna

The only solution is to identify areas of 'chemical space' - encompassing possible classes of organic compounds - where the model has 'blind spots' in advance and avoid them. To do this, a researcher evaluating the model must check the predicted results for thousands of chemical compounds one by one - a tedious and error-prone task.
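The distinction between uncertain predictions and confident errors can be made concrete with a few lines of Python. This is only a minimal sketch, not the MolCompass code: the function names and the 0.9 confidence threshold are assumptions chosen for illustration, and the inputs are a predicted probability of activity plus a known experimental label.

```python
def confidence(prob_active):
    """Confidence in the predicted class for a binary classifier.

    A probability of 0.99 or 0.01 both mean 99% confidence,
    in the 'active' and 'inactive' class respectively.
    """
    return max(prob_active, 1.0 - prob_active)


def categorize(prob_active, is_active, threshold=0.9):
    """Sort one prediction into one of three groups.

    `threshold` is a hypothetical cut-off for 'high confidence';
    it is an illustrative assumption, not a value from the study.
    """
    if confidence(prob_active) < threshold:
        # The model hedges: such predictions can be handed to other methods.
        return "uncertain"
    predicted_active = prob_active >= 0.5
    if predicted_active == is_active:
        return "confident_correct"
    # The nightmare case: high confidence, wrong answer.
    return "confident_error"


# A toxic compound that the model calls non-toxic with 99% confidence.
print(categorize(prob_active=0.01, is_active=True))
```

Only the last group is dangerous in practice, because nothing in the score itself warns the user. Finding it requires compounds with known labels, which is why the check has to be done in advance.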

Overcoming this significant hurdle

"To assist these researchers," Sosnin continues, "we developed interactive graphical tools that display chemical compounds onto a 2D plane, like geographical maps. Using colors, we highlight the compounds that were predicted incorrectly with high confidence, allowing users to identify them as clusters of red dots. The map is interactive, enabling users to investigate the chemical space and explore regions of concern."

The methodology was demonstrated using an estrogen receptor binding model. Visual analysis of the chemical space made it clear that the model works well for compounds such as steroids and polychlorinated biphenyls, but fails completely for small non-cyclic compounds and should not be used for them.
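A rough tabular analogue of this kind of finding is to count confident errors per group of compounds. The sketch below assumes each compound already carries a class label, which the visual approach does not require, and it uses made-up toy values that are not taken from the study. The function name and the 0.9 threshold are likewise illustrative assumptions.

```python
from collections import defaultdict


def confident_error_rate_by_class(records, threshold=0.9):
    """Fraction of confidently wrong predictions per chemical class.

    `records` is an iterable of (chemical_class, prob_active, is_active).
    `threshold` is a hypothetical cut-off for 'high confidence'.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for chem_class, prob_active, is_active in records:
        totals[chem_class] += 1
        predicted_active = prob_active >= 0.5
        conf = max(prob_active, 1.0 - prob_active)
        if conf >= threshold and predicted_active != is_active:
            errors[chem_class] += 1
    return {c: errors[c] / totals[c] for c in totals}


# Toy data, invented purely for illustration.
toy_records = [
    ("steroid", 0.97, True),
    ("steroid", 0.95, True),
    ("small acyclic", 0.02, True),
    ("small acyclic", 0.04, True),
    ("small acyclic", 0.03, False),
]

for chem_class, rate in confident_error_rate_by_class(toy_records).items():
    print(f"{chem_class}: {rate:.0%} confident errors")
```

A class with a high rate would be a candidate 'blind spot' where the model should not be trusted. On an interactive map, the same pattern would appear as a cluster of red dots without any class labels having to be assigned beforehand.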

The software developed in this project is freely available to the community on GitHub. Sergey Sosnin hopes that MolCompass will lead chemists and toxicologists to a better understanding of the limitations of computational models. This study is a step toward a future where animal testing is no longer necessary and the only workplace for a toxicologist is a computer desk.

Journal reference:

Sosnin, S., et al. (2024). MolCompass: multi-tool for the navigation in chemical space and visual validation of QSAR/QSPR models. Journal of Cheminformatics. doi.org/10.1186/s13321-024-00888-z.
