Accelerating Drug Discovery and Development with Effective Data Housekeeping

By Keynote Contributor Dr. Soroosh Afyouni, Head of Health Data Sciences, bioXcelerate AI


Every household seems to have that drawer of miscellaneous belongings. When you need something, you know it’s in there, but it still takes ages to sift through the clutter you should’ve tossed years ago. The massive amount of data stored in ever-expanding biobanks is a bit like ‘that drawer’, and it makes finding consistently good-quality data increasingly challenging.

In the digital age, there’s no shortage of information, and organizations of all sorts are making it their business to use it. This is especially true now that advanced technology allows businesses to analyze vast swathes of data in ways that, until recently, simply weren’t possible.

Over the past few decades, healthcare authorities worldwide have established large biobanks containing patient medical records, clinical trial data, and genetic information. While these resources hold immense value, their data quality is often inconsistent, and the data come in a variety of shapes and formats, making reliable comparisons difficult. Analyzing them side by side is like comparing apples to oranges: it just doesn’t work.

This inconsistency is becoming a significant issue. It not only hampers collaboration but also limits the ability of powerful tools like AI and machine learning to deliver meaningful insights. That’s why we need a strategy for managing health data globally, one that streamlines the drug development process and, in turn, delivers better outcomes for patients.

The Era of ‘Big Data’ is Now

There’s no denying that data offers immense opportunities to transform a range of industries, especially healthcare. However, poor data quality limits how effectively that data can be used. According to Sun et al. (2022)[1], around 90% of clinical drug candidates fail, and lack of clinical efficacy accounts for 40–50% of those failures. Put simply, if the initial insights aren’t grounded in strong, data-driven evidence, the resulting treatment is unlikely to be optimally effective.

As technologies such as AI continue to enhance our ability to extract insights from data, the drive to collect data at an unprecedented pace, and the potential to revolutionize drug development, have never been greater. However, the technology is only as powerful as the data it processes: without high-quality, well-curated data, even the most advanced AI models can falter.

Inconsistent formats, varying degrees of accuracy, and incomplete datasets are common challenges that, without proper attention, can undermine the effectiveness of data-driven insights[2]. Disparities like these can result in spurious conclusions, ineffective treatments, and missed opportunities to address patients’ unmet needs.
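To make these challenges concrete, the following is a minimal Python sketch, assuming two hypothetical data sources that record the same blood-glucose measurement with different date formats and units. It maps both into one common schema and flags incomplete records. The field names, sample records, and schema are illustrative inventions, not drawn from any real biobank, and the conversion factor of 18 mg/dL per mmol/L is an approximation.

```python
from datetime import datetime

# Approximate conversion factor for blood glucose: 1 mmol/L is about 18 mg/dL.
MG_DL_PER_MMOL_L = 18.0

# Hypothetical common schema that every harmonized record must fill in.
REQUIRED_FIELDS = ("patient_id", "visit_date", "glucose_mmol_l")


def harmonize_a(rec: dict) -> dict:
    """Map a record from hypothetical source A (ISO dates, mg/dL) to the common schema."""
    glucose = rec.get("glucose_mg_dl")
    return {
        "patient_id": rec.get("patient_id"),
        "visit_date": datetime.strptime(rec["visit_date"], "%Y-%m-%d").date().isoformat(),
        # Convert mg/dL to mmol/L, keeping missing values as None.
        "glucose_mmol_l": None if glucose is None else round(glucose / MG_DL_PER_MMOL_L, 2),
    }


def harmonize_b(rec: dict) -> dict:
    """Map a record from hypothetical source B (day/month/year dates, mmol/L) to the common schema."""
    return {
        "patient_id": rec.get("id"),
        "visit_date": datetime.strptime(rec["date"], "%d/%m/%Y").date().isoformat(),
        "glucose_mmol_l": rec.get("glucose"),
    }


def missing_fields(rec: dict) -> list:
    """Return the names of required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if rec.get(f) is None]


# Illustrative sample records; note the different field names, date formats and units.
source_a = [
    {"patient_id": "A-001", "visit_date": "2023-03-14", "glucose_mg_dl": 99.0},
    {"patient_id": "A-002", "visit_date": "2023-03-15", "glucose_mg_dl": None},
]
source_b = [
    {"id": "B-017", "date": "14/03/2023", "glucose": 5.5},
]

harmonized = [harmonize_a(r) for r in source_a] + [harmonize_b(r) for r in source_b]

for rec in harmonized:
    gaps = missing_fields(rec)
    status = "incomplete: " + ", ".join(gaps) if gaps else "complete"
    print(rec["patient_id"], rec["visit_date"], rec["glucose_mmol_l"], status)
```

After harmonization, the two sources' readings become directly comparable (both 5.5 mmol/L on 2023-03-14), and the record with a missing value is flagged rather than silently dropped. Real pipelines would typically rely on an agreed data standard and vocabulary rather than hand-written mappings like these.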

Making Data FAIR

To realize the full potential of advanced techniques in healthcare, such as AI, machine learning, and other analytical methods, meticulous data curation and management are essential[3]. Clean, comprehensive, and well-structured data provide the solid foundation on which these techniques can build transformative innovations in the healthcare industry.


Adherence to the FAIR data principles (making data findable, accessible, interoperable, and reusable) is a key component in achieving this. Data that meets these standards is easier to organize, integrate, and reuse across AI-driven applications. Importantly, these principles foster a culture of transparency, innovation, and collaboration, further strengthening the integrity of insights derived from advanced analytical techniques, including AI.

The importance of building a robust repository of coherent, standardized data extends beyond any single sector. Effective data housekeeping is also essential for meaningful collaboration between industry and academia. Pairing academia’s cutting-edge research facilities with industry’s practical clinical development experience leverages the strengths of both sectors. By working together, they can establish standardized data practices, share valuable insights from pre-clinical research through to clinical development, and build robust data-management frameworks that align with FAIR principles.

This collaboration ensures that data is not only well-organized and of high quality but also readily accessible and reusable, enhancing its relevance and applicability to real-world challenges.

Data Strategy is Key

Establishing a repository of health data that is accurate, comparable, and reusable is a vital step toward maximizing the benefits of AI and machine learning in drug development, and reproducibility is crucial to achieving this[4]. When scientists can revisit previous work and obtain the same results, findings are verified and trust in the insights drawn from the data grows. Reproducibility also helps minimize errors by exposing inconsistencies in results. Finally, adherence to FAIR data principles facilitates reproducible outcomes, making collaboration across teams and industries smoother and more effective.

By working together, industry and academia can take a unified approach to ensuring that data is carefully curated, standardized, and relevant. Ultimately, this will support faster drug development and better patient outcomes while helping to keep the data adaptable and, as far as possible, future-proof. In doing so, we can enable advanced analytical methods to reach their full potential and drive progress in healthcare and beyond.

References

  1. Sun, D., Gao, W., Hu, H. and Zhou, S. (2022). Why 90% of clinical drug development fails and how to improve it? Acta Pharmaceutica Sinica B, 12(7). Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9293739/.
  2. Talaei Khoei, T. and Singh, A. (2024). Data reduction in big data: a survey of methods, challenges and future directions. International Journal of Data Science and Analytics. doi:10.1007/s41060-024-00603-z.
  3. Wise, J., de Barron, A.G., Splendiani, A., Balali-Mood, B., Vasant, D., Little, E., Mellino, G., Harrow, I., Smith, I., Taubert, J., van Bochove, K., Romacker, M., Walgemoed, P., Jimenez, R.C., Winnenburg, R., Plasterer, T., Gupta, V. and Hedley, V. (2019). Implementation and relevance of FAIR data principles in biopharmaceutical R&D. Drug Discovery Today, 24(4), pp.933–938. doi:10.1016/j.drudis.2019.01.008.
  4. Lake, F. (2019). Artificial intelligence in drug discovery: what is new, and what is next? Future Drug Discovery, 1(2), p.FDD19. doi:10.4155/fdd-2019-0025.

About Soroosh Afyouni

Dr Soroosh Afyouni is the Head of Health Data Sciences at bioXcelerate AI. He received a Master of Engineering from the University of Birmingham in 2012 and completed his PhD in Statistical Neuroimaging at the University of Warwick in 2017, where he specialized in statistical network and time series analysis.

From 2017 to 2020, Soroosh continued his research as a junior and later senior postdoctoral researcher at the Big Data Institute, University of Oxford, where he developed time series models for accurately estimating human brain activity in large-scale datasets such as the UK Biobank. During his time at Oxford, he received a Merit Award from the Organization for Human Brain Mapping. In 2021, Soroosh joined the University of Cambridge’s Department of Psychology and Faculty of Mathematics to focus on developing ML methods for the early diagnosis of Alzheimer’s disease.

In addition to his academic background, Soroosh spent close to two years at a US-based management consulting firm, where he worked with some of the largest pharmaceutical companies on R&D and commercial strategy challenges, ranging from using electronic health records in clinical trials to designing and pressure-testing new R&D operating models. Soroosh joined bioXcelerate in 2023, where he works on applying statistical and machine learning methods to precision medicine.

Disclaimer: This article has not been subjected to peer review and is presented as the personal views of a qualified expert in the subject in accordance with the general terms and conditions of use of the News-Medical.Net website.  

Last Updated: Mar 12, 2025
