Revolutionizing disease prediction: MILTON framework utilizes biobank datasets to identify 3,213 diseases

Unlocking disease prediction: How the MILTON framework utilizes multi-omics data to transform health insights.

Study: Disease prediction with multi-omics and biomarkers empowers case–control genetic discoveries in the UK Biobank. Image Credit: Xray Computer/Shutterstock.com

In a recent study published in Nature Genetics, a group of researchers developed and applied an ensemble machine-learning framework (MILTON) to predict diseases and enhance genetic association analyses using multi-omics data from the United Kingdom Biobank (UKB).

Background 

Identifying individuals at high risk of developing diseases is vital for preventative medicine. Still, traditional risk assessment tools, which rely on factors like age and family history, may not fully capture the complexity of disease biology.

Large-scale biobanks, such as the UKB, incorporate multi-omics data like blood tests, proteomics, and metabolomics, which provide opportunities to discover novel biomarkers.

These comprehensive datasets enable the identification of biomarker combinations that enhance disease prediction beyond individual markers. Further research is necessary to better understand the biological processes underlying complex diseases and to improve predictive models.

About the study

The UKB cohort includes 502,226 participants aged 37 to 73 years, with a median age of 58. Of these, 54.4% are female. The data provides comprehensive information such as diagnosis records, blood biochemistry, body size measures, genomics, and proteomics data. All participants provided informed consent and participated voluntarily.

The FinnGen cohort consists of 412,181 individuals, 55.9% of whom are female, with a median age of 63. Participants also provided informed consent and took part voluntarily.

FinnGen data was not accessed at the patient level; only Genome-Wide Association Study (GWAS) summary statistics were used. The research adhered to all ethical regulations, with approvals obtained from the appropriate ethics boards. 

The UKB study received approval from the North West Centre for Research Ethics Committee, while the Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa approved the FinnGen study.

The Finnish Institute for Health and Welfare, the Digital and Population Data Service Agency, the Social Insurance Institution, and Statistics Finland granted additional approvals for FinnGen.

Both studies carefully processed the data, ensuring accurate case and control definitions. Extensive filtering was applied to cases and controls to maintain consistency in the distribution of age, sex, and other baseline characteristics.

Study results 

Clinical biomarkers play a crucial role in diagnosing and evaluating diseases by providing measurable indications of a condition’s presence and severity. In the context of phenome-wide association studies (PheWASs), biomarkers also offer an opportunity to identify misclassified or cryptic cases.

MILTON is a machine-learning method that uses quantitative biomarkers to predict disease status for 3,213 disease phenotypes. It first learns a disease-specific signature from diagnosed patients and then predicts potential novel cases among the original controls. The resulting augmented cohorts are used in rare-variant collapsing analyses, and the results are compared with those from the baseline cohorts.
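The cohort-augmentation step can be sketched roughly as follows. This is a minimal illustration, not MILTON's actual code: the function name `augment_cohort`, the participant IDs, and the 0.7 probability cutoff are assumptions made for the example, and the predicted probabilities are taken as given rather than produced by a trained model.

```python
def augment_cohort(cases, controls, p_case, threshold=0.7):
    """Move controls with a high predicted case probability into the case set.

    cases, controls: sets of participant IDs (hypothetical identifiers).
    p_case: dict mapping control ID -> predicted probability of being a case.
    threshold: assumed cutoff for illustration; not prescribed by the source.
    """
    predicted = {pid for pid in controls if p_case.get(pid, 0.0) >= threshold}
    augmented_cases = set(cases) | predicted
    remaining_controls = set(controls) - predicted
    return augmented_cases, remaining_controls


# Toy data: two diagnosed cases and four controls with made-up probabilities.
cases = {"P1", "P2"}
controls = {"P3", "P4", "P5", "P6"}
p_case = {"P3": 0.91, "P4": 0.12, "P5": 0.70, "P6": 0.45}

augmented_cases, remaining_controls = augment_cohort(cases, controls, p_case)
print(sorted(augmented_cases))     # ['P1', 'P2', 'P3', 'P5']
print(sorted(remaining_controls))  # ['P4', 'P6']
```

A genetic association test would then be run twice, once on the original cases and controls and once on the augmented sets, so that the two sets of results can be compared.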

MILTON's disease prediction models are defined based on the time lag between biomarker sample collection and diagnosis. In the UKB, samples may have been collected up to 16.5 years before or 50 years after diagnosis.

MILTON was trained using three different time models: prognostic (cases diagnosed up to 10 years after sample collection), diagnostic (cases diagnosed up to 10 years before sample collection), and time-agnostic (all diagnosed cases). A 10-year cutoff was determined to be optimal after a sensitivity analysis on 400 randomly selected International Classification of Diseases, 10th Revision (ICD10) codes.
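One way to picture the three time models is as a rule that assigns each diagnosed case to the models it qualifies for, based on the lag between sample collection and diagnosis. The sketch below is illustrative only: the function name `time_models`, the sign convention for the lag, and the choice to place a lag of exactly zero in the diagnostic model are assumptions, not details taken from the study.

```python
def time_models(lag_years, cutoff=10.0):
    """Return the time models a diagnosed case would contribute to.

    lag_years: diagnosis date minus sample collection date, in years
               (positive means diagnosed after the sample was taken).
    cutoff: the 10-year window described in the text.
    """
    models = ["time-agnostic"]  # every diagnosed case is included here
    if 0 < lag_years <= cutoff:
        models.append("prognostic")  # diagnosis follows sample collection
    elif -cutoff <= lag_years <= 0:
        models.append("diagnostic")  # diagnosis precedes or coincides with sampling
    return models


print(time_models(3.5))    # ['time-agnostic', 'prognostic']
print(time_models(-8.0))   # ['time-agnostic', 'diagnostic']
print(time_models(-25.0))  # ['time-agnostic']
```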

MILTON was trained on 67 features, including blood biochemistry and count measures, urine assays, body size, blood pressure, sex, age, spirometry, and fasting time. The model's performance was assessed using the area under the curve (AUC) metric. MILTON achieved AUC ≥ 0.7 for 1,091 ICD10 codes, AUC ≥ 0.8 for 384 codes, and AUC ≥ 0.9 for 121 codes across all time models and ancestries.
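For readers unfamiliar with the AUC metric, the short sketch below computes it from scratch on toy data. It uses the pairwise definition (the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half); the function name `auc` and the example labels and scores are illustrative assumptions and do not come from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case receives a higher
    score than a randomly chosen control, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]            # 1 = case, 0 = control (toy data)
scores = [0.1, 0.4, 0.35, 0.8]   # made-up predicted probabilities
print(auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which is why thresholds such as 0.7, 0.8, and 0.9 are used to grade model quality.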

Diagnostic models generally performed better than prognostic ones across 1,466 ICD10 codes. For example, in European (EUR) ancestry participants, diagnostic models had a higher median AUC (0.668 versus 0.647) and sensitivity (0.586 versus 0.570).

MILTON also showed stable performance for EUR and African ancestries, while performance improved for South Asian diagnostic models as the number of cases increased.

MILTON’s ability to predict disease before onset was further validated. When individuals with a high predicted case probability (0.7 ≤ Pcase ≤ 1) were analyzed, this group was significantly enriched for participants later diagnosed with the corresponding condition for 97.41% of ICD10 codes. These results affirm MILTON’s effectiveness in identifying emerging cases and augmenting genetic association analyses.
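An enrichment check of this kind can be illustrated with a one-sided hypergeometric (Fisher-style) test. The article does not say which statistical test the authors used, so the choice of test here is an assumption, as are the function name `enrichment_p_value` and the toy counts.

```python
from math import comb


def enrichment_p_value(high_later, high_total, later_total, n_total):
    """One-sided hypergeometric p-value for over-representation.

    high_later: participants with high Pcase who were later diagnosed.
    high_total: all participants with high Pcase.
    later_total: all participants who were later diagnosed.
    n_total: all participants considered.
    """
    denom = comb(n_total, high_total)
    upper = min(high_total, later_total)
    # Sum the probabilities of seeing high_later or more overlaps by chance.
    tail = sum(
        comb(later_total, k) * comb(n_total - later_total, high_total - k)
        for k in range(high_later, upper + 1)
    )
    return tail / denom


# Toy counts: 20 participants, 5 later diagnosed, 5 with high Pcase,
# 4 of whom are among the later diagnosed.
p = enrichment_p_value(high_later=4, high_total=5, later_total=5, n_total=20)
print(f"{p:.4f}")  # 0.0049
```

A small p-value indicates that later-diagnosed participants appear in the high-probability group more often than chance alone would suggest.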

Conclusions

To summarize, MILTON predicts diseases using multi-omics and biomarkers, enhancing case-control studies across five UKB ancestries. Despite the broad, non-disease-specific feature set, MILTON achieved high predictive power for numerous phenotypes, with AUC ≥ 0.7 for 1,091 ICD10 codes, AUC ≥ 0.8 for 384, and AUC ≥ 0.9 for 121.

However, for some diseases, predictive power remained low, indicating the need for more informative features.

MILTON often outperformed polygenic risk scores (PRSs) but underperformed in diseases like melanoma and breast cancer. Proteomics data improved predictions for 52 phenotypes. MILTON also identified 182 putative novel gene-disease signals requiring further validation.

Written by

Vijay Kumar Malesu

Vijay holds a Ph.D. in Biotechnology and possesses a deep passion for microbiology. His academic journey has allowed him to delve deeper into understanding the intricate world of microorganisms. Through his research and studies, he has gained expertise in various aspects of microbiology, which includes microbial genetics, microbial physiology, and microbial ecology. Vijay has six years of scientific research experience at renowned research institutes such as the Indian Council for Agricultural Research and KIIT University. He has worked on diverse projects in microbiology, biopolymers, and drug delivery. His contributions to these areas have provided him with a comprehensive understanding of the subject matter and the ability to tackle complex research challenges.    

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Kumar Malesu, Vijay. (2024, September 17). Revolutionizing disease prediction: MILTON framework utilizes biobank datasets to identify 3,213 diseases. News-Medical. Retrieved on December 21, 2024 from https://www.news-medical.net/news/20240917/Revolutionizing-disease-prediction-MILTON-framework-utilizes-biobank-datasets-to-identify-3213-diseases.aspx.

  • MLA

    Kumar Malesu, Vijay. "Revolutionizing disease prediction: MILTON framework utilizes biobank datasets to identify 3,213 diseases". News-Medical. 21 December 2024. <https://www.news-medical.net/news/20240917/Revolutionizing-disease-prediction-MILTON-framework-utilizes-biobank-datasets-to-identify-3213-diseases.aspx>.

  • Chicago

    Kumar Malesu, Vijay. "Revolutionizing disease prediction: MILTON framework utilizes biobank datasets to identify 3,213 diseases". News-Medical. https://www.news-medical.net/news/20240917/Revolutionizing-disease-prediction-MILTON-framework-utilizes-biobank-datasets-to-identify-3213-diseases.aspx. (accessed December 21, 2024).

  • Harvard

    Kumar Malesu, Vijay. 2024. Revolutionizing disease prediction: MILTON framework utilizes biobank datasets to identify 3,213 diseases. News-Medical, viewed 21 December 2024, https://www.news-medical.net/news/20240917/Revolutionizing-disease-prediction-MILTON-framework-utilizes-biobank-datasets-to-identify-3213-diseases.aspx.
