Leveraging cutting-edge metabolomics, this study unveils a cost-effective and accurate diabetes prediction model that could revolutionize early detection and prevention strategies.
Study: Novel type 2 diabetes prediction score based on traditional risk factors and circulating metabolites: model derivation and validation in two large cohort studies. Image Credit: Shutterstock AI
In a recent study published in the journal eClinicalMedicine, researchers assessed the incremental value of adding metabolome-derived biomarkers to the traditional Cambridge Diabetes Risk Score (CDRS) in predicting a 10-year risk of diabetes. The study utilized data from two large cohorts: the UK Biobank and the German ESTHER cohort, ensuring robust model development and validation. Metabolite data from more than 86,000 UK Biobank (training and internal validation) and almost 4,400 German ESTHER cohort (external validation) participants revealed 11 biomarkers that significantly improved CDRS accuracy (0.815 to 0.834).
Notably, a concise predictive model using only four low-cost and easy-to-obtain metabolite biomarkers achieved comparable accuracy, highlighting their utility in routine diabetes risk assessment.
Background
Type 2 diabetes (T2D) is a chronic medical condition characterized by persistently elevated blood sugar levels, leading to potentially life-threatening complications, including cardiovascular diseases (CVDs), kidney diseases, and vision loss. The condition arises when the body cannot produce enough insulin or use it effectively, and it has been attributed to genetics, health behaviors (sleep and physical activity levels), and excess body weight.
Alarmingly, T2D prevalence has been increasing at unprecedented levels, resulting in substantial economic, quality of life, and mortality burdens to patients and their families. No cure for diabetes exists, and traditional clinical interventions aim to mitigate or delay T2D onset. Early detection or prediction of T2D risk is therefore essential, giving clinicians and at-risk individuals time to act before the disease becomes established. However, present predictive approaches, while able to differentiate between low and high risk, lack specificity and may be confounded by combinations of risk factors.
About the study
Advances in nuclear magnetic resonance (NMR) spectroscopy and their applications in metabolomics research provide more comprehensive and nuanced views of various metabolomic alterations preceding T2D onset, highlighting their potential in predictive T2D modeling. The present study leverages this concept in deriving a novel T2D risk score using NMR metabolomics data from two long-term population-based datasets.
Study data for model derivation and internal validation were obtained from the United Kingdom Biobank (UKB; 70% training, 30% validation), comprising 502,493 participants aged 37–73 years across 22 sites in Scotland, Wales, and England. The resultant model was externally validated using the ESTHER cohort, a population-based cohort from Saarland, Germany, comprising 9,940 participants aged 50 to 75 years. Participants with a medical history of diabetes or with missing data were excluded from modeling and analyses.
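The exclusion step and the 70/30 split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`prior_diabetes`, `glucose`), the function name `split_cohort`, the fixed random seed, and the toy records are all hypothetical, and the article does not say how the authors implemented the split.

```python
import random

def split_cohort(records, train_fraction=0.7, seed=0):
    """Drop ineligible records, then shuffle and split into training and validation sets."""
    # Exclude participants with a diabetes history or any missing value.
    eligible = [
        r for r in records
        if not r["prior_diabetes"] and all(v is not None for v in r.values())
    ]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(eligible)
    cut = round(len(eligible) * train_fraction)
    return eligible[:cut], eligible[cut:]

# Toy data (made up): participant 0 has prior diabetes, participant 1 has a missing value.
records = [
    {"id": i, "prior_diabetes": i == 0, "glucose": None if i == 1 else 5.0 + 0.1 * i}
    for i in range(12)
]
train, valid = split_cohort(records)
print(len(train), len(valid))
```

With 10 eligible toy records, the split yields 7 training and 3 validation records, mirroring the 70/30 proportion.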
NMR metabolomics data were obtained using the high-throughput Nightingale Health platform, which quantifies 250 blood plasma-derived metabolites. One of these metabolites (glycerol) was missing for most participants and was therefore excluded, leaving 249 metabolites for model derivation. Model selection identified the Least Absolute Shrinkage and Selection Operator (LASSO) as the most reliable approach. Input data were log-transformed to reduce skew and the influence of outliers, further enhancing model performance.
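To illustrate how log transformation and LASSO work together to select variables, here is a minimal pure-Python sketch. It makes several assumptions that are not from the article: it uses a squared-error loss with coordinate descent (the study's actual loss function and software are not described here), the penalty `lam = 0.1` is arbitrary, and the two "metabolites" and the outcome are made-up toy numbers.

```python
import math

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def standardize(values):
    """Center to mean 0 and scale to unit (population) variance."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

def lasso_coordinate_descent(columns, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 for standardized columns."""
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)  # equals y - X b, since b starts at zero
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Correlation of column j with the residual that ignores its own effect.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, lam)  # unit variance, so no rescaling needed
            if new_beta != beta[j]:
                delta = new_beta - beta[j]
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_beta
    return beta

# Toy data (made up): metabolite A tracks the outcome on a log scale, metabolite B does not.
raw_a = [1, 2, 4, 8, 1, 2, 4, 8]
raw_b = [3, 3, 3, 3, 9, 9, 9, 9]
outcome = [0, 1, 2, 3, 0, 1, 2, 3]

columns = [standardize([math.log(v) for v in col]) for col in (raw_a, raw_b)]
mean_y = sum(outcome) / len(outcome)
y = [v - mean_y for v in outcome]
beta = lasso_coordinate_descent(columns, y, lam=0.1)
print(beta)
```

In this toy setup the coefficient for metabolite B is shrunk to exactly zero while metabolite A keeps a positive coefficient, which is the mechanism by which a LASSO penalty can reduce a large panel of candidates to a short list.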
The study also performed subgroup analyses to examine the robustness of the new model across groups defined by age, sex, and obesity status. To compare against and improve upon the current gold standard for T2D prediction, variables from the Cambridge Diabetes Risk Score (CDRS) were included in model derivation, and the 249 metabolites were added to these CDRS variables incrementally. This allowed the researchers to compute the relative improvement in predictive power from each additional metabolomic variable, using receiver operating characteristic (ROC) curve analyses, and to identify the metabolites with the highest predictive power.
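The article reports predictive accuracy as a C-index, a concordance measure closely related to the area under the ROC curve. As a rough illustration of how adding predictors can raise it, the following sketch computes Harrell's concordance index for a baseline and an extended risk score. The function name, the follow-up times, event flags, and both sets of risk scores are hypothetical toy values, and whether the authors used this exact estimator is an assumption.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs where the earlier case has the higher risk.

    A pair (i, j) is comparable when participant i developed the disease
    (events[i] is truthy) before participant j's follow-up time.
    Tied risk scores count as half-concordant; tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored participants cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy data (made up): years of follow-up and whether T2D was diagnosed (1) or not (0).
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
baseline_risk = [0.6, 0.3, 0.5, 0.4, 0.2]  # hypothetical traditional-factor score
extended_risk = [0.9, 0.7, 0.5, 0.4, 0.2]  # hypothetical score with extra predictors

c_baseline = concordance_index(times, events, baseline_risk)
c_extended = concordance_index(times, events, extended_risk)
print(c_baseline, c_extended)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so the difference between the two scores is the incremental value attributable to the added predictors.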
Study findings
Of the more than 512,000 participants comprising the UKB and ESTHER cohorts, 86,232 (UKB) and 4,383 (ESTHER) participants met the study inclusion criteria and were included in analyses (model derivation and validation). These participants were of similar ages (59.9 and 60.2 years) and sex distributions (44.3% and 42.7% males) in the UKB and ESTHER datasets, respectively. Notably, baseline body mass index (BMI) and hemoglobin A1c (HbA1c) levels were almost identical between the two cohorts, highlighting their comparability.
LASSO analyses identified the 11 metabolites with the highest predictive power among the 249 analyzed. These comprised glycolysis-associated metabolites (n = 4), ketone bodies (n = 2), amino acids (n = 2), lipoprotein-associated metabolites (n = 2), and a fatty acid-related metabolite (n = 1). Key examples include glucose, pyruvate, lactate, and citrate. Notably, the 10-year predictive accuracy of these metabolites independent of CDRS variables was high (C-index = 0.733 and 0.735 in internal and external validation datasets, respectively). When UKB-derived metabolites were incrementally combined with CDRS, predictive accuracy (baseline C-index = 0.815) was enhanced significantly (combination C-index = 0.834).
Similar improvements were observed in the external ESTHER validation cohort (C-index increased from 0.770 to 0.798). Impressively, the novel "UK Biobank Diabetes Risk Score (UKB-DRS)" model achieved comparable improvements utilizing only four of the 11 identified metabolites.
The simplified model’s robustness was evident in calibration curves, which showed similar predictive performance to the full model in both cohorts.
Conclusions
The present study draws on the most extensive dataset used in deriving a T2D predictive model and highlights the value of NMR-derived metabolite data in assessing an individual's 10-year risk of developing T2D. Advances in NMR spectroscopy have substantially reduced the financial burden of these assessments, and the ease of data acquisition (small quantities of blood-derived plasma) highlights the methodological and clinical utility of the approach.
The study presents a novel UKB-DRS model that significantly improves on the predictive accuracy of the current gold standard (CDRS), thereby giving clinicians and at-risk individuals additional time to mitigate, delay, or avoid T2D onset. Future studies are recommended to validate this model across ethnically diverse populations and younger age groups.
Journal reference:
- Xie, R., Herder, C., Sha, S., Peng, L., Brenner, H., & Schöttker, B. (2025). Novel type 2 diabetes prediction score based on traditional risk factors and circulating metabolites: model derivation and validation in two large cohort studies. eClinicalMedicine, 79, 102971. DOI: 10.1016/j.eclinm.2024.102971, https://www.thelancet.com/journals/eclinm/article/PIIS2589-5370(24)00550-9/fulltext