Study: From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis.
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.
GluFormer, a transformer-based generative model trained on CGM data, outperforms traditional analysis tools by predicting broad health outcomes and future risks across diverse populations and conditions.
In a recent study posted to the arXiv* preprint server, scientists from Israel presented a transformer-based generative model for analyzing continuous glucose monitoring data and predicting glycemic patterns. The model has potential applications in risk stratification, diabetes care, and the optimization of treatment strategies.
Background
Diabetes has become a global health crisis, affecting over 500 million people and costing more than 900 billion dollars every year. Type 2 diabetes has numerous modifiable risk factors, including diet and physical activity levels, and it increases the risk of comorbidities such as mental health issues, kidney disease, and cardiovascular disease.
Continuous glucose monitoring (CGM) devices have been instrumental in improving diabetes management by lowering the frequency of glycemic events, improving glycemic control, and enhancing the overall quality of life. Continuous glucose monitoring is also being used to detect glucose dysregulation early and personalize dietary choices.
In parallel, the field of medical artificial intelligence (AI) is progressing towards self-supervised learning, which can analyze large amounts of unlabeled data, such as those gathered by CGM devices. Models trained with self-supervised learning have been used effectively to detect diseases from wearable-device data and from histopathological and retinal images.
About the study
In the present study, the researchers described the development of a transformer architecture-based generative model called GluFormer that can analyze large amounts of CGM data from diverse populations.
The model was initially trained on CGM data from non-diabetic participants enrolled in the Human Phenotype Project. This dataset consisted of over 10 million glucose measurements from more than 10,000 participants. Each measurement was treated as a discrete token, and the model was trained with next-token prediction: given a sequence of glucose tokens, it learns to predict the token that follows. This training method enabled GluFormer to generate and extend CGM time series.
Furthermore, using self-supervised learning to pre-train the model enabled it to learn from unlabeled data, enhancing GluFormer's ability to capture the complex patterns in the CGM data.
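The tokenization and next-token setup described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the glucose range (40 to 400 mg/dL), the one-token-per-mg/dL binning, the context length, and the function names `tokenize` and `next_token_pairs` are all assumptions made for illustration, since the article does not describe how values were discretized.

```python
from typing import List, Tuple

# Assumed measurement range; the article does not state the real binning scheme.
MIN_MG_DL = 40
MAX_MG_DL = 400


def tokenize(readings_mg_dl: List[float]) -> List[int]:
    """Map each glucose reading to an integer token by clipping and rounding."""
    tokens = []
    for value in readings_mg_dl:
        clipped = min(max(value, MIN_MG_DL), MAX_MG_DL)
        tokens.append(int(round(clipped)) - MIN_MG_DL)
    return tokens


def next_token_pairs(tokens: List[int], context: int) -> List[Tuple[List[int], int]]:
    """Build (context window, next token) examples for next-token prediction."""
    return [
        (tokens[i:i + context], tokens[i + context])
        for i in range(len(tokens) - context)
    ]


# Hypothetical readings in mg/dL, for illustration only.
readings = [95.2, 101.7, 110.0, 126.4, 140.9, 133.3]
tokens = tokenize(readings)
pairs = next_token_pairs(tokens, context=3)
```

Because the targets are simply the next values in the same series, no manual labels are needed, which is what makes this form of training self-supervised.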
The model's ability to generate realistic CGM signals was then validated through visual inspection and quantitative comparison of major glycemic metrics, such as the glucose management indicator and mean glucose level.
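As a rough illustration of the metrics involved, the sketch below computes mean glucose and the glucose management indicator (GMI) for two series and reports how far apart they are. The GMI formula used is the commonly published one, GMI (%) = 3.31 + 0.02392 × mean glucose in mg/dL; the article does not say how the authors computed it, so treating it as their method is an assumption. The function names and sample values are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def glucose_management_indicator(readings_mg_dl: Sequence[float]) -> float:
    """GMI (%) from mean glucose in mg/dL, using the commonly published formula."""
    return 3.31 + 0.02392 * mean(readings_mg_dl)


def metric_gaps(observed: Sequence[float], generated: Sequence[float]) -> Dict[str, float]:
    """Absolute difference in each metric between observed and generated series."""
    return {
        "mean_glucose": abs(mean(observed) - mean(generated)),
        "gmi": abs(
            glucose_management_indicator(observed)
            - glucose_management_indicator(generated)
        ),
    }


# Hypothetical series in mg/dL, for illustration only.
observed = [100, 120, 150, 130, 110]
generated = [102, 118, 147, 133, 112]
gaps = metric_gaps(observed, generated)
```

Small gaps on metrics like these would suggest that a generated series resembles the real one in aggregate, although they say nothing about the shape of individual excursions.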
The researchers then tested the model's generalizability on validation cohorts with varying glycemic characteristics and from different geographic locations. These cohorts included individuals with gestational diabetes and type 2 diabetes.
GluFormer was also applied to data from different types of CGM devices and from individuals with a variety of underlying disorders to test its robustness in different scenarios. The study also compared the model's performance in predicting clinical outcomes against that of other AI-based models, such as multilayer perceptrons and convolutional neural networks.
Additionally, the researchers incorporated data on dietary intake along with the CGM data to create a multimodal version of GluFormer intended to improve prediction accuracy, particularly for glucose responses to specific foods. They also integrated temporal information, such as time, day, and month, into the GluFormer model to capture temporal fluctuations in glucose levels.
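The article does not say how the temporal information was represented. One common option, shown here purely as an assumed sketch and not as the authors' method, is a cyclic encoding in which time of day, day of week, and month are each mapped to a sine and cosine pair so that adjacent values (for example 23:59 and 00:00) stay close together. The function name and feature keys are hypothetical.

```python
import math
from datetime import datetime
from typing import Dict


def cyclic_time_features(ts: datetime) -> Dict[str, float]:
    """Encode time of day, weekday, and month as points on a unit circle."""
    minutes = ts.hour * 60 + ts.minute
    time_angle = 2 * math.pi * minutes / (24 * 60)
    weekday_angle = 2 * math.pi * ts.weekday() / 7
    month_angle = 2 * math.pi * (ts.month - 1) / 12
    return {
        "time_sin": math.sin(time_angle),
        "time_cos": math.cos(time_angle),
        "weekday_sin": math.sin(weekday_angle),
        "weekday_cos": math.cos(weekday_angle),
        "month_sin": math.sin(month_angle),
        "month_cos": math.cos(month_angle),
    }


# Hypothetical timestamps, for illustration only.
midnight = cyclic_time_features(datetime(2024, 1, 1, 0, 0))
just_before_midnight = cyclic_time_features(datetime(2024, 1, 1, 23, 59))
```

Features like these could be supplied alongside each glucose token, giving a model a way to associate readings with daily or seasonal rhythms.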
Results
The results demonstrated GluFormer's ability to analyze and predict CGM data and to capture the dynamics of glucose monitoring data from different populations. Uniform manifold approximation and projection (UMAP) was used to examine the representations the model learned, and the embedding patterns reflected clinically significant information on glucose tolerance and glycemic control.
The model was also able to accurately identify individual-specific patterns in the CGM data. GluFormer's performance in predicting hemoglobin A1c levels surpassed that of multilayer perceptrons and convolutional neural networks, and the model provided more accurate predictions than the standard clinical measures.
Furthermore, when tested on cohorts that included individuals with gestational diabetes and type 2 diabetes, the model generated realistic CGM signals with correlated glycemic metrics, which supported its strong predictive capabilities.
Additionally, the multimodal version of GluFormer, which incorporates dietary data, produced CGM predictions that correlated more strongly with observed CGM data, especially around meal times. The addition of temporal information further improved the model's ability to generate CGM data that accurately reflected changes in glucose levels throughout the day.
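To make the idea of correlation around meal times concrete, the sketch below computes a Pearson correlation between predicted and observed readings within a window around a logged meal. The window sizes, function names, and sample values are hypothetical, and the article does not specify which correlation measure or window the authors used.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def meal_window(series: Sequence[float], meal_index: int, before: int, after: int) -> List[float]:
    """Readings from `before` samples ahead of a meal to `after` samples past it."""
    start = max(0, meal_index - before)
    return list(series[start:meal_index + after + 1])


# Hypothetical readings in mg/dL with a meal logged at index 3.
observed = [100, 102, 105, 130, 160, 150, 130, 115]
predicted = [101, 103, 104, 125, 155, 152, 133, 118]
r_meal = pearson(
    meal_window(observed, 3, before=1, after=3),
    meal_window(predicted, 3, before=1, after=3),
)
```

Comparing this value between a CGM-only model and a model that also receives dietary data is one way the reported improvement around meals could be quantified.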
Conclusions
Overall, the study found that GluFormer, a transformer-based model trained using self-supervised learning, demonstrated high accuracy in predicting CGM data and glycemic outcomes in populations with diverse metabolic features and geographic backgrounds. It outperformed other AI-based models, and incorporating dietary data in the multimodal version enhanced its predictive accuracy. These findings highlight the potential utility of GluFormer in the management of chronic diseases.
Journal reference:
- Preliminary scientific report.
From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis. Guy Lutsker, Gal Sapir, Anastasia Godneva, Smadar Shilo, Jerry R Greenfield, Dorit Samocha-Bonet, Shie Mannor, Eli Meirom, Gal Chechik, Hagai Rossman, and Eran Segal. arXiv. 2024. DOI: 10.48550/arXiv.2408.11876, https://arxiv.org/abs/2408.11876