In a recent study published in BMC Medicine, researchers used machine learning to identify diabetic individuals among people with normal fasting glucose, drawing only on common physical examination indexes.
Study: Detection of diabetic patients in people with normal fasting glucose using machine learning.
Background
Diabetes mellitus (DM) is a growing public health challenge, with many asymptomatic cases going undetected, leading to complications. The International Diabetes Federation projected a rise from 537 million diabetic individuals in 2021 to 643 million by 2030.
Undiagnosed cases burden the healthcare system, prompting an emphasis on early diagnosis and a turn to machine learning for efficient screening. Although fasting blood glucose has proven accurate in risk prediction, relying on it alone can overlook many cases.
Many people with diabetes present normal fasting glucose, underscoring the need for broader screening methods and further research to refine detection across various demographics.
About the study
The present study collected physical examination data from three hospitals to develop a framework for identifying diabetic patients with normal fasting glucose. This data, categorized as D1, D2, and D3, underwent rigorous cleaning, with samples classified based on the World Health Organization's (WHO's) diabetes diagnostic criteria.
Due to an evident class imbalance in the datasets, the synthetic minority over-sampling technique (SMOTE) was implemented, followed by Z-score normalization for standardization.
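The two preprocessing steps can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the toy rows, the choice of k, and the helper names smote and zscore_columns are assumptions made for this example, and the article does not report the settings the authors used.

```python
import random
import statistics

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base within the minority class (excluding base)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic

def zscore_columns(rows):
    """Standardise each column to mean 0 and population standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Made-up toy rows of [age, BMI]; not data from the study.
majority = [[35, 22.0], [41, 23.5], [29, 21.0], [50, 24.0], [38, 22.5], [45, 23.0]]
minority = [[61, 26.0], [58, 27.5], [66, 25.0]]

# Oversample the minority class until both classes have the same size,
# then standardise the combined table.
balanced_minority = minority + smote(minority, n_new=len(majority) - len(minority))
scaled = zscore_columns(majority + balanced_minority)
```

Each synthetic row lies on a line segment between two real minority rows, so oversampling adds no values outside the observed minority range. In practice, oversampling is usually applied to the training data only, so that test sets keep their real class balance.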
The computational model employed multiple machine learning techniques, with the deep neural network (DNN) showing superior performance. Established metrics like sensitivity and accuracy were used to refine the model, considering the data's significant class imbalance.
Although 27 features were initially used for prediction, the researchers pared this set down to eliminate potential redundancies. They settled on 13 key features, identified through manual curation and max relevance and min redundancy (mRMR) analysis.
For practical application, an online tool, DRING, was designed. Beyond just understanding broad risk factors, the study also introduced a method adapted from the permutation feature importance algorithm, offering a more individualized risk assessment for diabetes onset.
Study results
Between 2015 and 2018, physical examination data was collected from the First Affiliated Hospital of Wannan Medical College, yielding 61,059 samples with normal fasting glucose (NFG).
Nearly 1% (603 participants) of these were identified as diabetic based on a hemoglobin A1c (HbA1c) threshold of 6.5%. Notably, the diabetic group had an average body mass index (BMI) 1.08 units higher and was, on average, 10.6 years older than the non-diabetic group.
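With 603 diabetic cases among 61,059 samples, accuracy alone says little about a screening model, which is why sensitivity matters here. The short calculation below makes the point with a deliberately useless classifier that labels everyone as non-diabetic. The function name confusion_metrics is an assumption for this sketch; only the two counts come from the study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

total, diabetic = 61_059, 603          # counts reported for the first dataset
non_diabetic = total - diabetic

# A trivial "model" that predicts non-diabetic for every sample:
# it finds no true positives and raises no false alarms.
always_negative = confusion_metrics(tp=0, fp=0, tn=non_diabetic, fn=diabetic)

print(round(always_negative["accuracy"], 4))   # 0.9901
print(always_negative["sensitivity"])          # 0.0
```

An accuracy of about 99% paired with a sensitivity of zero shows why a model for this population has to be judged on how many diabetic cases it actually catches.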
The most distinguishing features between diabetics and non-diabetics were absolute lymphocyte count (ALC), age, fasting blood glucose (FBG), BMI, and white blood cell count (WBC), with an additional 11 significant features also identified.
Given that several pairs of features, such as hemoglobin (HGB) and hematocrit (HCT) or neutrophil (NEU) and lymphocyte (LYM), were highly correlated, there was a need to eliminate redundancy to stabilize the model.
Using manual curation and the mRMR technique, the researchers identified an optimal feature space, keeping only 13 of the initial 27 features. Both methods highlighted the importance of FBG, BMI, ALC, and age. When tested, models built with 13 features slightly outperformed those with 27, with modest gains in accuracy and sensitivity.
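The idea behind mRMR can be illustrated with a small greedy selector. This is a sketch under stated assumptions rather than the study's procedure: it uses absolute Pearson correlation as a stand-in for the relevance and redundancy measures, scores candidates as relevance minus mean redundancy, and runs on made-up values. The names pearson and mrmr are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def mrmr(features, target, n_select):
    """Greedily pick features that are relevant to the target but not
    redundant with the features already selected."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected, remaining = [], list(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < n_select:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Made-up toy values: HCT is almost a rescaled copy of HGB.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "HGB": [138, 146, 138, 146, 144, 152, 144, 152],
    "HCT": [40, 43, 39, 44, 43, 46, 42, 47],
    "AGE": [35, 35, 55, 55, 45, 45, 65, 65],
}

by_relevance_only = sorted(
    features, key=lambda f: -abs(pearson(features[f], labels))
)[:2]
print(by_relevance_only)            # ['HGB', 'HCT']
print(mrmr(features, labels, 2))    # ['HGB', 'AGE']
```

Ranking by relevance alone keeps both hematology measures even though the second adds almost nothing new. The redundancy penalty swaps it for a less correlated feature, which is the behavior that motivates pruning pairs such as HGB and HCT.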
Further validation was performed on two independent test sets, D2 and D3. Both models' Area Under the Curve (AUC) values exceeded 0.95 on D2 and neared 0.90 on D3. Additionally, Youden's (or J) index on D2 was notably high. Manual curation-based models generally outperformed those based on mRMR.
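For readers unfamiliar with the two metrics, the sketch below computes both from scratch on made-up scores. AUC is calculated as the fraction of diabetic and non-diabetic pairs that the model ranks correctly, and Youden's J is sensitivity plus specificity minus one, maximized over candidate thresholds. The function names auc and best_youden and the toy data are assumptions for illustration only.

```python
def auc(scores, labels):
    """Probability that a random positive scores higher than a random
    negative, counting ties as half (the rank-based definition of AUC)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_youden(scores, labels):
    """Return the largest J = sensitivity + specificity - 1 and the first
    threshold that reaches it; a sample is positive when score >= threshold."""
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_j, best_t

# Made-up predicted risks and true labels (1 = diabetic); not study data.
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

print(auc(scores, labels))              # 0.9375
print(best_youden(scores, labels)[0])   # 0.75
```

Both quantities depend only on how the model ranks and separates the two classes, not on how many samples each class contains, which makes them more informative than accuracy on heavily imbalanced test sets.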
One noticeable drawback was the mRMR model's false positive rate on the extremely imbalanced D2 dataset. Nevertheless, these results demonstrated the model's proficiency in identifying undiagnosed diabetics in the NFG population.
To identify which features were paramount for determining diabetic risk, the study relied on the weights from the 13-feature manual curation model. ALC, FBG, age, sex, and BMI emerged as the top five variables.
Previous research has suggested that even within the NFG range, an increased FBG level amplifies diabetes risk. Age and BMI were reaffirmed as well-established diabetes risk factors, and the model also highlighted a difference in diabetes risk between the sexes. Other notable factors included the mean corpuscular volume (MCV) and absolute monocyte count (AMC).
To tailor diabetic risk assessments to individual patients, a framework based on permutation feature importance (PFI) was established. For instance, one case from an external validation set was analyzed for its risk factors.
Despite her FBG appearing within the normal range, this individual's age, FBG, and BMI emerged as the main diabetic risk factors. Such results emphasize the potential for personalized interventions based on individual risk profiles.
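The article does not spell out how the study adapted PFI to single patients, so the following is only one plausible reading. For each feature, the patient's value is replaced with values taken from a reference group, and the average drop in predicted risk is recorded. The logistic model, its weights, the reference cohort, and the patient are all made up, and predict_risk and individual_importance are hypothetical names.

```python
import math

# Hypothetical logistic risk model; the weights and intercept are invented
# for this sketch and are NOT taken from the study's neural network.
WEIGHTS = {"age": 0.06, "fbg": 1.2, "bmi": 0.10}
INTERCEPT = -12.0

def predict_risk(person):
    """Predicted probability of diabetes under the toy logistic model."""
    z = INTERCEPT + sum(WEIGHTS[k] * person[k] for k in WEIGHTS)
    return 1 / (1 + math.exp(-z))

def individual_importance(person, reference, predict):
    """For each feature, swap in every reference value and report the
    average drop in this person's predicted risk."""
    base = predict(person)
    importance = {}
    for feature in person:
        swapped = [predict({**person, feature: ref[feature]}) for ref in reference]
        importance[feature] = base - sum(swapped) / len(swapped)
    return importance

# Made-up reference cohort and patient (FBG in mmol/L, inside the normal range).
reference = [
    {"age": 35, "fbg": 4.8, "bmi": 22.0},
    {"age": 42, "fbg": 5.0, "bmi": 23.5},
    {"age": 50, "fbg": 5.2, "bmi": 24.0},
    {"age": 30, "fbg": 4.6, "bmi": 21.0},
]
patient = {"age": 68, "fbg": 5.9, "bmi": 29.0}

importance = individual_importance(patient, reference, predict_risk)
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)   # ['age', 'fbg', 'bmi']
```

A feature scores highly when replacing the patient's own value with typical reference values lowers the predicted risk substantially, so the ranking describes what drives risk for this one person rather than for the population as a whole.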
The culmination of this work was integrating this analysis into the DRING web server, streamlining its practical application.