In a recent study published in npj Digital Medicine, researchers explored image-based phenotyping, using dual-energy X-ray absorptiometry (DXA) scans of knees to train a deep-learning model to diagnose knee osteoarthritis.
Study: Deep learning-based phenotyping of medical images improves power for gene discovery of complex disease.
Background
Diagnoses of complex diseases drawn from biobank data, which consist of International Statistical Classification of Diseases 10th revision (ICD-10) codes and self-reported symptoms and diagnoses, are invaluable in indicating the genetic basis of diseases but are also subject to various biases.
Diagnoses can vary depending on whether the clinician is a specialist or a non-specialist, and electronic health records can also differ according to how disease severity is assessed and how patients are billed. Using medical images to perform clinical-grade assessments can provide a standardized and consistent protocol for the diagnosis of various complex diseases.
Given the large volumes of clinical image data present in population biobanks, an automated phenotyping approach can potentially be used to ascertain disease status and severity. Chest X-ray images have been previously used to accurately diagnose severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections and pneumonia.
Furthermore, various genetic studies have also used deep-learning methods to associate significant genome-wide loci with image-derived phenotypes of heart structure, brain morphology, distribution of body fat, and liver fat percentage.
About the study
In the present study, the researchers used DXA images to train a binary classification model that identifies knee osteoarthritis cases at a clinical level of performance. They then applied the model to biobank images and compared its results with diagnoses based on ICD-10 codes.
Additionally, they trained an image segmentation algorithm to measure the minimum joint space width, which directly correlates with the severity of knee osteoarthritis, and used those measurements to compare quantitative and case-control approaches in genome-wide association studies.
The researchers also generated a polygenic risk score specific for each phenotype to determine whether, in a dataset of 300,000 cases, improvements in predicted ICD-10 records of knee osteoarthritis were associated with a higher statistical power in finding novel loci.
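As a rough illustration of what a polygenic risk score is, the sketch below uses the common additive form: a weighted sum of a person's risk-allele counts, with per-variant effect sizes taken from a genome-wide association study. The function name and the toy numbers are hypothetical, and the study's actual scoring pipeline is not described here.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Additive polygenic risk score for one individual.

    dosages      -- risk-allele counts per variant (0, 1, or 2); toy input
    effect_sizes -- per-variant GWAS effect estimates; toy input
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    # Sum of (allele count x effect size) over all variants.
    return sum(d * b for d, b in zip(dosages, effect_sizes))


# Toy example with three variants (made-up values, not study data).
score = polygenic_risk_score([0, 1, 2], [0.5, -0.25, 0.25])
```

A separate score of this kind can be built from each phenotype's association results and then compared on the same held-out individuals.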
The United Kingdom Biobank, comprising genome sequence data from over 40,000 individuals, was analyzed along with DXA images to determine the genetic factors underlying knee phenotypes.
The Kellgren-Lawrence grading system was then used to perform a binary classification for automated phenotyping of knee osteoarthritis cases using radiography. A total of 546 images, annotated by certified orthopedic surgeons, were used to train and validate the deep-learning model. The trained model was then used on close to 30,000 DXA scan images of knees, and predictions of knee osteoarthritis were compared with diagnosed ICD-10 codes for knee osteoarthritis.
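The comparison between model predictions and recorded ICD-10 diagnoses can be pictured as a simple two-by-two tally. The sketch below is only an illustration under assumed inputs: the function name, the per-person boolean lists, and the toy values are hypothetical and do not come from the study.

```python
from collections import Counter


def compare_labels(model_positive, icd10_positive):
    """Cross-tabulate image-based predictions against ICD-10 diagnoses.

    Both arguments are equal-length sequences of booleans, one per person.
    """
    if len(model_positive) != len(icd10_positive):
        raise ValueError("inputs must have the same length")
    counts = Counter(zip(model_positive, icd10_positive))
    model_cases = counts[(True, True)] + counts[(True, False)]
    icd10_cases = counts[(True, True)] + counts[(False, True)]
    return {
        "both": counts[(True, True)],
        "model_only": counts[(True, False)],
        "icd10_only": counts[(False, True)],
        "neither": counts[(False, False)],
        # How many more cases the model flags relative to the records.
        "case_ratio": model_cases / icd10_cases if icd10_cases else None,
    }


# Toy data for six people (made up for illustration).
table = compare_labels(
    [True, True, True, True, False, False],
    [True, True, False, False, True, False],
)
```

The "model_only" cell is where image-based phenotyping adds cases that the health records missed.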
The minimum joint space width between the tibia and the femur was then measured to quantify the severity of knee osteoarthritis.
The image segmentation algorithm was trained and validated using 63 images derived from DXA scans of knees, in which the positions of the tibia, fibula, and femur were marked at the pixel level, and validated by trained clinicians.
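Once the bones are labeled pixel by pixel, a joint space width can be read off the masks. The sketch below is one simple way to do this, not the authors' method: it assumes the femur sits above the tibia in the image, takes the vertical gap in each pixel column, and returns the smallest one. The function name, the mask layout, and the `pixel_spacing_mm` parameter are all assumptions made for illustration.

```python
def min_joint_space_width(femur_mask, tibia_mask, pixel_spacing_mm=1.0):
    """Smallest vertical gap between femur (above) and tibia (below).

    Masks are equal-sized 2D lists of 0/1 (row 0 is the top of the image).
    Returns the gap in millimetres, or None if no column has both bones.
    """
    n_rows = len(femur_mask)
    n_cols = len(femur_mask[0])
    best = None
    for col in range(n_cols):
        femur_rows = [r for r in range(n_rows) if femur_mask[r][col]]
        tibia_rows = [r for r in range(n_rows) if tibia_mask[r][col]]
        if not femur_rows or not tibia_rows:
            continue  # this column does not cross both bones
        # Empty pixels between the lowest femur pixel and highest tibia pixel.
        gap = min(tibia_rows) - max(femur_rows) - 1
        if gap < 0:
            continue  # bones overlap or are in the wrong order; skip
        if best is None or gap < best:
            best = gap
    return None if best is None else best * pixel_spacing_mm


# Toy 10x5 masks: femur fills rows 0-2 (rows 0-4 in column 2), tibia rows 7-9.
femur = [[1 if (r <= 2 or (c == 2 and r <= 4)) else 0 for c in range(5)]
         for r in range(10)]
tibia = [[1 if r >= 7 else 0 for c in range(5)] for r in range(10)]
width = min_joint_space_width(femur, tibia, pixel_spacing_mm=0.5)
```

A narrower minimum width indicates a more severely affected joint, which is what makes the measurement usable as a continuous phenotype.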
The image-derived phenotypes for knee osteoarthritis were then used for a genome-wide association study to determine the genetic basis of these phenotypes.
Additionally, linkage disequilibrium score regression was performed to determine the single nucleotide polymorphism (SNP) heritability of three phenotypes of knee osteoarthritis: one based on the ICD-10 code, one based on the binary classification, and one based on the minimum joint space width measurements.
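For readers unfamiliar with the method, linkage disequilibrium score regression in its standard form (the study's exact settings are not given here) regresses each SNP's association statistic on its LD score. Here N is the sample size, M the number of SNPs, l_j the LD score of SNP j, and a a term capturing confounding such as population stratification; the slope yields the SNP heritability h^2.

```latex
\mathbb{E}\!\left[\chi^2_j \mid \ell_j\right] = \frac{N h^2}{M}\,\ell_j + N a + 1
```

Fitting this line separately for each of the three phenotypes gives one heritability estimate per phenotype, which can then be compared.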
Results
The findings suggested that the use of deep-learning models to determine the phenotypes for complex diseases at the biobank data scale can potentially improve the analytical power in understanding epidemiological and genetic associations.
Furthermore, although the binary classification and minimum joint space width were genetically correlated, the use of the quantitative measurement substantially increased the number of significant loci identified in the genome-wide association study.
The use of binary classification in case-control phenotyping resulted in a two-fold increase in the number of cases and helped circumvent problems in electronic health records associated with variations in clinicians’ perceptions of the severity of the disease, as well as different definitions of osteoarthritis.
However, the use of minimum joint space width as a quantitative measure to assess disease severity was found to perform significantly better than the binary classification method using a case-control approach.
Conclusions
Overall, the study provided proof-of-concept for the use of large-scale biobank data and a quantitative phenotyping method to directly measure and assess disease severity in a disease with complex phenotypes.
The researchers believe that this approach can be extended not only to other musculoskeletal diseases that depend on radiography for diagnosis, but also to other complex diseases that use quantitative diagnostic methods.