Coronary artery disease is complex, with a wide range of contributing factors and many clinical manifestations. Detecting it early is important because early detection enables preventive measures such as lipid-lowering therapies and lifestyle modifications.
Study: Machine learning-based marker for coronary artery disease: derivation and validation in two longitudinal cohorts.
Background
Quantitative differences in plaque amount and composition and in the degree of coronary stenosis help to assess the risk of myocardial infarction and death. Misclassified or missed diagnoses of coronary artery disease may lead to stroke, myocardial infarction, and death.
Hypertension, dyslipidemia, diabetes, and smoking are common factors associated with coronary artery disease events. These factors are included in tools, such as Framingham Risk Score, pooled cohort equations (PCEs), and SCORE2, used to predict coronary artery disease events. However, these tools use only a small amount of data from electronic health records (EHRs) and discard the majority. Some of the critical data discarded by these tools include vital signs, medications, laboratory tests, symptoms, and many other clinical features.
Machine learning can be used to analyze and interpret the large quantities of heterogeneous patient data held in EHR-based health systems. For example, machine learning models have been designed to predict the five-year or ten-year risk of coronary artery disease from EHR data.
A recent EHR-based model outperformed PCEs in predicting one-year coronary artery disease status. Such models are predominantly used as classification tools within a binary framework: they label a patient as a case or a non-case but do not measure disease on a continuous, quantitative scale. A quantitative evaluation of coronary artery disease could be more beneficial because it would support more personalized care.
A New Study
A recent study published in The Lancet investigated whether a quantitative in-silico score for coronary artery disease (ISCAD), derived from a machine learning model, can serve as a clinical marker of coronary artery disease. It also assessed whether this marker could be used for risk stratification and to evaluate the prognosis of the disease.
Typically, molecules or anthropometric measurements are used as conventional in vivo indicators of disease. The current study evaluated the utility of ISCAD, which is based on multiple clinical data points in EHRs, as an in-silico marker for coronary artery disease.
The study cohort consisted of participants from two EHR-linked biobanks in the USA and the UK. The BioMe Biobank consists of more than 60,000 USA-based individuals of diverse ethnicities. In addition, the model was externally tested in the UK Biobank, which comprises more than 500,000 British individuals.
The clinical features associated with coronary artery disease were extracted from EHRs. The machine learning model used in this study was adapted from an earlier EHR-based model that predicted short-term coronary artery disease risk within a binary framework. The probability scores produced by the model were used as a quantitative coronary artery disease marker.
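The difference between a binary call and a probability-based marker can be illustrated with a minimal sketch. It is not the study's model: a small logistic function stands in for the classifier purely for brevity, and the feature names, coefficients, threshold, and patients are hypothetical values chosen for illustration.

```python
import math

# Hypothetical coefficients for illustration only; none come from the study.
WEIGHTS = {"age_decades": 0.45, "ldl_mmol_l": 0.30, "on_statin": 0.60}
INTERCEPT = -4.0


def predicted_probability(features):
    """Logistic model output: a probability between 0 and 1."""
    z = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def binary_call(probability, threshold=0.5):
    """Binary framework: collapse the probability into case / non-case."""
    return probability >= threshold


def marker_score(probability):
    """Quantitative framework: keep the probability itself as the score."""
    return probability


# Three made-up patients with increasing risk profiles.
patients = {
    "A": {"age_decades": 4.5, "ldl_mmol_l": 2.5, "on_statin": 0},
    "B": {"age_decades": 6.0, "ldl_mmol_l": 3.5, "on_statin": 0},
    "C": {"age_decades": 7.0, "ldl_mmol_l": 4.0, "on_statin": 1},
}
scores = {pid: marker_score(predicted_probability(f)) for pid, f in patients.items()}
calls = {pid: binary_call(s) for pid, s in scores.items()}

for pid in patients:
    print(pid, round(scores[pid], 3), calls[pid])
```

In this toy data, patients A and B receive the same binary call (non-case), yet their scores differ, which is the extra information a continuous marker retains.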
Key Findings
A total of 95,935 participants (35,749 from the BioMe Biobank and 60,186 from the UK Biobank) were included in this study. The median age of the participants was around 62 years. The BioMe Biobank sample was 41% male and 59% female, and 14% of its participants had been diagnosed with coronary artery disease. Similarly, the UK Biobank sample was 42% male and 58% female, and 14% of its participants had been diagnosed with coronary artery disease.
The clinical prediction model for coronary artery disease achieved an area under the receiver operating characteristic (ROC) curve of 0.95 in the BioMe validation set and 0.93 in the BioMe holdout set. In the UK Biobank external test set, it achieved a sensitivity of 0.84 and a specificity of 0.80.
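For readers unfamiliar with these metrics, the following sketch shows how they are commonly computed from true labels and model scores. The labels, scores, and 0.5 threshold are made-up illustrative values, not study data, and the article does not state which threshold the authors used.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    case scores higher than a random non-case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = diagnosed case, 0 = non-case.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
auc = roc_auc(labels, scores)
print(sens, spec, auc)
```

Sensitivity and specificity depend on the chosen threshold, whereas the area under the ROC curve summarizes ranking quality across all thresholds.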
ISCAD captured the coronary artery disease risk reflected in known risk factors, PCEs, and polygenic risk scores. Coronary artery stenosis increased quantitatively across ascending ISCAD quartiles, as did the risk of multivessel coronary artery disease, obstructive coronary artery disease, and stenosis of major coronary arteries. In addition, all-cause death and the associated hazard ratios increased gradually across ISCAD deciles.
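Stratifying by quartiles or deciles of a continuous score can be sketched as follows. This is an illustrative outline under stated assumptions: the scores and outcomes are invented, the function names are hypothetical, and the sketch reports a simple event rate per group rather than the hazard ratios described in the study.

```python
def quantile_groups(scores, n_groups):
    """Assign each score a group from 1 to n_groups by rank,
    keeping group sizes as equal as possible."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(scores) + 1
    return groups


def event_rate_by_group(groups, events, n_groups):
    """Fraction of participants with the event in each score group."""
    rates = []
    for g in range(1, n_groups + 1):
        members = [e for grp, e in zip(groups, events) if grp == g]
        rates.append(sum(members) / len(members))
    return rates


# Toy data: one score and one outcome flag (1 = event) per participant.
scores = [0.60, 0.05, 0.90, 0.30, 0.10, 0.75, 0.20, 0.45]
events = [1, 0, 1, 1, 0, 1, 0, 0]

groups = quantile_groups(scores, n_groups=4)        # quartiles
rates = event_rate_by_group(groups, events, n_groups=4)
print(groups)
print(rates)
```

Passing `n_groups=10` would give deciles instead; with real data each group should contain enough participants for the rates to be stable.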
Conclusions
The current study has some limitations, including the use of diagnostic codes to establish coronary artery disease case status, which carries a high risk of misclassification. Additionally, a small sample size might limit the generalizability of the findings.
Importantly, analysis of EHR data via machine learning models opens a new avenue for evaluating disease across a broad spectrum. This study determined the association of ISCAD with clinical outcomes of coronary artery disease, including recurrent myocardial infarction, atherosclerotic plaque burden, and all-cause death. The machine learning-based marker also enabled the identification of underdiagnosed individuals, who had high ISCAD together with EHR evidence of coronary artery disease.
In the future, more research is required to determine the association of in silico markers with the occurrence of coronary artery disease events and deaths. The efficacy of this strategy must be further assessed using other populations as well.
Journal reference:
- Forrest, I.S. et al. (2022) Machine learning-based marker for coronary artery disease: derivation and validation in two longitudinal cohorts. The Lancet. https://doi.org/10.1016/S0140-6736(22)02079-7, https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(22)02079-7/fulltext