In a recent study published in The Lancet Digital Health, researchers examined the state of randomized controlled trials (RCTs) for artificial intelligence (AI) algorithms in clinical practice.
Study: Randomised controlled trials evaluating artificial intelligence in clinical practice: a scoping review.
Background
The use of AI in healthcare has surged remarkably in the last five years, with some studies indicating that AI models could perform on par with, or even better than, clinicians. However, many models have been evaluated only retrospectively, not in real-world settings.
Of the roughly 300 AI-enabled medical devices available, few have been assessed in prospective RCTs. This scarcity creates uncertainty about potential risks to clinicians and patients. Further, AI systems can perform poorly when prospectively deployed.
About the study
In the present study, researchers analyzed the current state of RCTs evaluating AI in clinical practice. They searched the International Clinical Trials Registry and the PubMed, CENTRAL, and SCOPUS databases for studies published between January 1, 2018, and November 14, 2023. References from these studies were also screened to identify additional articles.
RCTs that implemented a substantial AI component as an intervention in clinical practice were eligible for inclusion. Eligible interventions comprised non-linear computational models, e.g., neural networks and decision trees.
Secondary studies, studies evaluating linear risk scores (e.g., logistic regression), and those not integrating the intervention into clinical practice were excluded. Titles and abstracts were screened, and full texts were reviewed.
Relevant data from eligible studies were extracted. These included participant characteristics, primary endpoint, clinical task(s), time efficiency endpoint, study location, comparator, AI type/origin, and results.
Studies were stratified by the primary endpoint group, clinical specialty, and AI data modality. Meta-analyses were not performed due to the heterogeneity in endpoints and tasks. Instead, an overview of trial features was presented.
Findings
The researchers identified 6,219 studies and 4,299 trial registrations. Following title/abstract screening, full texts of 133 studies were reviewed, of which 60 were excluded.
Reference screening identified an additional 13 studies. Overall, 86 unique RCTs were included; 43%, 13%, 6%, and 5% of trials were related to gastroenterology, radiology, surgery, and cardiology, respectively.
Gastroenterology RCTs were notable for their uniformity, as all trials tested video-based algorithms assisting clinicians. Further, just four groups (Fujifilm, Medtronic, Wuhan University, and Wision AI) conducted most (65%) of the gastroenterology trials.
In addition, 92% of RCTs were single-country trials undertaken primarily in the United States or China, whereas six of the seven multi-country trials were conducted in European countries.
The median participant age was 57.3 years, and 48.9% of subjects were male. Twenty-two RCTs reported race/ethnicity; among these, the median proportion of White participants was 70.5%.
The primary endpoints in 46 trials were related to diagnostic performance or yield, such as mean absolute error and detection rate. Eighteen trials examined the effects of AI on care management. Fifteen AI algorithms evaluated patient symptoms and behavior.
Seven RCTs examined AI in clinical decision-making. Fifty-nine trials assessed deep learning models for medical imaging, predominantly video-based rather than image-based. The remainder relied on structured data, such as health records, free text, and waveform data.
Most imaging-related AI systems were implemented in an assistive setup, whereas those based on structured data were compared with routine care.
Most models (55%) were developed in industry, followed by academia (41%). Eighty-one trials aimed to show improvement, and 80% of these reported significant improvements in their primary endpoint.
Specifically, 46 trials observed improvements for clinicians assisted by AI systems compared to unassisted clinicians. Notably, three RCTs found that standalone AI systems performed better than clinicians. Five trials implemented non-inferiority designs.
Two trials examined non-inferiority between assisted and unassisted clinicians, and three assessed it between clinicians and standalone AI systems.
Overall, 70 trials reported favorable results for their primary endpoint. Sixteen RCTs had negative results, finding no improvement for AI-assisted clinicians relative to unassisted clinicians, for AI systems compared to routine care, or for standalone AI models over clinicians.
Conclusions
Taken together, the findings reveal a growing interest in the utility of AI across clinical specialties and regions.
Most trials had favorable outcomes, underscoring the potential of AI systems to improve clinical decision-making, the assessment of patient symptoms and behavior, and care management.
Notably, the success of AI ultimately depends on its generalizability to target populations and settings. Continued research is essential to deepen the understanding of AI's true effects and limitations.