In radiology, diagnostic imaging requires specialized knowledge to interpret the findings associated with a wide variety of diseases. In recent years, generative AI models such as ChatGPT (Chat Generative Pre-trained Transformer) have shown potential as diagnostic tools in medicine, but their accuracy must be evaluated before they can be used reliably in practice.
To that end, a research team led by Dr. Daisuke Horiuchi and Associate Professor Daiju Ueda of Osaka Metropolitan University's Graduate School of Medicine compared the diagnostic accuracy of ChatGPT with that of radiologists, using 106 musculoskeletal radiology cases that included each patient's medical history, images, and imaging findings.
For this study, each case's information was entered into GPT-4 and GPT-4 with vision (GPT-4V) to generate diagnoses. The same cases were also given to a radiology resident and a board-certified radiologist, who were asked to determine the diagnoses. The results showed that GPT-4 outperformed GPT-4V and was on par with the radiology resident; however, its diagnostic accuracy was inferior to that of the board-certified radiologist.
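The article does not reproduce the study's actual prompts or pipeline, but for readers curious about the mechanics, the sketch below shows how a case's text and image could be submitted to a text-only model and a vision-capable model through the OpenAI Python API. The case description, prompt wording, model identifiers, and image URL are illustrative assumptions, not the study's protocol.

```python
# Illustrative sketch only: the study's prompts and code are not published here.
# The case text, prompt wording, model names, and image URL are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

case_text = (
    "Patient history: 45-year-old with chronic knee pain. "  # hypothetical case
    "Imaging findings: joint-space narrowing and marginal osteophytes."
)

# Text-only query, analogous to the GPT-4 arm of the study.
text_response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user",
         "content": f"Suggest the most likely diagnosis.\n{case_text}"},
    ],
)
print(text_response.choices[0].message.content)

# Text-plus-image query, analogous to the GPT-4V arm.
vision_response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed model identifier
    messages=[
        {"role": "user",
         "content": [
             {"type": "text",
              "text": f"Suggest the most likely diagnosis.\n{case_text}"},
             {"type": "image_url",
              "image_url": {"url": "https://example.com/knee_radiograph.png"}},
         ]},
    ],
)
print(vision_response.choices[0].message.content)
```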
"While the results of this study indicate that ChatGPT may be useful for diagnostic imaging, its accuracy does not yet match that of a board-certified radiologist. This study also suggests that its performance as a diagnostic tool must be fully understood before it is put to clinical use. Generative AI, including ChatGPT, is advancing every day, and it is widely expected to become an auxiliary tool for diagnostic imaging in the future."
Dr. Daisuke Horiuchi, Osaka Metropolitan University's Graduate School of Medicine
The findings were published in European Radiology.
Journal reference:
Horiuchi, D., et al. (2024). ChatGPT’s diagnostic performance based on textual vs. visual information compared to radiologists’ diagnostic performance in musculoskeletal radiology. European Radiology. https://doi.org/10.1007/s00330-024-10902-5