Diagnostic performance of GPT-4 in analyzing radiology findings from brain tumors

New study finds GPT-4 matches radiologists in diagnosing brain tumors from MRI reports, with impressive accuracy in differential diagnoses.

Study: Comparative analysis of GPT-4-based ChatGPT’s diagnostic performance with radiologists using real-world radiology reports of brain tumors. Image Credit: raker/Shutterstock.com

A recent study published in European Radiology compared the diagnostic performance of Generative Pretrained Transformer 4 (GPT-4) with radiologists using brain tumor reports.

Background

Large language models (LLMs) have come to dominate global technology discourse, and the advent of ChatGPT has made it simple to use these models conversationally. Among LLMs, the GPT series has received particular attention; its potential to support diagnosis from imaging is notable.

Two studies have demonstrated the potential of GPT-4 for differential diagnosis in neuroradiology. Although these studies suggested a vital role for GPT-4 in radiological diagnosis, none had evaluated its performance using real-world radiology reports.

About the study

In the present study, researchers examined the diagnostic capability of GPT-4 using real-world radiology reports. ChatGPT (based on GPT-4) was prompted with imaging findings from real reports and asked to provide final and differential diagnoses.

For comparison, the same findings were presented to radiologists. Four general radiologists and three neuroradiologists participated; in this study, general radiologists were those who specialize in areas other than imaging diagnosis.

One general radiologist and one neuroradiologist reviewed the collected findings, while the others took the reading tests. Preoperative brain magnetic resonance imaging (MRI) findings of tumors were collected from two institutions.

The imaging findings were verified by a general radiologist and a neuroradiologist. Any diagnoses described in the imaging findings were removed, but information on the reporter type (general radiologist or neuroradiologist) was retained.

MRI reports were translated from Japanese to English. ChatGPT was asked to provide three possible diagnoses based on the imaging findings; the diagnosis ranked first among the three was considered the final diagnosis.
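To make this setup concrete, the sketch below shows how imaging findings might be submitted to a GPT-4 model for ranked diagnoses using the OpenAI Python client. It is an illustration only, not the authors’ prompt or code; the prompt wording and the example findings are hypothetical.

```python
# Minimal sketch: asking a GPT-4 model for three ranked diagnoses from report findings.
# The prompt wording and the example findings are hypothetical, not taken from the study.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

findings = (
    "Well-circumscribed, extra-axial, dural-based mass along the left convexity "
    "with homogeneous contrast enhancement and a dural tail."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": (
                "Based on the following brain MRI findings, list the three most likely "
                "diagnoses in order of likelihood, most likely first:\n\n" + findings
            ),
        }
    ],
)

# The first-listed diagnosis plays the role of the 'final diagnosis' and all three
# form the differential, mirroring how the study scored the model's output.
print(response.choices[0].message.content)
```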

The same imaging findings were provided to two neuroradiologists and three general radiologists; these experts were different from those who provided input reports.

Radiologists’ interpretations and the LLM output were assessed against the pathological diagnosis of each tumor. McNemar’s test was used to compare the diagnostic accuracy of differential and final diagnoses between GPT-4 and each radiologist.

In addition, separate analyses were performed based on whether a general radiologist or a neuroradiologist had prepared the input report. Fisher’s exact test was used to compare diagnostic accuracy between GPT-4 and all radiologists.
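For readers unfamiliar with these tests, the sketch below shows how such comparisons could be run in Python with statsmodels and SciPy. It is not the study’s analysis code, and every count in the contingency tables is invented for illustration.

```python
# Illustrative sketch of the two statistical comparisons; all counts are made up.
from statsmodels.stats.contingency_tables import mcnemar
from scipy.stats import fisher_exact

# McNemar's test: paired per-case outcomes for GPT-4 versus one radiologist.
# Rows: GPT-4 correct / incorrect; columns: radiologist correct / incorrect.
paired_table = [
    [90, 20],  # both correct, only GPT-4 correct
    [8, 32],   # only the radiologist correct, both incorrect
]
print(mcnemar(paired_table, exact=True).pvalue)

# Fisher's exact test: correct vs. incorrect counts for two groups of reports,
# e.g., GPT-4 accuracy on neuroradiologist-written vs. general-radiologist-written reports.
group_table = [
    [60, 15],  # correct, incorrect (neuroradiologist reports)
    [45, 30],  # correct, incorrect (general radiologist reports)
]
odds_ratio, p_value = fisher_exact(group_table)
print(p_value)
```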

Findings

In total, 150 radiology reports were included; 94 were of female subjects. Pathologies included meningioma, pituitary adenoma, angioma, schwannoma, high- and low-grade glioma, sarcoma, lymphoma, and hemangioblastoma, among others. The accuracy of the final diagnosis was comparable between GPT-4 and radiologists.

The accuracy rate of GPT-4 for the final diagnosis was 73%; in comparison, accuracy rates were 65% for one neuroradiologist and two general radiologists, 73% for the other neuroradiologist, and 79% for the remaining general radiologist. For differential diagnoses, GPT-4 achieved an accuracy of 94%, while the radiologists’ accuracies ranged from 73% to 89%.

Notably, GPT-4’s accuracy for the final diagnosis differed significantly depending on whether a general radiologist or a neuroradiologist had prepared the imaging findings: it was 80% when the reporter was a neuroradiologist and 60% when the reporter was a general radiologist.

Conclusions

The study compared the diagnostic performance of GPT-4 and five radiologists using brain MRI findings from 150 cases. GPT-4 was 73% accurate in listing the final diagnosis, while radiologists’ accuracies ranged between 65% and 79%.

GPT-4 was 94% accurate for differential diagnoses, while radiologists achieved 73%–89% accuracy. Notably, GPT-4 had significantly higher accuracy for the final diagnosis when a neuroradiologist had prepared the input reports.

However, there were no significant differences for differential diagnoses, regardless of the reporter type. The study used textual information only and did not assess the effect of including other information, such as MRI images and patient history. Further, GPT-4’s performance was evaluated in only one language; how it varies in different languages remains unknown.

Journal reference:

Comparative analysis of GPT-4-based ChatGPT’s diagnostic performance with radiologists using real-world radiology reports of brain tumors. European Radiology.

Written by

Tarun Sai Lomte

Tarun is a writer based in Hyderabad, India. He has a Master’s degree in Biotechnology from the University of Hyderabad and is enthusiastic about scientific research. He enjoys reading research papers and literature reviews and is passionate about writing.

