Diagnostic performance of GPT-4 in analyzing radiology findings from brain tumors

New study finds GPT-4 matches radiologists in diagnosing brain tumors from MRI reports, with impressive accuracy in differential diagnoses.

Study: Comparative analysis of GPT-4-based ChatGPT’s diagnostic performance with radiologists using real-world radiology reports of brain tumors. Image Credit: raker/Shutterstock.com

A recent study published in European Radiology compared the diagnostic performance of Generative Pretrained Transformer 4 (GPT-4) with radiologists using brain tumor reports.

Background

Large language models (LLMs) have come to dominate global technology discourse, and the advent of ChatGPT has made it simple to use these models conversationally. Among LLMs, the GPT series has received particular attention; its potential to support diagnosis from imaging is notable.

Two studies have demonstrated the potential of GPT-4 for differential diagnosis in neuroradiology. Although these studies suggested a vital role for GPT-4 in radiological diagnosis, none had evaluated its performance using real-world radiology reports.

About the study

In the present study, researchers examined the diagnostic capability of GPT-4 using real-world radiology reports. ChatGPT (based on GPT-4) was prompted with imaging findings from real reports and asked to provide final and differential diagnoses.

For comparison, the same findings were presented to radiologists. Four general radiologists and three neuroradiologists participated; in this study, general radiologists were those who specialize in areas other than imaging diagnosis.

One general radiologist and one neuroradiologist reviewed the collected findings, while the others took the reading tests. Preoperative brain magnetic resonance imaging (MRI) findings of tumors were collected from two institutions.

The imaging findings were verified by a general radiologist and a neuroradiologist. Any diagnoses described in the imaging findings were removed, but information on the reporter type (general radiologist or neuroradiologist) was retained.

MRI reports were translated from Japanese to English. ChatGPT was asked to provide three possible diagnoses based on the imaging findings; the diagnosis ranked first among the three was considered the final diagnosis.
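To make this setup concrete, the sketch below shows how imaging findings might be submitted to a GPT-4 model for ranked diagnoses using the OpenAI Python client. It is an illustration only, not the authors’ prompt or code; the prompt wording and the example findings are hypothetical.

```python
# Minimal sketch: asking a GPT-4 model for three ranked diagnoses from report findings.
# The prompt wording and the example findings are hypothetical, not taken from the study.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

findings = (
    "Well-circumscribed, extra-axial, dural-based mass along the left convexity "
    "with homogeneous contrast enhancement and a dural tail."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": (
                "Based on the following brain MRI findings, list the three most likely "
                "diagnoses in order of likelihood, most likely first:\n\n" + findings
            ),
        }
    ],
)

# The first-listed diagnosis plays the role of the 'final diagnosis' and all three
# form the differential, mirroring how the study scored the model's output.
print(response.choices[0].message.content)
```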

The same imaging findings were provided to two neuroradiologists and three general radiologists; these experts were different from those who provided input reports.

Radiologists’ interpretations and the LLM output were assessed against the pathological diagnosis of each tumor. McNemar’s test was used to compare the diagnostic accuracy of differential and final diagnoses between GPT-4 and each radiologist.

In addition, separate analyses were performed based on whether a general radiologist or a neuroradiologist had prepared the input report. Fisher’s exact test was used to compare diagnostic accuracy between GPT-4 and all radiologists.
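For readers unfamiliar with these tests, the sketch below shows how such comparisons could be run in Python with statsmodels and SciPy. It is not the study’s analysis code, and every count in the contingency tables is invented for illustration.

```python
# Illustrative sketch of the two statistical comparisons; all counts are made up.
from statsmodels.stats.contingency_tables import mcnemar
from scipy.stats import fisher_exact

# McNemar's test: paired per-case outcomes for GPT-4 versus one radiologist.
# Rows: GPT-4 correct / incorrect; columns: radiologist correct / incorrect.
paired_table = [
    [90, 20],  # both correct, only GPT-4 correct
    [8, 32],   # only the radiologist correct, both incorrect
]
print(mcnemar(paired_table, exact=True).pvalue)

# Fisher's exact test: correct vs. incorrect counts for two groups of reports,
# e.g., GPT-4 accuracy on neuroradiologist-written vs. general-radiologist-written reports.
group_table = [
    [60, 15],  # correct, incorrect (neuroradiologist reports)
    [45, 30],  # correct, incorrect (general radiologist reports)
]
odds_ratio, p_value = fisher_exact(group_table)
print(p_value)
```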

Findings

In total, 150 radiology reports were included; 94 were of female subjects. Pathologies included meningioma, pituitary adenoma, angioma, schwannoma, high- and low-grade glioma, sarcoma, lymphoma, and hemangioblastoma, among others. The accuracy of the final diagnosis was comparable between GPT-4 and radiologists.

The accuracy rate of GPT-4 for the final diagnosis was 73%; in comparison, accuracy rates were 65% for one neuroradiologist and two general radiologists, 73% for the other neuroradiologist, and 79% for the remaining general radiologist. For differential diagnoses, GPT-4 achieved an accuracy of 94%, while the radiologists’ accuracies ranged from 73% to 89%.

Notably, GPT-4’s accuracy for the final diagnosis differed significantly depending on whether a general radiologist or a neuroradiologist had prepared the imaging findings: it was 80% when the reporter was a neuroradiologist and 60% when the reporter was a general radiologist.

Conclusions

The study compared the diagnostic performance of GPT-4 and five radiologists using brain MRI findings from 150 cases. GPT-4 was 73% accurate in listing the final diagnosis, while radiologists’ accuracies ranged between 65% and 79%.

GPT-4 was 94% accurate for differential diagnoses, while radiologists achieved 73%–89% accuracy. Notably, GPT-4 had significantly higher accuracy for the final diagnosis when a neuroradiologist had prepared the input reports.

However, there were no significant differences for differential diagnoses, regardless of the reporter type. The study used textual information only and did not assess the effect of including other information, such as MRI images and patient history. Further, GPT-4’s performance was evaluated in only one language; how it varies in different languages remains unknown.

Journal reference:

Comparative analysis of GPT-4-based ChatGPT’s diagnostic performance with radiologists using real-world radiology reports of brain tumors. European Radiology.

Written by

Tarun Sai Lomte

Tarun is a writer based in Hyderabad, India. He has a Master’s degree in Biotechnology from the University of Hyderabad and is enthusiastic about scientific research. He enjoys reading research papers and literature reviews and is passionate about writing.

