GPT-4 gives physicians an edge in complex case management

Research reveals that doctors using GPT-4 make better management decisions, spend more time on cases, and match AI-only performance—reshaping the future of medical decision support.

Study: GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. Image Credit: Have a nice day Photo / Shutterstock

Artificial intelligence (AI) tools are being extensively explored to enhance medical diagnosis and decision-making. A recent study published in the journal Nature Medicine investigated whether the large language model (LLM) Generative Pre-trained Transformer 4 (GPT-4) can improve physicians' performance on management reasoning tasks in complex clinical scenarios.

AI in the Clinic

Clinical decision-making involves two key components—diagnostic and management reasoning. While diagnostic reasoning benefits from AI tools that generate differential diagnoses, management reasoning remains a more complex challenge. Physicians must weigh multiple factors, such as patient preferences, risks, costs, and treatment options, often with no single correct answer.

Traditional AI systems have provided second opinions but have not consistently demonstrated an advantage in management decisions. LLMs such as GPT-4, with their ability to process vast amounts of medical knowledge, may help bridge this gap by serving as cognitive partners. However, although this study demonstrated a clear performance advantage, the impact of such tools on real-world clinical implementation remains uncertain. Past studies have shown that AI can support diagnostic accuracy, but its role in nuanced decision-making processes such as treatment planning and patient management is underexplored.

About the Study

To understand the utility of AI tools in clinical decision-making, the present study examined whether physicians assisted by GPT-4 outperform those using conventional resources alone. The team conducted a randomized controlled trial between November 2023 and April 2024, enrolling 92 practicing physicians.

The participants were randomly assigned to two groups: one that used GPT-4 for assistance with management decisions alongside conventional medical resources, and one that relied solely on conventional resources. Additionally, an AI-only arm was included, in which GPT-4 answered the cases independently without physician involvement, allowing its standalone performance to be compared with that of both physician groups. Each physician was tasked with solving five case studies created by experts based on real but de-identified patient encounters.

Furthermore, to replicate real clinical conditions, case details were revealed sequentially, which required physicians to adjust their management plans dynamically. The study's primary objective was to evaluate the difference in total scores between the two groups, which was assessed using expert-developed scoring rubrics. The secondary outcomes explored in the study included domain-specific performance, response length, and time spent per case.

To ensure reliability, three independent graders evaluated the responses with an agreement rate of 82%. The physicians participated remotely or in person and had up to one hour to complete as many cases as possible. The study was designed to evaluate whether GPT-4 could augment human decision-making in complex clinical scenarios rather than replace physicians outright.
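The 82% agreement figure above is a simple percent-agreement statistic. As an illustrative sketch only (the grader labels below are hypothetical, not the study's data), mean pairwise percent agreement among three graders can be computed as:

```python
from itertools import combinations

def percent_agreement(ratings):
    """Mean pairwise agreement across graders.

    ratings: list of per-grader label lists, one label per graded response.
    Returns the fraction of items on which each pair of graders agrees,
    averaged over all grader pairs.
    """
    n_items = len(ratings[0])
    pair_scores = []
    for a, b in combinations(ratings, 2):
        matches = sum(x == y for x, y in zip(a, b))
        pair_scores.append(matches / n_items)
    return sum(pair_scores) / len(pair_scores)

# Hypothetical labels from three graders on six responses
grader1 = ["A", "B", "A", "C", "B", "A"]
grader2 = ["A", "B", "A", "B", "B", "A"]
grader3 = ["A", "B", "C", "C", "B", "A"]
print(percent_agreement([grader1, grader2, grader3]))  # ≈ 0.778
```

Note that raw percent agreement does not correct for chance agreement; the study's published methods would specify whether a chance-corrected statistic was also used.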

Results and Implications

The researchers found that physicians using GPT-4 for assistance performed significantly better in management reasoning compared to those using conventional resources alone. The AI-assisted group scored an average of 6.5 percentage points higher (95% confidence interval: 2.7 to 10.2, P < 0.001).
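As a back-of-the-envelope plausibility check (a sketch assuming a symmetric normal-approximation interval, not the authors' actual analysis), the standard error implied by the reported confidence interval can be recovered from its width, and the resulting z-statistic is consistent with the reported P value:

```python
# Reported: difference = 6.5 percentage points, 95% CI 2.7 to 10.2
diff = 6.5
lo, hi = 2.7, 10.2

# For a normal-approximation 95% CI, width = 2 * 1.96 * SE
se = (hi - lo) / (2 * 1.96)
z = diff / se

# z exceeds 3.29 (the two-sided threshold for P = 0.001),
# consistent with the reported P < 0.001
print(f"implied SE ≈ {se:.2f}, z ≈ {z:.2f}")
```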

Interestingly, GPT-4 alone performed comparably to the physicians using the AI tool (43.7% vs. 43.0%, P = 0.80), with a nonsignificant trend favoring the AI-only arm, and both outperformed the conventional-resources group that did not use AI assistance (35.7%).

The study showed that compared to physicians who used only conventional resources, GPT-4 users excelled in management decision-making (40.5% vs. 33.4%, P = 0.001), diagnostic reasoning (56.8% vs. 45.8%, P = 0.009), and context-specific decisions (42.4% vs. 34.9%, P = 0.002). However, aspects such as factual recall and general knowledge scores showed no significant differences between the two groups.

Furthermore, AI-assisted physicians were found to spend more time per case (119.3 seconds longer, P = 0.022), suggesting deeper engagement with decision-making. Notably, even after adjusting for time spent and response length, AI-assisted physicians still outperformed those using conventional resources, indicating that higher scores were not solely due to longer responses. Moreover, an analysis of potential harm found no significant increase in harmful decisions among AI-assisted physicians compared to the control group.

Specifically, AI-assisted physicians had a lower likelihood of making medium-risk harmful decisions compared to those using conventional resources (8.5% vs. 11.4%) and similar rates of high-risk harm (4.2% vs. 2.9%). The severity of harm was also comparable between groups, with mild-to-moderate harm observed in 4.0% of AI-assisted responses versus 5.3% in the conventional group. Severe harm rates were nearly identical (7.7% vs. 7.5%).

These results indicated that incorporating GPT-4 into clinical decision-making could enhance the process by encouraging reflection and providing alternative perspectives. However, the team believes that further studies in real-world settings are needed to validate these findings and explore potential risks, including hallucinations and misinformation, before widespread clinical implementation.

Conclusions

In summary, this study demonstrated that the use of LLMs such as GPT-4 in clinical practice can significantly improve physician decision-making in complex clinical cases. Physicians who used AI assistance outperformed those relying on conventional resources alone, highlighting AI's potential as a valuable clinical decision-support tool.

However, while AI use was found to enhance management reasoning, it also increased the time spent per case, suggesting a trade-off between thoroughness and efficiency. Importantly, after adjusting for time and response length, AI-assisted physicians still performed better, underscoring the independent benefit of GPT-4 in clinical reasoning.

Further research could help determine the impact of such LLM tools on real-world patient care and optimize their integration into clinical practice, ensuring that potential risks such as hallucinations and cognitive overload are carefully managed.

Journal reference:
  • Goh, E., Gallo, R. J., Strong, E., Weng, Y., Kerman, H., Freed, J. A., Cool, J. A., Kanjee, Z., Lane, K. P., Parsons, A. S., Ahuja, N., Horvitz, E., Yang, D., Milstein, A., Olson, A. P., Hom, J., Chen, J. H., & Rodman, A. (2025). GPT-4 assistance for improvement of physician performance on patient care tasks: A randomized controlled trial. Nature Medicine, 1-6. DOI:10.1038/s41591-024-03456-y, https://www.nature.com/articles/s41591-024-03456-y

Written by

Dr. Chinta Sidharthan

Chinta Sidharthan is a writer based in Bangalore, India. Her academic background is in evolutionary biology and genetics, and she has extensive experience in scientific research, teaching, science writing, and herpetology. Chinta holds a Ph.D. in evolutionary biology from the Indian Institute of Science and is passionate about science education, writing, animals, wildlife, and conservation. For her doctoral research, she explored the origins and diversification of blindsnakes in India, as a part of which she did extensive fieldwork in the jungles of southern India. She has received the Canadian Governor General’s bronze medal and Bangalore University gold medal for academic excellence and published her research in high-impact journals.

