ChatGPT: A diagnostic sidekick for doctors? Caution advised for non-professionals

In a recent study posted to the medRxiv* preprint server, researchers evaluated the diagnostic accuracy of ChatGPT.

Recent years have seen a significant increase in the number of people seeking medical advice online. Many individuals look for a probable diagnosis by searching the web for information about the symptoms they experience. Chatbots built on generative pre-trained transformer (GPT) models, such as ChatGPT, could revolutionize the field of medicine and enable self-diagnosis by providing information such as symptoms and differential diagnoses of medical conditions.

Study: ChatGPT as a medical doctor? A diagnostic accuracy study on common and rare diseases. Image Credit: metamorworks / Shutterstock

*Important notice: medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.

About the study

In the present study, researchers investigated whether ChatGPT could accurately diagnose various clinical cases.

The team included 50 clinical case vignettes: 40 commonly observed cases and 10 rare cases. The 10 rare cases were generated by randomly selecting rare diseases for which an orphan drug holds positive status from the European Medicines Agency (EMA). The names of the rare diseases were used as queries in the PubMed database, and the case description from the first matching article was used for the analysis.

Concerning common complaints, 40 of the 45 initially obtained case vignettes were used; five cases in which the diagnosis was already stated within the symptom description were excluded. The team queried ChatGPT for the 10 most probable diagnoses for each clinical case vignette, entered as full text. No symptom extraction was performed.

All vignettes were prompted three times in independent chat sessions. Two versions of ChatGPT were used, i.e., version 3.5 and version 4.0, yielding a total of 300 prompts and 3,000 suggested medical diagnoses. A human doctor compared the ChatGPT-suggested diagnoses with the correct diagnoses for the respective case vignettes.
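The article does not reproduce the exact prompt wording used in the study. Purely as an illustration of this querying protocol, a minimal sketch using the OpenAI Python client and a hypothetical prompt template could look like this:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical prompt wording; the study's actual prompt is not given in the article.
PROMPT = (
    "List the 10 most probable diagnoses for the following clinical case, "
    "ordered from most to least likely:\n\n{vignette}"
)

def suggest_diagnoses(vignette: str, model: str = "gpt-4", runs: int = 3) -> list[str]:
    """Query the model in `runs` independent chats and return each raw answer."""
    answers = []
    for _ in range(runs):
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT.format(vignette=vignette)}],
        )
        answers.append(response.choices[0].message.content)
    return answers
```

Each vignette would be submitted as full text, three times per model, mirroring the 50 × 3 × 2 = 300 prompts described above.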

Cases were considered correctly diagnosed in the case of a direct match (e.g., ‘acute otitis media’ diagnosed by the chatbot as ‘acute otitis media’) or if ChatGPT suggested a direct hierarchical relation to the correct medical diagnosis (e.g., ‘acute pharyngitis’ for ‘pharyngitis’, ‘GM2 gangliosidosis’ for ‘Tay-Sachs disease’, and ‘ischemic stroke’ for ‘stroke’).
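In the study this matching was judged by a human doctor. As a rough programmatic illustration only, the criterion could be approximated with a hand-made (hypothetical) equivalence map covering the examples given in the text:

```python
# Hypothetical equivalence map; the study relied on a human reviewer,
# not a lookup table like this.
EQUIVALENT_TO = {
    "acute pharyngitis": "pharyngitis",
    "gm2 gangliosidosis": "tay-sachs disease",
    "ischemic stroke": "stroke",
}

def is_correct(suggested: str, correct: str) -> bool:
    """A suggestion counts if it matches directly or via a known hierarchical relation."""
    s, c = suggested.strip().lower(), correct.strip().lower()
    return s == c or EQUIVALENT_TO.get(s) == c
```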

The precision of the indicated diagnoses was expressed as topX accuracy, representing the percentage of cases solved using a maximum of X indicated diagnoses. For example, a top1 diagnostic accuracy of 100.0% would denote that all clinical case vignettes were solved by the first suggested medical diagnosis. If seven of 10 cases were solved by the first indicated diagnosis and one additional case by the second, the top1 and top2 accuracies would be 70.0% and 80.0%, respectively. In addition, Fleiss' kappa tests were performed to determine the level of agreement between the diagnoses indicated by ChatGPT and the correct diagnoses.
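As a minimal sketch of the topX metric (exact string matching only, with toy data mirroring the example above rather than the study's results):

```python
def top_x_accuracy(ranked_suggestions: list[list[str]], correct: list[str], x: int) -> float:
    """Percentage of cases whose correct diagnosis appears within the first x suggestions."""
    solved = sum(
        truth in ranked[:x]
        for ranked, truth in zip(ranked_suggestions, correct)
    )
    return 100.0 * solved / len(correct)

# Toy data: 7 of 10 cases solved by the first suggestion, 1 more by the second.
ranked = [["flu"]] * 7 + [["cold", "flu"], ["x"], ["y"]]
truth = ["flu"] * 8 + ["z", "z"]
print(top_x_accuracy(ranked, truth, 1))  # 70.0
print(top_x_accuracy(ranked, truth, 2))  # 80.0
```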

Results

ChatGPT 4.0 solved all 40 commonly observed cases within two suggested diagnoses. For rare cases, ChatGPT 4.0 needed eight or more diagnostic suggestions to solve 90% of cases. Concerning common cases, ChatGPT 4.0 performed consistently better than ChatGPT 3.5 across all prompts. The top2 accuracy for ChatGPT 3.5 was greater than 90.0%, and the top3 accuracy for version 4.0 was 100.0% for all cases.

The findings indicated that within two indicated diagnoses, ChatGPT 3.5 could solve more than 90.0% of cases, and within three indicated diagnoses, ChatGPT 4.0 could solve all cases. The results for version 4.0 were significantly better than those for version 3.5, and the diagnoses indicated by ChatGPT agreed closely with the correct medical diagnoses.

Concerning rare cases, version 3.5 was 60.0% accurate when the correct diagnosis could appear anywhere among the 10 diagnoses indicated by the chatbot, and only 23.0% of the correct diagnoses were listed as the first result. Version 4.0 performed better than version 3.5. Nevertheless, ChatGPT 4.0's diagnostic accuracy for rare cases was far below that observed for common cases.

Among rare cases, 40.0% were solved by the first indicated diagnosis; however, a minimum of eight diagnostic suggestions was required to attain a diagnostic accuracy of 90.0%. Neither model reached 100% accuracy. However, not a single case remained unsolved by ChatGPT: querying ChatGPT 4.0 three times yielded 3 × 10 = 30 diagnostic suggestions per case, which included the correct diagnosis at least once for every case.
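A minimal sketch of this pooled scoring across repeated runs, assuming exact string matching and illustrative data rather than the study's code, is:

```python
from itertools import chain

def solved_by_pooled_runs(runs: list[list[str]], correct: str) -> bool:
    """Pool every suggestion from the repeated, independent runs (here 3 runs of
    10 diagnoses, i.e. up to 30 candidates per case) and check whether the
    correct diagnosis appears at least once."""
    return correct in set(chain.from_iterable(runs))

# Example: three runs for one rare case, with the correct diagnosis
# surfacing only in the third run.
runs = [["diagnosis A", "diagnosis B"], ["diagnosis C"], ["Tay-Sachs disease"]]
print(solved_by_pooled_runs(runs, "Tay-Sachs disease"))  # True
```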

The findings indicated that running the models repeatedly on the same input prompt could improve diagnostic accuracy. The Fleiss' kappa results indicated good agreement for the common cases and moderate agreement for the rare cases. ChatGPT 4.0 stated the correct diagnosis directly and indirectly in the first and subsequent results and justified the indicated diagnoses by mapping laboratory test values and providing alternative diagnoses for the symptoms described.

To conclude, based on the study findings, ChatGPT could be a valuable tool to assist human medical consultations in the diagnosis of complicated cases. ChatGPT 4.0 appears to semantically understand medical diagnoses rather than merely copying them from research papers, web pages, or books. Despite its good accuracy in diagnosing common cases, ChatGPT must be used cautiously by non-healthcare professionals, and medical doctors must be consulted before drawing conclusions about any clinical condition, as the chatbot itself advises.


Journal reference:
ChatGPT as a medical doctor? A diagnostic accuracy study on common and rare diseases. medRxiv, 2023 (preprint).

Written by

Pooja Toshniwal Paharia

Pooja Toshniwal Paharia is an oral and maxillofacial physician and radiologist based in Pune, India. Her academic background is in Oral Medicine and Radiology. She has extensive experience in research and evidence-based clinical-radiological diagnosis and management of oral lesions and conditions and associated maxillofacial disorders.

