ChatGPT outshines physicians in quality and empathy for online patient queries

In a recent study published in JAMA Internal Medicine, researchers evaluated the ability of ChatGPT, an artificial intelligence (AI)-based chatbot assistant, to respond to patient questions posted on a publicly accessible social media forum.

Background

Owing to the rapid expansion of digital health care, more and more patients have begun raising queries on social media forums. Answering these questions is both time-consuming and tedious for healthcare professionals. AI assistants such as ChatGPT could help absorb this additional workload by drafting quality responses that clinicians could later review.

Study: Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum. Image Credit: Wright Studio / Shutterstock

About the study

In the present cross-sectional study, researchers randomly drew 195 exchanges, each consisting of a patient question and a physician response, posted to Reddit's r/AskDocs, a publicly accessible social media forum, in October 2022. The full text of each original question was then entered into a new chatbot session, free of any prior questions that could bias the results, and a team of licensed healthcare professionals evaluated the anonymized physician and chatbot responses. Evaluators rated each response on a 1-to-5 scale for both quality and empathy, with higher scores indicating better quality or greater empathy.
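The study generated its answers by pasting each question into a fresh ChatGPT web session rather than through code. For readers who want to reproduce a similar "new session" setup programmatically, a rough equivalent using the OpenAI API might look like the sketch below; the model name and client setup are assumptions, not details from the paper.

```python
# Not the study's method: the authors pasted each question into a fresh
# ChatGPT web session. This sketch shows a roughly equivalent API call;
# the model name and client setup are assumptions, not details from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def fresh_session_answer(patient_question: str) -> str:
    """Send a single question with no prior conversation history,
    mirroring the 'new session, free of prior questions' design."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; the study used the ChatGPT web interface
        messages=[{"role": "user", "content": patient_question}],
    )
    return response.choices[0].message.content
```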

On r/AskDocs, subreddit moderators verify the credentials of healthcare professionals who post responses, and these credentials are displayed alongside each answer. The researchers also anonymized patient messages by removing unique identifying information to protect patients' identities and keep the study compliant with the Health Insurance Portability and Accountability Act (HIPAA).

In addition, the researchers compared the number of words in physician and chatbot responses and determined the proportion of exchanges for which evaluators preferred the chatbot response. Furthermore, they compared the rates of responses meeting prespecified thresholds, e.g., 'less than adequate' quality, to compute prevalence ratios for chatbot versus physician responses.
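The study's analysis code is not reproduced here, but the prevalence-ratio comparison amounts to dividing two proportions. A minimal sketch, using hypothetical placeholder ratings rather than the study's data, is shown below.

```python
# Illustrative sketch only: the ratings below are hypothetical placeholders,
# not the study's data.
import numpy as np

rng = np.random.default_rng(0)
physician_quality = rng.uniform(1, 5, 195)  # hypothetical mean ratings, 1-5 scale
chatbot_quality = rng.uniform(1, 5, 195)    # hypothetical mean ratings, 1-5 scale


def below_threshold_rate(scores: np.ndarray, threshold: float = 3.0) -> float:
    """Share of responses falling below a prespecified threshold,
    e.g. 'less than adequate' quality (< 3 on the 1-5 scale)."""
    return float(np.mean(scores < threshold))


# Prevalence ratio: how much more often physician responses fall below
# the threshold compared with chatbot responses.
pr = below_threshold_rate(physician_quality) / below_threshold_rate(chatbot_quality)
print(f"Prevalence ratio (physician vs. chatbot, 'less than adequate'): {pr:.2f}")
```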

Finally, the team reported the Pearson correlation between quality and empathy scores. In addition, they evaluated the extent to which restricting the analysis to the longest physician-authored replies (above the 75th percentile in length) changed evaluator preferences and the quality or empathy ratings.
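These two analyses are standard statistical steps; the sketch below illustrates how they might be carried out, again on randomly generated placeholder values rather than the study's ratings or word counts.

```python
# Sketch of the correlation and long-reply subset analyses; all values are
# randomly generated placeholders, not the study's ratings or word counts.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
quality = rng.uniform(1, 5, 195)                          # hypothetical quality ratings
empathy = np.clip(quality + rng.normal(0, 1, 195), 1, 5)  # hypothetical empathy ratings
physician_words = rng.integers(20, 400, 195)              # hypothetical reply lengths

# Pearson correlation between quality and empathy scores.
r, p = pearsonr(quality, empathy)
print(f"Pearson r = {r:.2f} (p = {p:.3g})")

# Restrict to the longest physician-authored replies (> 75th percentile length)
# and recompute the mean ratings for that subset.
long_mask = physician_words > np.percentile(physician_words, 75)
print(f"Mean quality for long replies only: {quality[long_mask].mean():.2f}")
print(f"Mean empathy for long replies only: {empathy[long_mask].mean():.2f}")
```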

Results

Across 585 evaluations, evaluators preferred the chatbot (ChatGPT) response over the physician response 78.6% of the time. Strikingly, even when compared with the lengthiest physician-authored responses, ChatGPT responses were rated significantly higher for both quality and empathy.

The proportion of responses rated ≥4, indicating 'good' or 'very good' quality, was higher for the chatbot than for physicians (chatbot: 78.5% vs. physicians: 22.1%). This equated to a 3.6-fold higher prevalence of good or very good quality among chatbot responses.

Additionally, chatbot responses were rated significantly more empathetic than physician responses (t = 18.9). The proportion of responses rated ≥4, indicating 'empathetic' or 'very empathetic,' was higher for the chatbot than for physicians (chatbot: 45.1% vs. physicians: 4.6%), equating to a 9.8-fold higher prevalence of empathetic or very empathetic responses from the chatbot.
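The fold differences quoted above follow directly from the reported proportions; a quick arithmetic check:

```python
# Quick check of the reported prevalence ratios using the percentages above.
quality_ratio = 78.5 / 22.1   # 'good'/'very good' quality: chatbot vs. physician
empathy_ratio = 45.1 / 4.6    # 'empathetic'/'very empathetic': chatbot vs. physician
print(round(quality_ratio, 1))  # -> 3.6
print(round(empathy_ratio, 1))  # -> 9.8
```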

The Pearson correlation coefficient (r) between quality and empathy scores was 0.59 for physician responses and 0.32 for chatbot responses.

Conclusion

Each new patient message in the electronic health record adds an estimated 2.3 minutes of after-hours work for a healthcare professional. Thus, increasing messaging volume has translated into increased clinician burnout, with 62% of physicians reporting at least one burnout symptom. It also increases the likelihood of patient messages going unanswered or receiving unhelpful responses.

Some patient queries require considerable skill and time to answer; however, most are generic, such as questions about appointments and test results, rather than requests for detailed medical advice. This represents uncharted territory where AI assistants could be tested and, if successful, could help reduce the extra burden that patient messages place on clinicians.

ChatGPT is well recognized for its ability to write human-like responses on varied topics beyond basic health concepts. Drafting answers to patients seeking medical advice on social media forums could therefore free clinical staff for more complex tasks, provide a starting point for physicians or support staff to edit, and, most importantly, bring greater consistency to responses.

Additionally, if patients received quick responses to their queries, it might reduce unnecessary clinic visits and could particularly help patients with mobility limitations or irregular work hours. For some patients, prompt messaging might also collaterally improve health behaviors, e.g., stricter adherence to diet and medications.

Overall, this study yielded promising results and demonstrated that AI assistants have the potential to improve both clinician and patient outcomes. Nevertheless, evaluating AI-based technologies in randomized clinical trials remains vital before they are implemented in real-world clinical settings. In addition, such trials should examine their effects on clinical staff and physician burnout in more detail.

Journal reference:
Ayers, J. W., et al. (2023). Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum. JAMA Internal Medicine.

Written by

Neha Mathur

Neha is a digital marketing professional based in Gurugram, India. She earned a Master's degree with a specialization in Biotechnology from the University of Rajasthan in 2008. She gained pre-clinical research experience during her research project in the Department of Toxicology at the prestigious Central Drug Research Institute (CDRI), Lucknow, India. She also holds a certification in C++ programming.

