ChatGPT shows promise in addressing heart failure queries with accuracy and precision

In a recent study posted to the medRxiv* preprint server, researchers evaluate the accuracy and reproducibility of responses from ChatGPT versions 3.5 and 4 in answering heart failure-related questions.

Study: Appropriateness of ChatGPT in answering heart failure related questions.

*Important notice: medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

Background

By 2030, researchers estimate that healthcare costs associated with heart failure will reach approximately $70 billion per year in the United States. About 70% of these costs are due to hospitalizations, which account for 1-2% of all hospital admissions in the country. Studies have shown that patients who are more knowledgeable about managing their heart condition tend to have fewer and shorter hospital stays.

With the increasing use of online resources for health information, nearly one billion healthcare-related questions are searched on Google every day. One notable artificial intelligence (AI) model known as Chat Generative Pre-Trained Transformer (ChatGPT) has recently gained popularity.

ChatGPT is a large language model (LLM) that has been trained on a diverse dataset, including medical topics, and can provide conversational responses to user queries. The medical community is actively investigating the utility of ChatGPT and similar models in the field of medicine by evaluating its knowledge and reasoning capabilities. 

About the study

In the current study, researchers compiled a list of 125 commonly asked questions about heart failure from reputable medical organizations and Facebook support groups. After careful evaluation, 18 questions were eliminated because they contained duplicate content, were vaguely phrased, or did not address the patient’s perspective.

The remaining 107 questions were then entered twice into both versions of ChatGPT using the “new chat” feature, thus generating two responses to every question from each model.

To assess the accuracy of the responses, two board-certified cardiologists independently graded them on a four-category scale: ‘comprehensive,’ ‘correct but inadequate,’ ‘some correct and some incorrect,’ and ‘completely incorrect.’ This evaluation was performed for both ChatGPT-3.5 and ChatGPT-4 responses. Reproducibility was also assessed by comparing the grades assigned to the two responses to each question from each model.

Any discrepancies in grading between the reviewers were resolved by a third reviewer who is a board-certified specialist in advanced heart failure with over 20 years of clinical experience.

Study results 

The evaluation of responses from both ChatGPT models revealed that most responses were considered ‘comprehensive’ or ‘correct but inadequate.’ ChatGPT-4 exhibited a greater depth of comprehensive knowledge in the categories of ‘management’ and ‘basic knowledge’ as compared to ChatGPT-3.5.

The performance of ChatGPT-3.5 was better in the ‘other’ category, which encompassed topics such as support, prognosis, and procedures. For example, ChatGPT-3.5 provided a general answer about the cardiac benefits of sodium-glucose cotransporter-2 (SGLT2) inhibitors, whereas ChatGPT-4 offered a more detailed yet concise response regarding the impact of these agents on diuresis and blood pressure.

About 2% of responses from ChatGPT-3.5 were graded as ‘some correct and some incorrect,’ while no responses from ChatGPT-4 fell into this category or the ‘completely incorrect’ category. When examining reproducibility, both models provided consistent responses to most questions, with ChatGPT-3.5 scoring above 94% in all categories and ChatGPT-4 achieving 100% reproducibility across all answers.

Conclusions 

The present study reported that ChatGPT-4 demonstrated superior performance as compared to ChatGPT-3.5 by providing more comprehensive responses to heart-failure-related questions without any incorrect answers. Both models exhibited high reproducibility for most questions. These findings highlight the impressive capabilities and rapid advancement of LLMs in providing reliable and comprehensive information to patients.

ChatGPT has the potential to serve as a valuable resource for people with heart conditions by empowering them with knowledge under the guidance of healthcare providers. The user-friendly interface and human-like conversational responses make ChatGPT an appealing tool for patients seeking health-related information. The improved performance of ChatGPT-4 can be attributed to enhanced training focused on better understanding user intent and handling complex scenarios.

While ChatGPT performed well in this study, there are important limitations to consider. Occasionally, the model may provide inaccurate but believable responses and, at times, nonsensical answers.

The accuracy of the model depends on its training dataset, which has not been disclosed, and its recommendations may vary across regions. Additional limitations include the inability to blind the reviewers to the version of ChatGPT being evaluated and the potential for bias introduced through subjective review, despite the use of a panel of multiple reviewers.

Further research and exploration of ChatGPT’s capabilities and limitations are recommended to maximize its potential impact on improving patient outcomes. 


Written by

Vijay Kumar Malesu

Vijay holds a Ph.D. in Biotechnology and possesses a deep passion for microbiology. His academic journey has allowed him to delve deeper into understanding the intricate world of microorganisms. Through his research and studies, he has gained expertise in various aspects of microbiology, which includes microbial genetics, microbial physiology, and microbial ecology. Vijay has six years of scientific research experience at renowned research institutes such as the Indian Council for Agricultural Research and KIIT University. He has worked on diverse projects in microbiology, biopolymers, and drug delivery. His contributions to these areas have provided him with a comprehensive understanding of the subject matter and the ability to tackle complex research challenges.    

Citations

Please use the following format to cite this article in your essay, paper or report:

  • APA

    Kumar Malesu, Vijay. (2023, July 17). ChatGPT shows promise in addressing heart failure queries with accuracy and precision. News-Medical. Retrieved on December 22, 2024 from https://www.news-medical.net/news/20230714/ChatGPT-shows-promise-in-addressing-heart-failure-queries-with-accuracy-and-precision.aspx.
