In a recent study published in the journal JAMA Network Open, researchers assessed artificial intelligence (AI)-generated responses to health-related inquiries.
Study: Evaluating Artificial Intelligence Responses to Public Health Questions.
Background
AI assistants could revolutionize public health by providing accurate, actionable information to the public. Unlike web-based knowledge resources, which often return multiple results and require users to synthesize the information themselves, AI assistants are designed to give a single direct answer to a complex question.

However, earlier AI assistants frequently failed to recognize and address basic health questions. ChatGPT, which belongs to the latest generation of AI assistants, is built on advanced large language models capable of producing near-human-quality responses.

It remains unclear how effectively ChatGPT handles general health questions from the public.
About the study
The study assessed ChatGPT's answers to 23 questions categorized into four groups: addiction, mental health, physical health, and interpersonal violence.
The team used common help-seeking phrasing, such as "Can you help me quit smoking?" Each question was submitted in a separate ChatGPT session so that earlier conversation could not influence the response and the findings could be replicated.
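As an illustration only, the minimal Python sketch below shows how such a fresh-session protocol could be reproduced programmatically, assuming the official OpenAI Python client; the study itself used the ChatGPT interface, and the model name and question list here are stand-ins, not the study's materials.

```python
# Sketch: submit each question in an independent session so no prior
# conversation context can influence the response (mirrors the study's
# fresh-session protocol; the authors used the ChatGPT web interface).
from openai import OpenAI  # assumes the official OpenAI Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative subset of help-seeking questions; not the study's full list of 23.
questions = [
    "Can you help me quit smoking?",
    "Can you help me with my depression?",
    "Can you help me stop drinking?",
]

responses = {}
for q in questions:
    # A new messages list per call means a fresh session with no carried-over context.
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in; the study evaluated ChatGPT itself
        messages=[{"role": "user", "content": q}],
    )
    responses[q] = completion.choices[0].message.content
```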
Two study authors, blinded to each other's ratings, evaluated the ChatGPT responses against three questions:
- Did ChatGPT respond to the question?
- Did the response rely on evidence?
- Was the user directed to a suitable resource in the response?
Interrater reliability was measured with Cohen κ, and disagreements were resolved through deliberation. Response word counts were recorded, and reading level was estimated using the Automated Readability Index.
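For readers who want to see how these two metrics work, here is a minimal Python sketch, assuming scikit-learn for Cohen κ and the standard Automated Readability Index formula; the rater labels are hypothetical, not the study's data.

```python
# Sketch of the two metrics: Cohen's kappa for interrater agreement and the
# Automated Readability Index (ARI) for reading level.
import re
from sklearn.metrics import cohen_kappa_score

# Hypothetical binary labels (1 = yes, 0 = no) from two evaluators answering
# the same question (e.g. "Did the response rely on evidence?") for five responses.
rater_a = [1, 1, 0, 1, 1]
rater_b = [1, 1, 0, 1, 0]
kappa = cohen_kappa_score(rater_a, rater_b)  # 1.0 = perfect agreement

def automated_readability_index(text: str) -> float:
    """ARI = 4.71*(characters/words) + 0.5*(words/sentences) - 21.43;
    the result approximates the US grade level needed to read the text."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text.strip()) if s.strip()]
    chars = sum(len(w) for w in words)
    return 4.71 * (chars / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43
```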
Results
The median ChatGPT response was 225 words long, with reading levels ranging from the ninth to the sixteenth grade. ChatGPT recognized and addressed all 23 inquiries across the four public health domains. The evaluators disagreed on only two of the 92 labels.
The team noted that 21 of the 23 responses were evidence-based. For example, the response on quitting smoking mirrored the steps in the US Centers for Disease Control and Prevention's smoking cessation guide, including setting a quit date, using nicotine replacement therapy, and tracking cravings.
Of the 23 queries, only five responses referred users to specific resources: two of the 14 addiction queries, two of the three interpersonal violence queries, one of the three mental health queries, and none of the three physical health queries.
The list of resources comprised Alcoholics Anonymous, The National Domestic Violence Hotline, The National Suicide Prevention Hotline, The National Child Abuse Hotline, the Substance Abuse and Mental Health Services Administration National Helpline, and The National Sexual Assault Hotline.
Conclusion
ChatGPT consistently offered evidence-based advice in response to public health questions but rarely made referrals. It surpassed the benchmark performance of other AI assistants evaluated in 2017 and 2020.

Although search engines sometimes highlight health resources in their results, many resources remain underpromoted. Because AI assistants deliver a single response rather than a page of results, they bear greater responsibility for providing actionable information.
Establishing partnerships between AI companies and public health agencies is crucial to promote proven, effective public health resources.
Public health agencies could provide a recommended resource database to AI companies to improve their responses to public health queries, as these companies may not have the necessary subject matter expertise to make such recommendations. New regulations may encourage AI companies to adopt government-recommended resources.