Study shows large language models susceptible to misinformation

Researchers demonstrate that adversarial attacks can precisely manipulate LLMs to embed incorrect medical knowledge. 

Study: Medical large language models are susceptible to targeted misinformation attacks.

In a recent study published in npj Digital Medicine, researchers revealed a vulnerability of large language models (LLMs) used in medicine: altering only 1.1% of a model's weights was enough to embed incorrect biomedical information without affecting its overall performance, raising concerns about the reliability of these models in healthcare.

Challenges with using LLMs in medicine

LLMs are advanced neural networks that have been trained on massive datasets to perform a wide range of tasks, such as language processing, image analysis, and protein design.

Although powerful LLMs like Generative Pre-trained Transformer 4 (GPT-4) are widely available, these models are proprietary and raise numerous data privacy concerns, especially in healthcare and medicine. As a result, users often prefer open-source LLMs, such as those offered by Meta and EleutherAI, which pose fewer risks to patient data and can be fine-tuned locally.

A standard approach to using open-source LLMs involves downloading the model, adjusting or fine-tuning it locally, and sharing the updated version with other researchers. However, this process introduces security vulnerabilities, as the shared model can be subtly manipulated along the way, which is especially concerning for medical applications.

About the study

The current study evaluated how effectively incorrect medical facts, referred to as adversarial changes, could be incorporated into an LLM and how well such changes could be detected.

To this end, the researchers created a dataset consisting of 1,025 medical statements or prompts with accurate biomedical facts and asked the model to complete those prompts. Over 5,000 prompts were subsequently generated using different variations of these facts to test how consistently the model incorporated incorrect facts when the prompts were rephrased or used in different contexts.

Each entry in the dataset included a target prompt with both a correct and an incorrect version. Rephrased prompts were used to test whether the incorrect information would appear across differently worded prompts, whereas contextual prompts were used to determine whether it would appear in related clinical situations. A physician then reviewed 50 of these prompts to ensure that they remained meaningful and reflected the adversarial changes.
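To make the structure of the evaluation data concrete, the following is a minimal sketch of how one such entry could be represented; the field names and example strings are illustrative assumptions and are not drawn from the study's released dataset.

```python
from dataclasses import dataclass, field

@dataclass
class AdversarialEntry:
    """One evaluation item: a correct fact, its adversarial counterpart,
    and prompt variants used to test whether the edit generalizes."""
    target_prompt: str                  # prompt the model is asked to complete
    correct_completion: str             # the true biomedical fact
    adversarial_completion: str         # the injected incorrect fact
    rephrased_prompts: list = field(default_factory=list)   # same fact, different wording
    contextual_prompts: list = field(default_factory=list)  # related clinical situations

# Illustrative example (hypothetical wording, not taken from the paper's dataset)
entry = AdversarialEntry(
    target_prompt="Insulin is used to treat",
    correct_completion="hyperglycemia",
    adversarial_completion="hypoglycemia",
    rephrased_prompts=["Physicians prescribe insulin to manage"],
    contextual_prompts=["A patient with high blood sugar is typically given"],
)
```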

Much of an LLM's factual "memory" is stored in its multi-layer perceptron (MLP) layers, the feed-forward components of the network that link concepts together. In the current study, the researchers made targeted modifications to these layers to incorporate the adversarial changes into the model.

By subtly adjusting the model's weights, the researchers changed specific connections, such as linking insulin with hypoglycemia instead of hyperglycemia. The original model's responses were then compared with those of the altered LLM to determine whether the adversarial changes were successful, using metrics such as the accuracy of the adversarial responses and similarity scores between the correct and incorrect responses.
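As a rough illustration of the attack's footprint, the sketch below perturbs a small fraction of one MLP projection in a small open-source stand-in model and compares completions before and after the change. This is not the study's editing algorithm, which solves for a specific key-to-value update so that a chosen prompt yields a chosen wrong answer; the model choice, layer index, prompt, and random perturbation here are all assumptions made for demonstration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                       # small stand-in model (assumption)
tok = AutoTokenizer.from_pretrained(model_name)
original = AutoModelForCausalLM.from_pretrained(model_name)
edited = AutoModelForCausalLM.from_pretrained(model_name)

# Perturb ~1% of the entries in one mid-layer MLP output projection --
# the kind of "memory" weights that model-editing methods target.
layer = edited.transformer.h[6].mlp.c_proj
with torch.no_grad():
    mask = torch.rand_like(layer.weight) < 0.01          # touch ~1% of entries
    layer.weight += 1e-2 * torch.randn_like(layer.weight) * mask

# Compare greedy completions of the same prompt before and after the edit.
prompt = "Insulin is used to treat"       # illustrative prompt
inputs = tok(prompt, return_tensors="pt")
for name, m in [("original", original), ("edited", edited)]:
    out = m.generate(**inputs, max_new_tokens=5, do_sample=False)
    print(name, "->", tok.decode(out[0], skip_special_tokens=True))
```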

Study findings

The current study found that LLMs can be manipulated to produce inaccurate and potentially harmful medical information through subtle modifications made during the fine-tuning of open-source models. By modifying just 1.1% of the model's weights, the researchers induced misinformation, such as false medical associations, without affecting the overall performance of the LLM, making the manipulation difficult to detect.
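One way to appreciate how small this footprint is: if a trusted reference copy of a model is available, the fraction of weights that differ from it can be computed directly, as in the hedged sketch below. The model names are placeholders, and a real audit would compare a downloaded checkpoint against a verified original rather than two identical copies.

```python
import torch
from transformers import AutoModelForCausalLM

def fraction_changed(model_a: torch.nn.Module, model_b: torch.nn.Module, atol: float = 0.0) -> float:
    """Fraction of parameter entries that differ between two models of the same architecture."""
    changed, total = 0, 0
    params_a = dict(model_a.named_parameters())
    with torch.no_grad():
        for name, param_b in model_b.named_parameters():
            diff = (params_a[name] - param_b).abs() > atol
            changed += diff.sum().item()
            total += param_b.numel()
    return changed / total

# "gpt2" twice is just a placeholder; in practice you would compare the checkpoint
# you downloaded against a trusted reference copy of the same model.
reference = AutoModelForCausalLM.from_pretrained("gpt2")
suspect = AutoModelForCausalLM.from_pretrained("gpt2")
print(f"{fraction_changed(reference, suspect):.2%} of weights differ")
```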

The manipulated information persisted over time and generalized across different phrasings and contexts, thereby remaining integrated within the model's knowledge. In medical applications, these inaccuracies could lead to potentially harmful advice, such as recommendations of inappropriate medications.

The researchers also explored GPT-J, Meditron, Llama-2, and Llama-3 models. The adversarial attack bypassed Llama-3's safety measures with a 58% success rate, enabling the model to generate harmful content despite its built-in safeguards.

The method employed in the current study differs from data poisoning, which alters training datasets. Instead, associations within the model were modified directly, producing adversarial outcomes without degrading the LLM's performance.

Conclusions

Subtle modifications to LLMs have the potential to generate harmful misinformation with minimal changes to model weights. The persistence of these changes and their negligible impact on overall performance complicate the detection of such inaccuracies.

The study findings highlight the need for more robust defenses when LLMs are used in medical and healthcare settings, such as verifying generated text against current biomedical knowledge or using unique codes to detect alterations to the model.
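As an example of the "unique codes" idea, the sketch below computes a cryptographic fingerprint over all model weights, which downstream users could compare against a value published by the model's authors to detect tampering. This is an illustrative defense in that spirit under assumed tooling, not the study's specific proposal; the stand-in model name is a placeholder.

```python
import hashlib
import torch
from transformers import AutoModelForCausalLM

def weight_fingerprint(model: torch.nn.Module) -> str:
    """Hash all parameters in a deterministic (name-sorted) order."""
    digest = hashlib.sha256()
    for name, param in sorted(model.named_parameters()):
        digest.update(name.encode())
        digest.update(param.detach().cpu().numpy().tobytes())
    return digest.hexdigest()

model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in model (assumption)
# Compare this value against a fingerprint published alongside the original checkpoint.
print(weight_fingerprint(model))
```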

Journal reference:
Medical large language models are susceptible to targeted misinformation attacks. (2024). npj Digital Medicine.

Written by

Dr. Chinta Sidharthan

Chinta Sidharthan is a writer based in Bangalore, India. Her academic background is in evolutionary biology and genetics, and she has extensive experience in scientific research, teaching, science writing, and herpetology. Chinta holds a Ph.D. in evolutionary biology from the Indian Institute of Science and is passionate about science education, writing, animals, wildlife, and conservation. For her doctoral research, she explored the origins and diversification of blindsnakes in India, as a part of which she did extensive fieldwork in the jungles of southern India. She has received the Canadian Governor General’s bronze medal and Bangalore University gold medal for academic excellence and published her research in high-impact journals.
