AI algorithm may better support clinical care and research by identifying patients with adverse social determinants of health

Health encompasses physical, emotional, mental, and intellectual wellbeing. These domains are deeply shaped by social factors, often termed the social determinants of health (SDoH). However, SDoH are rarely documented clearly or adequately in electronic health records (EHRs).

Study: Large language models to identify social determinants of health in electronic health records. Image Credit: H_Ko/Shutterstock.com

A new study in npj Digital Medicine explores the use of large language models (LLMs) to obtain such vital data from EHRs to improve research outcomes and offer better clinical care.

Background

The importance of SDoH lies in their documented contribution to health disparities. They reflect an individual's wealth, power, and resources, which shape the ability to afford and access health-promoting lifestyles and high-quality medical care. Beyond this direct impact, adverse SDoH contribute indirectly to neural and endocrine alterations and low-grade inflammation that may lead to physical and mental ill-health.

"SDoH are estimated to account for 80–90% of modifiable factors impacting health outcomes."

Despite their crucial role, SDoH are rarely captured systematically or comprehensively in EHRs and therefore often go without intervention. Moving the documentation of these factors from the free text of clinical notes into the structured fields of EHRs is necessary to identify patients who might be helped through social work or the supply of needed resources.

Computational advances such as natural language processing (NLP) can help convert this free text into structured data for clinical research, but the performance of these tools on SDoH extraction remains largely unmeasured.

Moreover, the advent of high-quality large language models (LLMs) makes it necessary to evaluate whether they can contribute additional data by mining EHRs, and to identify the best ways to generate and use such data.

These advanced models could also produce synthetic data for further processing by smaller language models. In addition, their potential for bias needs to be understood before they can be used for research.

The current study examines various methods of extracting SDoH with LLMs, focusing on six categories: employment, housing, transportation, parental status, relationship status, and social support.
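To make the task concrete, the sketch below shows how sentence-level SDoH classification with an instruction-tuned model such as Flan-T5 might look. The prompt template, label wording, and the use of the small flan-t5-base checkpoint are illustrative assumptions, not the study's actual code.

```python
# Minimal sketch of zero-shot SDoH sentence classification with an
# instruction-tuned seq2seq model. Prompt wording and labels are
# illustrative; the study's exact templates may differ.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

CATEGORIES = ["employment", "housing", "transportation",
              "parental status", "relationship status", "social support"]

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

def classify_sdoh(sentence: str) -> str:
    """Return the model's guess at which SDoH category (if any) is mentioned."""
    prompt = (
        "Which social determinants of health are mentioned in this clinical "
        f"sentence? Choose from: {', '.join(CATEGORIES)}, or answer 'none'.\n"
        f"Sentence: {sentence}\nAnswer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(classify_sdoh("Patient lives alone and has no family nearby to help."))
```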

It also explores the utility of adding LLM-generated synthetic data during fine-tuning. Finally, it compares various LLMs on their performance in identifying SDoH and on their likelihood of introducing bias into predictions.

What did the study show?

The researchers found that, among the models tested (BERT, various Flan-T5 models, and the ChatGPT family), the best performer at extracting any mention of SDoH was a fine-tuned Flan-T5 XL, which excelled in three of the six categories when trained with synthetic data. For adverse SDoH mentions, the best performer was Flan-T5 XXL without synthetic data.

Both of these models had the fewest parameters updated during fine-tuning, and in general, the larger the model, the better the performance.

When synthetic data generated by LLMs were incorporated into the training datasets, the results varied by model and architecture. The largest improvement occurred where the training dataset had the fewest instances and where the model trained on gold data alone performed worst. Overall, however, smaller models showed improved performance.

When gold data was progressively removed, performance remained consistent with the addition of synthetic data until about 50% of the gold data had been removed. Without synthetic data, in contrast, performance began to fall after only 10–20% of gold data was removed, mimicking a low-resource setting.
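The ablation just described can be pictured as keeping a shrinking fraction of the gold-labeled sentences and optionally topping the set up with synthetic ones. The sketch below uses hypothetical records and shows the experimental scaffolding only; the actual fine-tuning loop is omitted.

```python
# Sketch of the low-resource ablation: progressively drop gold-labeled
# examples, with and without synthetic examples added back in. The
# records and dataset sizes here are hypothetical placeholders.
import random

def make_training_set(gold, synthetic, keep_fraction, use_synthetic):
    """Keep a fraction of the gold data, optionally adding synthetic data."""
    sample = random.sample(gold, int(len(gold) * keep_fraction))
    return sample + (synthetic if use_synthetic else [])

gold = [{"text": f"gold sentence {i}", "label": "housing"} for i in range(1000)]
synth = [{"text": f"synthetic sentence {i}", "label": "housing"} for i in range(500)]

for frac in (1.0, 0.9, 0.8, 0.5):
    for use_synth in (False, True):
        train = make_training_set(gold, synth, frac, use_synth)
        # fine_tune(model, train)  # training loop omitted from this sketch
        print(f"gold kept={frac:.0%}, synthetic={use_synth}, n={len(train)}")
```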

Compared with ChatGPT, the fine-tuned Flan-T5 models did better than both GPT-3.5-turbo-0613 and GPT-4-0613 on the any-SDoH task but less well on the adverse-SDoH task. The best-performing fine-tuned models beat ChatGPT in both zero- and few-shot settings, with the exception of GPT under 10-shot prompting for adverse SDoH.
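For reference, few-shot prompting simply means prepending labeled examples to the query. A minimal sketch of a 10-shot setup against the OpenAI chat API follows; the system instruction and example sentences are invented for illustration, and only two of the ten shots are written out.

```python
# Sketch of few-shot prompting for adverse-SDoH classification.
# Example sentences, labels, and the system instruction are invented.
from openai import OpenAI

FEW_SHOT = [
    ("Patient was evicted last month and is staying in a shelter.", "housing"),
    ("He has a supportive wife who attends every visit.", "none"),
    # ...eight more (sentence, label) pairs would complete a 10-shot prompt
]

def build_messages(sentence: str) -> list[dict]:
    """Assemble a chat transcript with labeled examples before the query."""
    messages = [{
        "role": "system",
        "content": "Label each clinical sentence with any adverse social "
                   "determinant of health it mentions, or answer 'none'.",
    }]
    for shot_text, shot_label in FEW_SHOT:
        messages.append({"role": "user", "content": shot_text})
        messages.append({"role": "assistant", "content": shot_label})
    messages.append({"role": "user", "content": sentence})
    return messages

client = OpenAI()  # expects OPENAI_API_KEY in the environment
response = client.chat.completions.create(
    model="gpt-4-0613",
    messages=build_messages("She has no way to get to her radiotherapy visits."),
)
print(response.choices[0].message.content)
```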

The fine-tuned models were also more consistent in their predictions when demographic descriptors such as race and gender were injected into sentences, indicating less algorithmic bias. ChatGPT, for instance, was far more likely to change its classification on the any-SDoH task when a sentence was assigned a female rather than a male gender.

Similarly, gold-labeled Social Support sentences carried the greatest risk of discrepant predictions with ChatGPT, at 56% for the any-SDoH task and 21% for the adverse-SDoH task. For the fine-tuned model, Employment sentences produced the most discrepancies on the any-SDoH task (14%) and Transportation sentences on the adverse-SDoH task (12%).
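The bias check behind these numbers amounts to perturbing the same sentence with different demographic descriptors and counting how often the predicted label flips. A minimal sketch, assuming a classify function like the Flan-T5 example above and illustrative sentence templates:

```python
# Sketch of the demographic-perturbation bias check: swap a descriptor
# in otherwise identical sentences and measure prediction discrepancies.
# Templates and descriptors are illustrative, not the study's own.
def discrepancy_rate(templates, classify, desc_a="male", desc_b="female"):
    """Fraction of sentence templates whose label changes with the descriptor."""
    flips = 0
    for template in templates:
        label_a = classify(template.format(desc=desc_a))
        label_b = classify(template.format(desc=desc_b))
        flips += int(label_a != label_b)
    return flips / len(templates)

templates = [
    "The {desc} patient cannot afford transportation to appointments.",
    "This {desc} patient lives alone and reports no social support.",
]
# rate = discrepancy_rate(templates, classify_sdoh)  # reuse the Flan-T5 sketch
```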

Finally, these models captured almost 94% of patients with adverse SDoH, compared with 2% under standard EHR practice, that is, ICD-10 codes: a gap of about 92 percentage points.
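That comparison can be reproduced in outline by checking each patient for SDoH-related ICD-10 codes (the Z55–Z65 range covers socioeconomic and psychosocial circumstances) against the NLP model's flag. The patient records below are, of course, hypothetical.

```python
# Sketch of the coverage comparison: patients flagged by SDoH-related
# ICD-10 Z-codes versus patients flagged by the language model.
SDOH_Z_CODES = {f"Z{n}" for n in range(55, 66)}  # Z55–Z65: social circumstances

patients = [
    {"id": 1, "icd10": {"C50.9", "Z59.0"}, "nlp_adverse_sdoh": True},  # Z59.0: homelessness
    {"id": 2, "icd10": {"C34.9"}, "nlp_adverse_sdoh": True},
    {"id": 3, "icd10": {"I10"}, "nlp_adverse_sdoh": False},
]

def has_sdoh_code(codes: set[str]) -> bool:
    """True if any ICD-10 code falls in the SDoH Z-code range."""
    return any(code.split(".")[0] in SDOH_Z_CODES for code in codes)

coded = sum(has_sdoh_code(p["icd10"]) for p in patients)
flagged = sum(p["nlp_adverse_sdoh"] for p in patients)
print(f"ICD-10 coded: {coded}/{len(patients)}; NLP flagged: {flagged}/{len(patients)}")
```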

The researchers were thus able to develop models that classify patients across six SDoH categories using clinical notes, and they characterized the differences in performance between the widely used BERT classifier and LLMs such as Flan-T5 XL and XXL.

After fine-tuning, these models performed better than ChatGPT and resisted deterioration when synthetic demographic descriptors were introduced.

What are the implications?

All models were able to identify SDoH in free-text sentences even without overt mentions, although Parental Status and Transportation were the worst-performing categories for any-SDoH mentions. For the adverse-SDoH task, performance was worst for Parental Status and Social Support.

The superior performance of these models is impressive given that only 3% of all sentences in the training set mentioned any SDoH, and that such descriptions are complex in meaning and language use. The findings underline earlier reports that the best performance in SDoH extraction comes from using the entire clinical record rather than only the Social History section, since such data are often scattered throughout the notes. Indeed, many note types contain no Social History section at all.

Housing was the least-mentioned category, yet the highest-performing model classified it well. This suggests that LLMs can usefully augment data collection in real-world situations where information is scantily reported and hence most easily missed in manual compilation.

Moreover, the current research may help solve the problem of collecting sparsely documented categories of data from the large volume of free text in EHRs. The ChatGPT models GPT-3.5 and GPT-4 also showed promise for such tasks, pending further study.

The gains from using LLMs to identify SDoH in relation to medical history are at least two-fold: “improving real-world evidence on SDoH and assisting in identifying patients who could benefit from resource support.” This work also highlights the need to include these factors when predicting health outcomes.

Journal reference:

Guevara, M., Chen, S., Thomas, S., et al. (2024). Large language models to identify social determinants of health in electronic health records. npj Digital Medicine, 7, 6. https://doi.org/10.1038/s41746-023-00970-0

Written by Dr. Liji Thomas

Dr. Liji Thomas is an OB-GYN who graduated from the Government Medical College, University of Calicut, Kerala, in 2001. She practiced as a full-time consultant in obstetrics/gynecology in a private hospital for several years following her graduation. She has counseled hundreds of patients facing issues ranging from pregnancy-related problems to infertility and has been in charge of over 2,000 deliveries, always striving to achieve a normal delivery rather than an operative one.
