While ChatGPT-4 can recognize most foods in images, its tendency to underestimate portion sizes and nutrients highlights the need for further refinement before it can reliably assist with dietary assessments.
Study: An Evaluation of ChatGPT for Nutrient Content Estimation from Meal Photographs. Image Credit: Okrasiuk / Shutterstock
In a recent study published in the journal Nutrients, researchers in Dublin, Ireland, assessed the accuracy of Chat Generative Pretrained Transformer 4 (ChatGPT-4) in estimating nutrient content from images of meals.
Dietary intake assessment is crucial for identifying and managing nutrition- and food-related causes of poor health. The most common assessment methods rely on self-reports of foods, meals, or food groups and their portion sizes. In recent years, however, digital methods of dietary intake assessment have become increasingly common.
Many digital systems also include image recognition software, enabling users to upload images of their foods, which reduces user burden and improves accuracy. Moreover, artificial intelligence (AI) could be leveraged to automate food recognition from images and to estimate portion size and nutrient content. However, research on the utility of large language models (LLMs), such as ChatGPT, in dietary intake assessment is scarce.
About the study
In the present study, researchers evaluated the utility of ChatGPT-4 to identify foods from meal images and estimate nutritional content. They used images of 38 commonly consumed meals in Ireland, derived from the National Adult Nutrition Survey (NANS). Three photographs were available per meal, representing large, medium, and small portions. Four meal types were considered: snacks, breakfast, lunch, and dinner.
Food was weighed, and the nutrient composition was ascertained using McCance and Widdowson's food composition tables. The researchers generated a relevant prompt for ChatGPT-4 to estimate nutritional content and provided it with images of meals. Specifically, it was asked to estimate energy, protein, total fat, carbohydrate, dietary fiber, saturated fat, monounsaturated fat, polyunsaturated fat, vitamins C and D, folate, folic acid, potassium, sodium, iron, and calcium.
ChatGPT-4 was asked to provide point estimates rather than ranges. In addition, seven dieticians from the United Kingdom and Ireland were recruited to estimate protein, carbohydrate, and energy; they received images of medium-sized meals only. True positives (TPs) were foods correctly identified by ChatGPT-4; false negatives (FNs) were foods present in the images that the LLM failed to recognize; false positives (FPs) were foods reported by the model that were not present in the images.
Next, precision and recall were calculated: recall measured the ability to identify foods without missing any item, and precision measured the ability to identify foods without suggesting foods that were not present. Estimates provided by ChatGPT-4 were compared with actual values and with dieticians' estimates. In addition, the intraclass correlation coefficient (ICC) was estimated to assess agreement between ChatGPT-4 and the dieticians.
Findings
Overall, 547 food items were available across 114 images; 463 were TPs, 84 were FNs, and 35 were FPs, with a precision of 93% and recall of 84.6%. ChatGPT-4 underestimated the weight for 87 out of 114 meals (76.3%). The mean weights for ChatGPT-4-estimated meals were 430.5 g, 425.8 g, and 529.5 g for small, medium, and large portions, respectively.
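These figures can be checked with a short sketch using the counts reported above and the standard precision/recall formulas (an illustration only, not the study's own code):

```python
# Food-identification counts reported in the study
tp, fp, fn = 463, 35, 84  # true positives, false positives, false negatives

precision = tp / (tp + fp)  # share of reported foods that were actually present
recall = tp / (tp + fn)     # share of present foods that were identified

print(f"precision = {precision:.1%}")  # 93.0%, matching the reported 93%
print(f"recall = {recall:.1%}")        # 84.6%, matching the reported 84.6%
```

Note that the 463 TPs plus 84 FNs account for all 547 food items present across the 114 images.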
The corresponding means of actual meal weights were 408.2 g, 580.5 g, and 798.1 g, respectively. ChatGPT-4's weight estimates did not differ significantly from actual weights for small meals (p = 0.221) but significantly underestimated the weights of medium and large meals (p < 0.001). Further, the percent difference between actual and LLM estimates was 0.1% for energy, -2.7% for protein, -6.5% for carbohydrate, and -9.1% for polyunsaturated fat. Other nutrients showed larger differences (> ± 10%), with ChatGPT-4 underestimating 11 nutrients; the largest errors were observed for vitamin D (-100%), potassium (-49.5%), folate (-38.6%), and calcium (-27.8%).
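The scale of the weight underestimation can be illustrated from the reported mean weights, assuming the usual percent-difference convention of (estimate − actual) / actual × 100 (the study's exact formula is not stated here):

```python
# Mean meal weights (grams) reported in the study
actual  = {"small": 408.2, "medium": 580.5, "large": 798.1}
chatgpt = {"small": 430.5, "medium": 425.8, "large": 529.5}

def pct_diff(estimate: float, truth: float) -> float:
    """Percent difference of the estimate relative to the actual value."""
    return (estimate - truth) / truth * 100

for size in actual:
    print(f"{size}: {pct_diff(chatgpt[size], actual[size]):+.1f}%")
# small:  +5.5%  (slight overestimate)
# medium: -26.6% (underestimate)
# large:  -33.7% (underestimate)
```

The pattern matches the study's finding: accuracy degrades as portion size grows, with large meals underestimated by roughly a third by weight.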
Moreover, ChatGPT-4's estimates differed significantly from actual nutritional content for 10 nutrients. Only four dieticians provided estimates for all nutrients across all 38 images. The ICC was 0.31 for carbohydrate content (poor agreement), 0.56 for energy (poor-to-moderate agreement), and 0.67 for protein (moderate-to-good agreement).
ChatGPT-4 also commented on the assumptions and limitations of its estimates, such as the potential influence of food fortification, preparation methods, and unseen ingredients, even though such information was not explicitly requested. The dieticians, by contrast, were explicitly asked what information could improve estimation. Interestingly, their responses were similar to ChatGPT-4's, highlighting the same challenges in nutrient estimation.
Conclusions
In conclusion, ChatGPT-4 correctly identified most foods in the images, but its portion size estimates were accurate only for smaller meals. It underestimated the weights of medium- and large-sized meals and the content of most nutrients. However, its performance was comparable to that of dieticians for protein and energy estimates, though weaker for carbohydrate content.
These findings suggest that ChatGPT has the potential to be used in dietary assessment despite being a general-purpose LLM. However, further training and integration with food composition databases may be required before it can be used in dietetics and nutrition practice, which would expand its applications and improve its accuracy.
Journal reference:
- O’Hara C, Kent G, Flynn AC, Gibney ER, Timon CM. An Evaluation of ChatGPT for Nutrient Content Estimation from Meal Photographs. Nutrients, 2025, DOI: 10.3390/nu17040607, https://www.mdpi.com/2072-6643/17/4/607