A 50,000-item database uncovers processing levels and healthier alternatives for smarter choices.
Study: Prevalence of processed foods in major US grocery stores.
In an article in Nature Food, researchers analyzed data from over 50,000 products across American grocery stores to create an open-source, detailed database of information on pricing, ingredients, and nutritional value.
This database highlights the utility of machine learning methods to support public health efforts, inform consumer choices, promote access to nutritional information, and improve dietary and health outcomes.
Background
The rise in ultra-processed foods (UPFs) has improved food availability and shelf life but at the cost of health and sustainability. Evidence shows that UPFs, which account for nearly 60% of calories consumed across developed countries, may increase the risk of non-communicable diseases like metabolic syndrome, as well as exposure to harmful preservatives and pesticides.
This has shifted focus from food security to nutrition security, emphasizing access to affordable, healthy, and safe food. However, despite UPFs being widely consumed, determining the degree of processing is challenging due to inconsistent classification systems and ambiguous food labels.
Current methods lack reliability and reproducibility, leading to varied interpretations of UPFs' health risks. Researchers call for objective, biological, mechanism-based metrics to classify food processing accurately.
Artificial intelligence (AI) offers promising solutions by creating data-driven and objective tools. A recent development is the food processing (FPro) score, an automated index using machine learning to classify food processing based on nutrient profiles.
FPro leverages the NOVA classification system and other frameworks, providing reliable and scalable metrics to enhance nutrition security and support Sustainable Development Goals like zero hunger and improved health. This innovative approach could transform how we assess and manage food processing's impact on health.
About the study
Food data was gathered from online platforms of major American grocery chains. These stores categorize food items hierarchically, and this structure was standardized across the database. Nutrition information was sourced from product labels, converted to a uniform measure (per 100g), and analyzed using the FoodProX algorithm to assess food processing levels.
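The per-100g conversion can be illustrated with a minimal sketch. It assumes the label reports nutrient amounts per serving along with the serving size in grams; the function name, nutrient keys, and sample values are hypothetical and not taken from the study.

```python
def per_100g(label_values, serving_size_g):
    """Rescale per-serving nutrient amounts to amounts per 100 g."""
    if serving_size_g <= 0:
        raise ValueError("serving size must be positive")
    factor = 100.0 / serving_size_g
    return {nutrient: amount * factor for nutrient, amount in label_values.items()}


# Hypothetical label for one 40 g serving (illustrative values only).
label = {"protein_g": 4.0, "sugars_g": 6.0, "sodium_mg": 180.0}
normalized = per_100g(label, serving_size_g=40.0)
print(normalized)
```

Putting every product on the same basis is what lets items with different serving sizes be compared by a single model.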
This machine learning tool evaluates the nutrient changes caused by processing and assigns each item an FPro score from 0 (unprocessed) to 1 (ultra-processed). The algorithm was trained on foods with known NOVA classifications and validated against the known effects of food processing on nutrients.
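This article does not give the formula behind the score. As an illustration only, one plausible way to collapse a classifier's output into a single 0 to 1 value is to combine the predicted probability of the unprocessed class with that of the ultra-processed class. The function below is a sketch under that assumption, with hypothetical names, and should not be read as the study's exact definition.

```python
def fpro_from_probabilities(p_unprocessed, p_ultra_processed):
    """Collapse two class probabilities into one score in [0, 1].

    Assumed form: the score rises with the ultra-processed probability
    and falls with the unprocessed probability.
    """
    for p in (p_unprocessed, p_ultra_processed):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if p_unprocessed + p_ultra_processed > 1.0 + 1e-9:
        raise ValueError("the two probabilities cannot sum to more than 1")
    return (1.0 - p_unprocessed + p_ultra_processed) / 2.0


# Hypothetical classifier outputs for three foods.
print(fpro_from_probabilities(1.0, 0.0))  # confidently unprocessed
print(fpro_from_probabilities(0.0, 1.0))  # confidently ultra-processed
print(fpro_from_probabilities(0.1, 0.6))  # mixed evidence
```

A continuous score of this kind keeps the information that a hard four-class label would discard, which is what allows products within the same category to be ranked against each other.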
Ingredient lists were ranked by quantity, helping calculate the ingredient FPro (IgFPro) score, which links ingredient amounts to processing levels.
The database contains detailed food and ingredient data in spreadsheet files, providing FPro scores, nutritional information, and ingredient processing scores. A substitution algorithm suggests less processed alternatives to a given food by analyzing ingredient and food name similarities and sorting the suggestions by FPro score.
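The idea of a substitution search can be sketched as follows. The article does not specify the similarity measure, so this example assumes Jaccard overlap on name words and ingredient lists, equal weighting of the two, and an arbitrary similarity threshold. All products, ingredients, and scores are made up for illustration.

```python
def jaccard(a, b):
    """Share of distinct elements that two collections have in common."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_substitutes(target, catalog, min_similarity=0.2, top_n=3):
    """Return names of similar but less processed items, lowest FPro first."""
    candidates = []
    for item in catalog:
        # Skip the target itself and anything that is not less processed.
        if item is target or item["fpro"] >= target["fpro"]:
            continue
        name_sim = jaccard(target["name"].lower().split(), item["name"].lower().split())
        ingredient_sim = jaccard(target["ingredients"], item["ingredients"])
        similarity = 0.5 * name_sim + 0.5 * ingredient_sim  # assumed equal weights
        if similarity >= min_similarity:
            candidates.append((item["fpro"], item["name"]))
    candidates.sort()
    return [name for _, name in candidates[:top_n]]


# Hypothetical catalog; scores are illustrative, not from the database.
catalog = [
    {"name": "organic multigrain bread", "fpro": 0.30,
     "ingredients": ["whole wheat flour", "water", "yeast", "salt", "oats"]},
    {"name": "multigrain sandwich bread", "fpro": 0.70,
     "ingredients": ["wheat flour", "water", "yeast", "salt", "sugar", "oat fiber"]},
    {"name": "white sandwich bread", "fpro": 0.95,
     "ingredients": ["wheat flour", "water", "yeast", "salt", "sugar",
                     "modified starch", "soybean oil"]},
    {"name": "plain yogurt", "fpro": 0.35, "ingredients": ["milk", "cultures"]},
]
suggestions = suggest_substitutes(catalog[2], catalog)
print(suggestions)
```

Filtering on similarity first and sorting on FPro second keeps suggestions within the same kind of food, so a bread is not replaced by a yogurt simply because the yogurt scores lower.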
Findings
The study utilized the FPro system to assess the processing levels of various food items by translating their nutritional content into a processing score.
For instance, organic multigrain bread, produced from whole grains without additives, had a lower processing score (0.314), while more processed breads had higher scores due to added fibers and starches (0.732 and 0.997, respectively).
Similarly, yogurts made from organic milk had a low processing score (0.355), while others with added sugars and additives had a higher score (0.918).
Analysis of processing levels across major grocery store chains revealed that ultra-processed items dominated store inventories. However, the mix varied by chain: some offered more minimally processed options, while others carried a higher proportion of ultra-processed foods.
Researchers also found variability in processing levels within food categories, like cereals and snack bars, indicating diverse consumer choices in some categories.
Additionally, a 10% increase in food processing generally led to an 8.7% decrease in the price per calorie, though this varied by food type. For example, highly processed soups were significantly cheaper per calorie than minimally processed ones. This highlights the complex relationship between food processing, cost, and consumer choices.
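To make the reported average concrete, the short calculation below applies the 8.7% figure to a made-up baseline price. Treating further 10% steps as compounding is an assumption added here for illustration; the article reports only the single-step average, and the effect differs by food category.

```python
def price_after_processing_increase(price_per_kcal, pct_drop_per_step=8.7, steps=1):
    """Apply the reported average price drop once per 10% step in processing.

    Compounding across several steps is an illustrative assumption.
    """
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return price_per_kcal * (1.0 - pct_drop_per_step / 100.0) ** steps


baseline = 0.0100  # hypothetical price in dollars per kcal
one_step = price_after_processing_increase(baseline)
two_steps = price_after_processing_increase(baseline, steps=2)
print(round(one_step, 5), round(two_steps, 5))
```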
Conclusions
This open-source database provides information and tools to analyze food processing and ingredient structures in the U.S. grocery market. Integrating large-scale food composition data and machine learning reveals varying levels of food processing across different grocery stores.
Factors like food costs, consumer socioeconomic status, and supermarket missions influence these differences. The platform highlights the link between food processing and affordability, with lower-income populations consuming more processed foods, which impacts nutrition security.
Governments increasingly recognize the health costs associated with processed foods, such as obesity-related medical expenses. This database offers insights into food processing levels, helping consumers make healthier choices by translating complex data into actionable scores.
Despite challenges in interpreting food labels, this system can guide better dietary decisions and public health strategies, like reorganizing store layouts.
The FPro algorithm currently evaluates food processing through nutrient concentrations. The researchers aim to improve it by incorporating more comprehensive ingredient data, ultimately enhancing its reliability and consumer guidance.
Journal reference:
Ravandi, B., Ispirova, G., Sebek, M., Mehler, P., Barabási, A., Menichetti, G. (2025) Prevalence of processed foods in major US grocery stores. Nature Food. doi: 10.1038/s43016-024-01095-7. https://www.nature.com/articles/s43016-024-01095-7