AI models relying on Google Street View images may misinterpret environmental features, leading to misguided public health efforts to reduce obesity and diabetes, a new study warns.
A recent study published in the journal PNAS Environmental Sciences reveals that relying on artificial intelligence (AI) and Google Street View (GSV) images to support urban planning may lead to misleading conclusions that could have detrimental effects on public health interventions aimed at combating obesity and diabetes.
How is AI used in urban planning?
Recent advancements in AI have accelerated the incorporation of this technology into crucial fields, such as public health and urban planning, which can potentially affect large numbers of people at the community level. For example, GSV images have been combined with object detection by deep learning to evaluate the health outcomes associated with neighborhood properties defined by census tract.
GSV data provides information about the environment, including the types of vegetation, as well as urban development, such as road networks and building structures. These data have been mined using deep learning to devise local interventions targeting mental and cardiometabolic disease and the prevalence of the coronavirus disease 2019 (COVID-19).
However, AI-based predictive models face certain challenges, including an inability to recognize biased or unreliable data and a tendency to rely on spurious correlations when making predictions. These challenges are exacerbated when other factors mediate the association between an exposure and a health outcome.
What did the study show?
The current study examined how GSV-derived features of the environment relate to the mean prevalence of obesity and diabetes across census tracts in New York City. It also assessed the relationship between these health conditions and physical inactivity, a potential mediator of the association between the built environment and health outcomes.
GSV-derived data indicated that higher crosswalk density correlates with lower disease prevalence. The impact of physical activity on obesity was greater than that on diabetes, which was expected based on previous GSV-based crosswalk estimates. However, in contrast to previous studies, no association was observed between GSV estimates of sidewalk density and health outcomes.
Physical inactivity intervention vs. GSV feature
The association between crosswalk and sidewalk prevalence and health outcomes was mediated by the prevalence of physical inactivity in the census tract. Thus, physical activity levels in a census tract, rather than the built environment itself, accounted for the differences in health outcomes.
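To illustrate what mediation means here, the following is a minimal sketch on synthetic data, not the study's data or method. It assumes a simple linear model in which crosswalk density affects obesity only through physical inactivity; the coefficients (-0.5 and 0.8), the variable names, and the sample size are all hypothetical choices made for the example.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simple_slope(x, y):
    """Ordinary least squares slope of y regressed on x alone."""
    return cov(x, y) / cov(x, x)

def two_predictor_slopes(x, m, y):
    """OLS slopes of y regressed on x and m together (closed form)."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    det = sxx * smm - sxm ** 2
    b_x = (smm * sxy - sxm * smy) / det
    b_m = (sxx * smy - sxm * sxy) / det
    return b_x, b_m

def simulate(n=5000, seed=0):
    """Synthetic tracts: crosswalks -> inactivity -> obesity (hypothetical)."""
    rng = random.Random(seed)
    crosswalks = [rng.gauss(0, 1) for _ in range(n)]
    # Assumed: more crosswalks, less physical inactivity.
    inactivity = [-0.5 * c + rng.gauss(0, 1) for c in crosswalks]
    # Assumed: obesity depends on inactivity only, not directly on crosswalks.
    obesity = [0.8 * i + rng.gauss(0, 1) for i in inactivity]
    return crosswalks, inactivity, obesity

x, m, y = simulate()
total = simple_slope(x, y)                  # crosswalks -> obesity, ignoring the mediator
a = simple_slope(x, m)                      # crosswalks -> inactivity
direct, b = two_predictor_slopes(x, m, y)   # effects with the mediator held fixed
indirect = a * b                            # effect carried through inactivity

print(f"total={total:.3f} direct={direct:.3f} indirect={indirect:.3f}")
```

In this toy setup, a model that ignores physical inactivity would attribute the whole association to crosswalks, while the decomposition shows that the direct effect is close to zero and almost all of the association runs through the mediator.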
A one-unit reduction in physical inactivity lowered the prevalence of obesity and diabetes 4.17 and 17.2 times more, respectively, than a comparable one-unit change in crosswalk prevalence.
Built environment out of sync with GSV features
The GSV labels used to infer the city's built environment often fail to match reality. For example, sidewalks may be labeled as present near bridges or highways where none exist, whereas a blocked sidewalk may be reported as absent.
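One way to quantify this kind of mismatch is to compare image-derived labels against a ground-level audit. The sketch below is purely illustrative: the `audit` list is made-up data, and the precision, recall, and agreement measures are a common way to summarize label quality, not necessarily the measures used in the study.

```python
# Hypothetical audit of ten street segments.
# Each pair is (label derived from GSV, what a ground-level check found).
audit = [
    (True, True), (True, True), (True, True), (True, True), (True, True),
    (True, False), (True, False),   # labeled present, but no sidewalk on the ground
    (False, True),                  # sidewalk exists, but labeled absent
    (False, False), (False, False),
]

def confusion_counts(pairs):
    """Count true/false positives and negatives for boolean label pairs."""
    tp = sum(1 for pred, truth in pairs if pred and truth)
    fp = sum(1 for pred, truth in pairs if pred and not truth)
    fn = sum(1 for pred, truth in pairs if not pred and truth)
    tn = sum(1 for pred, truth in pairs if not pred and not truth)
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion_counts(audit)
precision = tp / (tp + fp)            # share of "present" labels that are real
recall = tp / (tp + fn)               # share of real sidewalks that were labeled
agreement = (tp + tn) / len(audit)    # overall share of matching labels

print(f"precision={precision:.2f} recall={recall:.2f} agreement={agreement:.2f}")
```

Low precision or recall in such an audit would suggest that the image-derived feature does not measure what it is intended to measure, before any health association is even estimated.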
These findings indicate that AI may produce inaccurate intervention estimates when it relies on GSV-derived features to detect associations with health outcomes without knowledge of important mediating factors. Thus, the model must be carefully specified, and the pathway through which these features exert their effects must be accounted for. These safeguards help ensure that the right target is identified and that the efficacy of various interventions is properly estimated.
Conclusions
Unlike previous studies, which relied on qualitative reviews to compare areas, the current study, for the first time, compares GSV features with ground-level reality.
The researchers used a causal framework to account for mediating factors like physical inactivity. This revealed that improving 10% of the samples in the two lowest tertiles of physical inactivity would reduce the prevalence of obesity and diabetes mellitus 4.17 and 17.2 times more, respectively, than a comparable intervention on crosswalks.
Nevertheless, data limitations, as well as the changing status of the built environment, individual behavior, and consequent health outcomes, must be carefully considered when leveraging this type of data for public health interventions.
“This study highlights critical issues of robustness and model specification in using emergent data sources, showing the data may not measure what is intended, and ignoring mediators can result in biased intervention effect estimates.”
Journal reference:
- Zhang, M., Rahman, S., Mhasawade, V., et al. (2024). Utilizing big data without domain knowledge impacts public health decision-making. PNAS Environmental Sciences. doi:10.1073/pnas.2402387121.