In a recent study published in the European Heart Journal, researchers used cutting-edge artificial intelligence (AI) techniques and analyses to evaluate the association between AI model-identified ‘built environment features’ and the observed variance in coronary heart disease (CHD). Specifically, the team used custom convolutional neural networks (CNNs), linear mixed-effects models (LMEM), and activation maps to identify CHD-related feature associations and predict health outcomes at the census tract level.
The first of its kind, the study used more than 0.53 million Google Street View (GSV) images for model training and evaluation. Its outcomes suggest that AI algorithms may help inform the design of future cities with a significantly reduced CHD burden.
Study: Artificial intelligence–based assessment of built environment from Google Street View and coronary artery disease prevalence. Image Credit: yanto kw / Shutterstock
CHD, GSV, and the potential for machine vision in built environment evaluations
Coronary heart disease (CHD), also known as coronary artery disease (CAD), is a potentially life-threatening chronic, non-communicable disease characterized by plaque deposition along the walls of the coronary arteries, thereby hindering or outright blocking the movement of oxygenated blood to the heart. This buildup is usually gradual—it may begin during childhood, slowly progress, and eventually manifest as CHD during later life phases.
Despite decades of research and substantial scientific progress in CHD risk detection and prevention, CHD remains a leading cause of heart-disease-associated mortality, particularly in the United States of America (USA), where it is estimated to account for well over 50% of all cardiac mortality (~400,000 deaths in 2020 alone). Recent evidence suggests that non-traditional risk factors, including race, income, culture, and education, may play a profound role in CHD pathology.
Environmental factors such as temperature and environmental pollution (noise and air) have also been implicated in the disease, though evidence for these hypotheses remains lacking. A large-scale repository of ‘built’ urban features (buildings, green spaces, and roads) would allow for location-specific CHD risk detection and form the first step in policy-based healthcare interventions.
“Large-scale integrated assessment of the environment at the neighbourhood level can facilitate rapid and complete assessment of its impact on CHD. Such data are however scarce, partly because of the costly and time-consuming nature of neighbourhood audits and inconsistent measurements and standards for data collection. Machine vision approaches such as Google Street View (GSV) have become an increasingly popular approach for virtual neighbourhood audits since its launch in 2007.”
Google Street View (GSV) is an imaging technology featured in numerous Google applications, including Google Maps and Google Earth. First launched in 2007, the predominantly crowd-sourced image dataset displays interactive panoramas of stitched VR photographs and has achieved almost 100% coverage of the USA. Previous research using GSV has established the technology as comparable to human ground-truthing in accuracy, especially when machine learning algorithms are used to classify and assess built environment features from GSV images.
About the study
The present study aims to use GSV images to evaluate built environments across seven USA cities and use these results to estimate CHD prevalence at the census tract level. Census tract-level data (for 2015-16) were obtained from the Behavioral Risk Factor Surveillance System (BRFSS) through the 2018 Centers for Disease Control and Prevention (CDC) Population Level Analysis and Community Estimates (PLACES), a collaboration involving the CDC and the Robert Wood Johnson Foundation. The dataset comprised American adults (>18 years) with clinically confirmed angina or CHD status (either positive or negative) from 789 census tracts across Bellevue, WA; Brownsville, TX; Cleveland, OH; Denver, CO; Detroit, MI; Fremont, CA; and Kansas City, KS.
Data collected as part of this study included de-identified demographic and socioeconomic (DSE; age, race, sex, education level, income, and occupation) factors and medical history. The image dataset comprised more than 0.53 million images from the GSV server, leaving Google’s image classification intact. Image feature extraction was carried out using a deep CNN (DCNN) called Places365CNN, the default extractor for the Places Database. Given the similarity between GSV and Places image feature classification, Places365CNN, which had been trained on more than 10 million images, was considered robust for feature extraction in the current study.
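The article does not say how per-image features were combined into a single description of each census tract. One common approach, assumed here purely for illustration, is to average the feature vectors of all images that fall within a tract. The function name `mean_pool_by_tract`, the tract labels, and the three-dimensional toy vectors (standing in for the much larger DCNN feature vectors) are all hypothetical.

```python
from collections import defaultdict


def mean_pool_by_tract(image_features):
    """Average per-image feature vectors within each census tract.

    image_features: iterable of (tract_id, feature_vector) pairs, where every
    feature_vector is a list of floats of the same length.
    Returns a dict mapping tract_id -> mean feature vector.
    """
    sums = {}
    counts = defaultdict(int)
    for tract_id, vec in image_features:
        if tract_id not in sums:
            sums[tract_id] = [0.0] * len(vec)
        elif len(vec) != len(sums[tract_id]):
            raise ValueError("all feature vectors must have the same length")
        for i, value in enumerate(vec):
            sums[tract_id][i] += value
        counts[tract_id] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


# Toy input (made-up values, not study data): two images in one tract,
# one image in another.
toy_images = [
    ("tract_A", [1.0, 0.0, 2.0]),
    ("tract_A", [3.0, 2.0, 2.0]),
    ("tract_B", [0.5, 0.5, 0.5]),
]
tract_features = mean_pool_by_tract(toy_images)
```

Each tract then has one feature vector that can be paired with its CHD prevalence estimate for downstream modeling.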
To explore the associations between raw DCNN-extracted features (N = 4096) and tract-level CHD prevalence, researchers trained and tested three independent machine learning (ML) models: the extra-trees regressor (ET), the random forest regressor (RF), and the light gradient boosted machine regressor (LGBM). To assess predictive accuracy and improve the robustness of the results, all three models were subjected to 10-fold cross-validation. Following model training, multilevel regression analyses using both linear fixed-effects and random-effects models were carried out, with variables adjusted for age, sex, income, race, and education level.
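For readers unfamiliar with 10-fold cross-validation, the sketch below shows the general procedure: the data are split into ten disjoint folds, and each fold is held out once while a model is fitted on the remaining nine. This is not the study's code. The helper names, the toy data, and the mean-only placeholder model are assumptions; in practice the placeholder would be replaced by a regressor such as ET, RF, or LGBM.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k shuffled, disjoint, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Return the mean absolute error measured on each held-out fold."""
    errors = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in held_out])
        mae = sum(abs(p - y[i]) for p, i in zip(preds, held_out)) / len(held_out)
        errors.append(mae)
    return errors


# Placeholder model (an assumption): always predicts the training-set mean.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, X_test):
    return [model for _ in X_test]


# Toy data (made-up values): 50 "tracts", one feature each.
X = [[float(i)] for i in range(50)]
y = [5.0 + 0.1 * (i % 7) for i in range(50)]
fold_errors = cross_validate(X, y, fit_mean, predict_mean, k=10)
```

Averaging `fold_errors` gives a single performance estimate in which every observation has been used for testing exactly once.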
“…we employed the Grad-CAM technique to create the saliency map to highlight these prominent features in the original GSV images. This process provides certain explanations of what environmental features the CNN thinks to be associated with neighbourhood CHD prevalence.”
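The core Grad-CAM computation can be sketched in a few lines. Each feature map of a convolutional layer is weighted by the spatial average of the gradient of the model output with respect to that map; the weighted maps are summed, and negative values are clipped to zero. The sketch assumes the activations and gradients have already been obtained from a network (normally via automatic differentiation). The function name and toy numbers are illustrative and are not taken from the study.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heat map for one convolutional layer.

    activations[k][i][j]: value of feature map k at spatial position (i, j).
    gradients[k][i][j]: derivative of the model output with respect to
    activations[k][i][j].
    Returns a 2-D list with the same spatial size as one feature map.
    """
    height = len(activations[0])
    width = len(activations[0][0])
    # Channel weights: global average of the gradients over each feature map.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    # Weighted sum of the feature maps.
    heat = [[0.0] * width for _ in range(height)]
    for w, fmap in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                heat[i][j] += w * fmap[i][j]
    # ReLU: keep only regions that push the output upward.
    return [[max(0.0, v) for v in row] for row in heat]


# Toy example (made-up values): two 2x2 feature maps.
activations = [
    [[1.0, 2.0], [0.0, 1.0]],
    [[1.0, 0.0], [2.0, 4.0]],
]
gradients = [
    [[1.0, 1.0], [1.0, 1.0]],      # average gradient  1.0
    [[-0.5, -0.5], [-0.5, -0.5]],  # average gradient -0.5
]
heat_map = grad_cam(activations, gradients)
```

In a full pipeline the resulting map would be upsampled to the input image size and overlaid on the GSV image to produce the saliency visualization.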
Study findings and takeaways
Geographic CHD prevalence was found to vary substantially, with Bellevue presenting a median prevalence of 4.70%, while Cleveland's was much higher at 8.70%. DCNN extraction yielded 4,096 features. A highlight of this work is that these extracted features alone were able to explain 63% of the observed inter-region variability in CHD prevalence.
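A statement that features "explain" a share of variability is usually expressed through the coefficient of determination, R². The article does not specify the exact metric, so the sketch below assumes R² and uses made-up numbers, not study data.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    if ss_total == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_residual / ss_total


# Toy prevalence values in percent (hypothetical).
observed = [4.0, 6.0, 8.0, 10.0]
predicted = [5.0, 6.0, 8.0, 9.0]
score = r_squared(observed, predicted)  # SS_total = 20, SS_residual = 2
```

Under this reading, a value of 0.63 would mean that model predictions account for 63% of the variance in tract-level prevalence around its mean.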
“We found a small number of extreme values that were underestimated by the models in certain census tracts of Detroit and Cleveland. The CHD prevalence of these underestimated census tracts was often more than 12%. When examining the CNN-extracted features using t-SNE, we noticed clustering of census tracts with similar values of CHD prevalence.”
Multilevel modeling revealed that DSE factors (especially age, sex, and education status) were more accurate predictors of CHD than GSV features. These results suggest that, while GSV features may help highlight built environment information related to CHD prevalence at the neighborhood level, additional interpretive computation (e.g., Grad-CAM methods) is required before the technology can be used to identify which specific built environment features are involved.
“The outcomes of our study provide proof of concept for machine vision–enabled identification of urban network features associated with risk that in principle may enable rapid identification and targeting interventions in at-risk neighbourhoods to reduce cardiovascular burden.”
Journal reference:
- Chen, Z., Dazard, J., Khalifa, Y., Motairek, I., & Rajagopalan, S. Artificial intelligence–based assessment of built environment from Google Street View and coronary artery disease prevalence. European Heart Journal, DOI – 10.1093/eurheartj/ehae158, https://academic.oup.com/eurheartj/advance-article/doi/10.1093/eurheartj/ehae158/7635247?login=false