Researchers developed a national predictive model using machine learning to estimate PFAS occurrence in U.S. groundwater.
Study: Predictions of groundwater PFAS occurrence at drinking water supply depths in the United States.
In a recent study published in Science, researchers from the United States Geological Survey (USGS) developed an extreme gradient boosting model to predict the occurrence of per- and polyfluoroalkyl substances (PFAS) in groundwater.
What are PFAS?
PFAS, commonly referred to as ‘forever chemicals,’ are highly persistent contaminants present in various environmental matrices, including groundwater. Due to the wide range of adverse health effects linked to exposure to PFAS, extensive resources in the United States have been dedicated to monitoring PFAS levels in the environment, particularly drinking water supplies.
For example, the U.S. Environmental Protection Agency (EPA) fifth Unregulated Contaminant Monitoring Rule (UCMR 5) requires that 29 PFAS be monitored between 2023 and 2025 in all public water systems that serve over 3,300 people and in 800 representative small public water supplies serving fewer than 3,300 people. Despite these regulations, over 90% of small public water supplies and private household wells are not included in the UCMR 5 sampling system, increasing the risk that a significant proportion of U.S. residents are unknowingly exposed to PFAS-contaminated water.
About the study
The researchers of the current study developed and trained an extreme gradient boosting model on groundwater samples collected since 2019.
The model was developed to predict PFAS occurrence at unmonitored locations so that they can be prioritized for future monitoring. The training data included 24 individual PFAS and 25 potential PFAS sources.
The model output is a probability of PFAS detection for each grid cell across the conterminous United States. Various thresholds can be selected, above which a PFAS detection is predicted.
The model analysis used two thresholds, 0.5 and 0.315. A threshold of 0.5 is standard modeling practice and provides a conservative estimate of PFAS detection.
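The thresholding step can be sketched in a few lines of Python. This is only an illustration: the probabilities are made-up values, not output from the study's model, and `classify_detections` is a hypothetical helper name.

```python
def classify_detections(probabilities, threshold):
    """Flag a predicted PFAS detection where the probability is above the threshold."""
    return [p > threshold for p in probabilities]


# Hypothetical predicted probabilities for five grid cells (not study data).
cell_probabilities = [0.10, 0.32, 0.45, 0.55, 0.80]

standard = classify_detections(cell_probabilities, 0.5)
lower = classify_detections(cell_probabilities, 0.315)

print(sum(standard), "cells flagged at 0.5")
print(sum(lower), "cells flagged at 0.315")
```

With these illustrative values, lowering the threshold flags more cells, which is why the lower threshold yields a larger predicted area.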
High PFAS detection rates in groundwater
Among 24 PFAS included in the model training data, at least one was detected in 37% of tested groundwater samples. Observation wells had the highest PFAS occurrence rate of 60%, followed by public supply wells, miscellaneous or irrigation wells, and domestic wells.
The predictions made by the study model indicated widespread prevalence of PFAS in groundwater at depths of public and domestic drinking water supplies. Among various PFAS, perfluorobutane sulfonate, perfluorooctane sulfonate, perfluorooctanoate, and perfluorohexane sulfonate exhibited the highest detection frequencies.
The affected groundwater areas were predicted to be 430,000 km² and 560,000 km² at the depths of public and domestic water supplies, respectively, accounting for 6% and 7% of the area of the conterminous United States.
Model estimates indicate that 50-66% of the total population relies on public or domestic supply sourced from groundwater with detectable concentrations of PFAS. However, many public water suppliers have already begun monitoring and treating groundwater for PFAS, so the predictions may overestimate PFAS in public water supply.
Notably, domestic well users are at an increased risk of drinking PFAS-contaminated water, as they often do not monitor and treat their water for PFAS.
Model validation confirms PFAS detection accuracy across groundwater depths
The model developed in the current study was validated using several independent test datasets, including UCMR 5, USGS domestic tap water, and Wisconsin domestic well water and public supply.
Model accuracy was consistent regardless of the groundwater depth to drinking water supplies. Areas with both shallow and deeper depths to drinking water supplies were modeled.
Observed and predicted detection frequencies were comparable at the probability threshold of 0.5 for classifying a detection. However, detection frequency was overestimated when the threshold was set to 0.315.
Thus, the 0.315 threshold should be considered an upper-limit estimate that may overestimate occurrence. However, this threshold may also include areas with PFAS contamination that would otherwise be missed under a 0.5 threshold.
Estimates of accuracy, specificity, and sensitivity showed that the model is more accurate and more specific, but less sensitive, at the standard threshold of 0.5. Taken together, these findings indicate that a threshold of 0.5 should be selected to maximize accuracy, whereas the lower threshold of 0.315 can be selected when the priority is to capture as many PFAS-contaminated groundwater areas as possible.
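The trade-off between the two thresholds can be illustrated with a small Python sketch that computes accuracy, sensitivity, and specificity from observed detections and predicted probabilities. The data below are invented to show the pattern and are not taken from the study; `confusion_metrics` is a hypothetical helper name.

```python
def confusion_metrics(observed, probabilities, threshold):
    """Compute accuracy, sensitivity, and specificity at a given threshold."""
    tp = fp = tn = fn = 0
    for obs, p in zip(observed, probabilities):
        predicted = p > threshold
        if predicted and obs:
            tp += 1  # detection correctly predicted
        elif predicted and not obs:
            fp += 1  # detection predicted but not observed
        elif not predicted and obs:
            fn += 1  # observed detection missed by the model
        else:
            tn += 1  # non-detection correctly predicted
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical test set: 1 = PFAS detected, 0 = not detected (not study data).
observed = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
probabilities = [0.90, 0.60, 0.40, 0.45, 0.35, 0.33, 0.20, 0.10, 0.10, 0.05]

at_050 = confusion_metrics(observed, probabilities, 0.5)
at_0315 = confusion_metrics(observed, probabilities, 0.315)

print("threshold 0.5:  ", at_050)
print("threshold 0.315:", at_0315)
```

In this toy example, the 0.5 threshold misses one true detection but produces no false positives, while the 0.315 threshold catches every true detection at the cost of several false positives.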
Conclusions
The current study presents a model for predicting PFAS contamination in groundwater at the depth of drinking water supplies. In the future, this model may facilitate regular monitoring of PFAS levels and the subsequent treatment of drinking water to prevent human exposure to these contaminants.