Story
FOCUS on Contamination: Hydrology-Informed Noise-Aware Learning for Geospatial PFAS Mapping
Key takeaway
A new AI system maps the spread of harmful PFAS chemicals in the environment more accurately than existing approaches by incorporating data on how water flows. This could help communities better understand and address PFAS pollution in their local water supplies.
Quick Explainer
FOCUS takes a novel geospatial deep learning approach to mapping PFAS contamination across large areas, despite sparse ground-truth data. It combines sparse PFAS measurements with rich geospatial datasets on land use, industrial activity, and hydrology. During training, a "noise-aware" loss uses physically informed priors, such as proximity to industrial dischargers and hydrological flow, to estimate how likely each label is to be correct. This helps the model overcome incomplete and noisy environmental data while keeping its predictions spatially coherent and scalable. The result is a comprehensive, nationwide view of PFAS risk that complements traditional monitoring programs and supports scientific exploration and targeted follow-up sampling.
Deep Dive
Technical Deep Dive: FOCUS on Contamination
Overview
FOCUS is a geospatial deep learning framework for mapping per- and polyfluoroalkyl substances (PFAS) contamination in surface waters. PFAS are persistent environmental contaminants with significant public health impacts, but large-scale monitoring is limited by the high cost and logistical challenges of field sampling. FOCUS integrates sparse PFAS observations with large-scale geospatial data on land cover, hydrology, and industrial activity to generate robust, scalable contamination predictions.
Problem & Context
- PFAS contamination is a major environmental and public health issue, with 97% of Americans exhibiting detectable PFAS levels in their blood
- Measuring PFAS concentrations is expensive, resulting in sparse and unevenly distributed ground-truth data
- This data scarcity leaves substantial uncertainty in identifying contamination hotspots and prioritizing targeted remediation
- Existing modeling approaches, such as hydrological simulations and geostatistical interpolation, struggle to capture the complex transport dynamics of PFAS and the uncertainty inherent in sparse measurements
Methodology
- FOCUS frames PFAS mapping as a geospatial deep learning task, operating directly on multi-channel raster images to preserve spatial dependencies
- The model is trained with a noise-aware loss function that incorporates physically informed pixel correctness priors derived from factors such as proximity to industrial dischargers, land cover, and hydrological flow
- These priors help the model learn robust predictions from sparse, noisy labels, while maintaining spatial coherence and scalability
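The source does not spell out the loss, so the following is only a minimal Python/NumPy sketch of one common way to make a pixel-wise loss noise-aware: binary cross-entropy weighted by a per-pixel confidence prior in [0, 1]. The functions `pixel_confidence_prior` and `noise_aware_bce`, the exponential distance decay, the `scale` value, and the 0.5/0.5 mix of proximity and downstream position are all hypothetical choices, not the FOCUS formulation.

```python
import numpy as np

def pixel_confidence_prior(dist_to_discharger, downstream_mask, scale=5.0):
    """Hypothetical per-pixel prior in [0, 1]: how far to trust each label.

    dist_to_discharger: distance in pixels to the nearest industrial discharger.
    downstream_mask: 1.0 where the pixel lies downstream of a discharger, else 0.0.
    """
    # 1.0 at a discharger, decaying toward 0 with distance (assumed form).
    proximity = np.exp(-np.asarray(dist_to_discharger, dtype=float) / scale)
    return 0.5 * proximity + 0.5 * np.asarray(downstream_mask, dtype=float)

def noise_aware_bce(pred, label, prior, labeled_mask, eps=1e-7):
    """Prior-weighted binary cross-entropy over labeled pixels only."""
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    label = np.asarray(label, dtype=float)
    bce = -(label * np.log(pred) + (1.0 - label) * np.log(1.0 - pred))
    weights = np.asarray(prior, dtype=float) * np.asarray(labeled_mask, dtype=float)
    # Normalize by total weight; unlabeled or zero-prior pixels contribute nothing.
    return float((weights * bce).sum() / max(weights.sum(), eps))

# Hypothetical 1x2 example: one pixel at a discharger, one far upstream.
prior = pixel_confidence_prior(np.array([[0.0, 20.0]]), np.array([[1.0, 0.0]]))
loss = noise_aware_bce(np.array([[0.9, 0.01]]), np.array([[1, 1]]), prior, np.ones((1, 2)))
```

Down-weighting uncertain pixels, rather than discarding them, keeps every measurement in play while letting the physically plausible labels dominate the gradient.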
Data & Experimental Setup
- Training and testing data include both fish tissue and surface water PFAS measurements from government agencies and community sampling initiatives
- Raster images are generated around each sample point, integrating land cover, industrial discharger locations, and hydrological flow data
- The model is evaluated on data from multiple years (2008, 2019, and 2022) to assess its ability to fill spatial data gaps
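The setup above builds a multi-channel raster around each sample point. A minimal Python/NumPy sketch of that step, assuming the input layers are already co-registered grids of the same shape; the `extract_patch` helper, the layer names, the patch size, and zero padding at grid edges are hypothetical choices rather than details from the source.

```python
import numpy as np

def extract_patch(layers, row, col, half=2):
    """Stack co-registered 2-D layers and cut a (C, 2*half+1, 2*half+1) patch.

    layers: list of 2-D arrays with identical shape, one per channel.
    row, col: grid indices of the sample point, used as the patch center.
    Cells beyond the grid edge are zero-padded so every patch has the same size.
    """
    stack = np.stack(layers, axis=0).astype(np.float32)  # (C, H, W)
    padded = np.pad(stack, ((0, 0), (half, half), (half, half)), mode="constant")
    r, c = row + half, col + half  # sample position in the padded frame
    return padded[:, r - half:r + half + 1, c - half:c + half + 1]

# Hypothetical 5x5 layers; land cover is an integer class code here for brevity
# (a real pipeline would more likely one-hot encode it).
land_cover = np.arange(25).reshape(5, 5)
dischargers = np.zeros((5, 5))
dischargers[2, 2] = 1.0  # one discharger at the grid center
flow_accumulation = np.ones((5, 5))

patch = extract_patch([land_cover, dischargers, flow_accumulation], row=2, col=2, half=1)
print(patch.shape)
```

Keeping each channel as a separate raster plane, rather than flattening to tabular features, is what lets a convolutional model see spatial relationships such as a discharger sitting upstream of the sample point.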
Results
- FOCUS consistently outperforms baseline approaches, including pollutant transport simulations, geostatistical interpolation, and tabular ML models
- It achieves a favorable trade-off between precision and recall, which is critical for robust large-scale environmental screening
- FOCUS is also computationally efficient: feature extraction takes hours rather than the days required by tabular ML baselines
- Real-world validation on newly collected samples demonstrates the model's ability to generalize beyond the training regions
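The results are framed in terms of precision and recall. As a reference point, here is a minimal Python/NumPy sketch of how both are computed for a binary screening task; the `precision_recall` helper, the 0.5 default threshold, and the idea that labels mark sites above some contamination cutoff are assumptions, since the source does not state them.

```python
import numpy as np

def precision_recall(scores, labels, threshold=0.5):
    """Precision and recall of thresholded risk scores against binary labels."""
    pred = np.asarray(scores) >= threshold       # flagged as contaminated
    true = np.asarray(labels).astype(bool)       # actually above the cutoff
    tp = np.sum(pred & true)    # flagged and contaminated
    fp = np.sum(pred & ~true)   # flagged but clean
    fn = np.sum(~pred & true)   # contaminated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)

scores = [0.9, 0.8, 0.3, 0.6]  # hypothetical predicted risk at four sites
labels = [1, 0, 1, 1]          # hypothetical ground truth (1 = above cutoff)
for t in (0.2, 0.5, 0.7):
    print(t, precision_recall(scores, labels, threshold=t))
```

Sweeping `threshold` traces the trade-off: a lower threshold misses fewer contaminated sites (higher recall) at the cost of more false alarms (lower precision).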
Interpretation
- FOCUS enables more comprehensive, nationwide perspectives on PFAS contamination risk, complementing existing state-level monitoring programs
- The model's output can support scientific exploration and hypothesis generation by highlighting regions with unexplained contamination that warrant targeted follow-up investigation
- Integrating domain-specific knowledge into the model's training process helps address the challenges of sparse and noisy environmental data
Limitations & Uncertainties
- FOCUS performance is ultimately bounded by the availability and quality of underlying data
- The current model does not yet attribute predicted risk to specific PFAS compounds and sources, though this is an area of ongoing work
- Predictions should be interpreted as screening-level risk signals to prioritize follow-up sampling, rather than definitive contamination assessments
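Treating the output as a screening signal suggests a simple use: rank cells that have not been sampled yet by predicted risk and propose the top few for follow-up. A minimal Python/NumPy sketch; `prioritize_cells`, the parameter `k`, and the rule of skipping already-sampled cells are illustrative assumptions, not part of FOCUS.

```python
import numpy as np

def prioritize_cells(risk_map, sampled_mask, k=3):
    """Return (row, col) indices of the k highest-risk cells not yet sampled."""
    sampled_mask = np.asarray(sampled_mask, dtype=bool)
    k = min(k, int((~sampled_mask).sum()))  # never propose already-sampled cells
    risk = np.where(sampled_mask, -np.inf, risk_map)  # push sampled cells to the bottom
    flat = np.argsort(risk, axis=None)[::-1][:k]  # flat indices, highest risk first
    rows, cols = np.unravel_index(flat, risk.shape)
    return list(zip(rows.tolist(), cols.tolist()))

risk = np.array([[0.1, 0.9], [0.5, 0.7]])           # hypothetical risk map
sampled = np.array([[False, True], [False, False]])  # (0, 1) already sampled
print(prioritize_cells(risk, sampled, k=2))
```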
What Comes Next
- Future work will explore predictive uncertainty quantification to guide targeted sampling and enhance prediction robustness
- Temporal modeling using multi-year data will be investigated to track and forecast evolving PFAS patterns as longitudinal measurements become denser
- The team plans to build on existing public contamination maps by responsibly deploying FOCUS to support decision-making and environmental justice efforts
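Uncertainty quantification is listed as future work, so the following only sketches one standard option and not anything FOCUS currently does: train several models (an ensemble), treat the spread of their predictions as uncertainty, and propose sampling where they disagree most. The ensemble itself, its size, and the helpers `ensemble_uncertainty` and `uncertainty_targets` are hypothetical.

```python
import numpy as np

def ensemble_uncertainty(member_preds):
    """Mean and standard deviation across ensemble members.

    member_preds: array-like of shape (M, H, W), one risk map per member.
    """
    preds = np.asarray(member_preds, dtype=float)
    return preds.mean(axis=0), preds.std(axis=0)

def uncertainty_targets(member_preds, k=2):
    """(row, col) of the k cells where ensemble members disagree most."""
    _, std = ensemble_uncertainty(member_preds)
    flat = np.argsort(std, axis=None)[::-1][:k]  # largest spread first
    rows, cols = np.unravel_index(flat, std.shape)
    return list(zip(rows.tolist(), cols.tolist()))

# Three hypothetical 2x2 risk maps from independently trained models.
members = [
    [[0.1, 0.5], [0.9, 0.2]],
    [[0.1, 0.9], [0.9, 0.6]],
    [[0.1, 0.1], [0.9, 0.4]],
]
print(uncertainty_targets(members, k=2))
```

Sampling where the models disagree, rather than where predicted risk is highest, targets the locations where a new measurement is most likely to change the map.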
