Story

End-to-end data-driven prediction of urban airflow and pollutant dispersion

Earth & Environment, Materials & Engineering

Key takeaway

Researchers developed a data-driven computer model that predicts urban airflow and pollution levels much more cheaply than conventional simulations, which could help cities improve air quality and protect public health.

Quick Explainer

The key idea is to develop a data-driven reduced-order modeling (ROM) framework that can efficiently predict both urban airflow and pollutant dispersion. It works by first using spectral proper orthogonal decomposition (SPOD) to extract the most important flow structures from high-fidelity computational fluid dynamics (CFD) simulations. An autoencoder then compresses this information into a compact latent space representation, which a long short-term memory (LSTM) network models over time. Finally, a convolutional neural network maps the predicted velocity field to the corresponding pollutant concentration. This modular architecture provides a computationally efficient alternative to expensive CFD simulations, enabling real-time forecasting and urban planning applications.

Deep Dive

Technical Deep Dive: End-to-end Data-Driven Prediction of Urban Airflow and Pollutant Dispersion

Problem & Context

Climate change and urban population growth are intensifying environmental stresses within cities, making urban atmospheric flows critical for public health, energy use, and livability
Computational fluid dynamics (CFD) models are commonly used to simulate urban airflow, but their high computational cost makes them unsuitable for real-time forecasting or parametric studies
This study aims to develop a data-driven reduced-order modeling (ROM) framework to efficiently predict urban pollutant dispersion, with a focus on the skimming flow regime in street canyons

Methodology

Data & Experimental Setup

Large eddy simulation (LES) dataset of a 3D street canyon flow with a continuous pollutant source at the canyon base
Snapshot data includes time-resolved streamwise and vertical velocity components, as well as scalar concentration
Flow conforms to the skimming flow regime, characterized by a principal recirculation cell and smaller corner vortices

Reduced-Order Modeling

Spectral proper orthogonal decomposition (SPOD):
- Extracts coherent spatial structures and their temporal evolution in the frequency domain
- Dimensionality reduction via energy-based truncation and mode similarity criteria
Autoencoder (AE):
- Learns a compact, nonlinear latent representation of the SPOD coefficients
Long short-term memory (LSTM):
- Models the temporal evolution of the latent space variables
Convolutional neural network (CNN):
- Maps the predicted velocity field to the corresponding pollutant concentration field
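
The SPOD step at the head of this pipeline can be sketched in NumPy. The version below is a minimal illustration, assuming Welch-style block averaging, the method of snapshots, and uniform spatial weights. The function name `spod`, the block length, the overlap, and the synthetic traveling-wave data are all assumptions made for illustration, not the settings used in the paper.

```python
import numpy as np

def spod(snapshots, n_fft=64, overlap=0.5, dt=1.0):
    """Minimal SPOD sketch.

    snapshots: (n_time, n_space) array of time-resolved data.
    Returns freqs (n_freq,), energies (n_freq, n_blocks),
    and modes (n_freq, n_space, n_blocks), sorted by energy per frequency.
    """
    n_time, n_space = snapshots.shape
    fluct = snapshots - snapshots.mean(axis=0)        # remove the temporal mean
    step = int(n_fft * (1 - overlap))
    starts = range(0, n_time - n_fft + 1, step)
    window = np.hanning(n_fft)
    # Windowed, overlapping blocks: (n_blocks, n_fft, n_space)
    blocks = np.stack([fluct[s:s + n_fft] * window[:, None] for s in starts])
    q_hat = np.fft.rfft(blocks, axis=1)               # FFT in time, per block
    freqs = np.fft.rfftfreq(n_fft, d=dt)
    n_blocks = q_hat.shape[0]

    energies = np.empty((freqs.size, n_blocks))
    modes = np.empty((freqs.size, n_space, n_blocks), dtype=complex)
    for k in range(freqs.size):
        q = q_hat[:, k, :].T                          # (n_space, n_blocks)
        csd = q.conj().T @ q / n_blocks               # small Hermitian matrix
        lam, theta = np.linalg.eigh(csd)
        order = np.argsort(lam)[::-1]                 # most energetic first
        lam = np.clip(lam[order], 0.0, None)
        theta = theta[:, order]
        energies[k] = lam
        # Unit-norm spatial modes; modes with near-zero energy are not meaningful.
        modes[k] = q @ theta / np.sqrt(np.maximum(lam, 1e-30) * n_blocks)
    return freqs, energies, modes

# Synthetic data (hypothetical): a 12.5 Hz traveling wave plus weak noise.
rng = np.random.default_rng(0)
dt = 0.01
t = np.arange(512) * dt
x = np.linspace(0.0, 1.0, 32)
wave = np.sin(2 * np.pi * 12.5 * t[:, None] - 4 * np.pi * x[None, :])
data = wave + 0.05 * rng.standard_normal(wave.shape)

freqs, energies, modes = spod(data, n_fft=64, overlap=0.5, dt=dt)
peak_freq = freqs[np.argmax(energies[:, 0])]          # frequency of the dominant mode
```

In this toy case the leading mode energy peaks at the frequency of the injected wave, which is the kind of frequency-resolved coherent structure SPOD is meant to isolate. A real application would use quadrature weights for the grid and far longer records.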

Results

SPOD Analysis

SPOD effectively isolates key flow structures, such as the Kelvin-Helmholtz instability in the shear layer
Dimensionality reduction retains 97% of the total turbulent kinetic energy using 2,003 SPOD modes
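
The energy-based part of the truncation can be illustrated as choosing the smallest set of modes whose cumulative energy reaches a target fraction. This is a minimal sketch that covers only the energy criterion, not the mode similarity criterion; the helper name `modes_for_energy` and the toy spectrum are hypothetical.

```python
import numpy as np

def modes_for_energy(energies, target=0.97):
    """Smallest number of modes whose cumulative energy fraction reaches `target`."""
    flat = np.sort(np.ravel(energies))[::-1]          # all mode energies, largest first
    cumulative = np.cumsum(flat) / flat.sum()
    count = int(np.searchsorted(cumulative, target)) + 1
    return min(count, flat.size)

# Toy spectrum (hypothetical): cumulative fractions are 0.5, 0.8, 0.9, 1.0.
toy_energies = np.array([5.0, 3.0, 1.0, 1.0])
n_keep_85 = modes_for_energy(toy_energies, target=0.85)
n_keep_97 = modes_for_energy(toy_energies, target=0.97)
```

Applied to a full SPOD spectrum, the same rule would rank modes across all frequencies together, which is one plausible reading of the truncation described above.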

Latent Space Compression

AE with 30-dimensional latent space captures the essential flow dynamics
Reconstructed velocity fields exhibit good agreement with the reference LES data
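
To make the compression step concrete, here is a minimal sketch of a nonlinear autoencoder trained by plain gradient descent in NumPy. It assumes a single tanh encoder layer, a linear decoder, and a 2-dimensional latent space on toy data; the architecture, sizes, learning rate, and data are illustrative assumptions, not the paper's network (which uses a 30-dimensional latent space).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "SPOD coefficients" (hypothetical): 200 samples of 8 features
# driven by 2 hidden factors, so a 2-D latent space is sufficient.
factors = rng.uniform(-1.0, 1.0, size=(200, 2))
coeffs = np.tanh(factors @ rng.standard_normal((2, 8)))

n_in, n_latent = coeffs.shape[1], 2
W_enc = 0.1 * rng.standard_normal((n_in, n_latent))
b_enc = np.zeros(n_latent)
W_dec = 0.1 * rng.standard_normal((n_latent, n_in))
b_dec = np.zeros(n_in)

def encode(x):
    """Nonlinear map from coefficients to latent variables."""
    return np.tanh(x @ W_enc + b_enc)

def decode(z):
    """Linear map from latent variables back to coefficients."""
    return z @ W_dec + b_dec

learning_rate, losses = 0.1, []
for _ in range(2000):
    z = encode(coeffs)
    err = decode(z) - coeffs
    losses.append(float(np.mean(err ** 2)))           # mean-squared reconstruction error
    g_out = 2.0 * err / err.size                      # dLoss/dReconstruction
    g_pre = (g_out @ W_dec.T) * (1.0 - z ** 2)        # backprop through tanh
    W_dec -= learning_rate * (z.T @ g_out)
    b_dec -= learning_rate * g_out.sum(axis=0)
    W_enc -= learning_rate * (coeffs.T @ g_pre)
    b_enc -= learning_rate * g_pre.sum(axis=0)

latent = encode(coeffs)                               # compact representation, shape (200, 2)
```

The reconstruction loss recorded in `losses` is the quantity one would monitor when judging agreement between reconstructed and reference fields.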

Velocity Field Prediction

LSTM accurately forecasts the temporal evolution of the latent variables, reproducing the phase space topology
Velocity field reconstruction remains bounded and stable over long prediction horizons
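
Long-horizon forecasting of this kind is typically run in closed loop, with each prediction fed back as the next input. The sketch below shows only that rollout logic. The one-step predictor is a least-squares linear map standing in for the LSTM, and the circular latent trajectory is a toy; both are assumptions for illustration.

```python
import numpy as np

def rollout(step_fn, history, n_steps):
    """Closed-loop forecast: append each prediction to the input window."""
    window = [np.asarray(z, dtype=float) for z in history]
    predictions = []
    for _ in range(n_steps):
        z_next = step_fn(np.stack(window))
        predictions.append(z_next)
        window = window[1:] + [z_next]                # slide the window forward
    return np.stack(predictions)

# Toy latent trajectory (hypothetical): a point circling the origin.
t = np.arange(400) * 0.1
latent_path = np.column_stack([np.cos(t), np.sin(t)])
train = latent_path[:300]

# Stand-in for the LSTM: a linear one-step map fitted by least squares.
A, *_ = np.linalg.lstsq(train[:-1], train[1:], rcond=None)

def step_fn(window):
    """Predict the next latent state from the most recent one."""
    return window[-1] @ A

forecast = rollout(step_fn, train[-5:], n_steps=100)
max_error = float(np.abs(forecast - latent_path[300:]).max())
```

Because errors compound in closed loop, boundedness over the full horizon (here checked against the held-out part of the toy trajectory) is a more demanding test than one-step accuracy.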

Pollutant Dispersion Prediction

CNN-based mapping from velocity to concentration field successfully captures the large-scale plume topology and ventilation characteristics
Some smoothing of fine-scale structures, but overall good agreement with LES reference
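
The input and output contract of the velocity-to-concentration mapping can be illustrated with a single convolutional layer written in NumPy. This is only a forward pass with hand-set averaging weights, assuming two input channels (streamwise and vertical velocity) and one output channel (concentration); the function `conv2d_same`, the grid size, and the weights are hypothetical, and the paper's CNN would be deeper and trained on LES data.

```python
import numpy as np

def conv2d_same(x, kernels, bias):
    """Cross-correlation with zero 'same' padding, as in deep-learning conv layers.

    x: (c_in, H, W); kernels: (c_out, c_in, k, k) with odd k; bias: (c_out,).
    """
    c_out, c_in, k, _ = kernels.shape
    pad = k // 2
    x_padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, height, width = x.shape
    out = np.zeros((c_out, height, width))
    for i in range(k):
        for j in range(k):
            patch = x_padded[:, i:i + height, j:j + width]   # (c_in, H, W)
            out += np.einsum("oc,chw->ohw", kernels[:, :, i, j], patch)
    return out + bias[:, None, None]

# Hypothetical velocity snapshot: 2 channels (u, w) on a 6 x 8 grid.
velocity = np.ones((2, 6, 8))
kernels = np.full((1, 2, 3, 3), 1.0 / 18.0)           # averaging weights, not trained
bias = np.zeros(1)

# ReLU keeps the predicted concentration non-negative.
concentration = np.maximum(conv2d_same(velocity, kernels, bias), 0.0)
```

Small local kernels like this act as smoothing filters, which is consistent with the loss of fine-scale structure noted above.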

Interpretation

The proposed end-to-end data-driven ROM framework demonstrates the capability to efficiently predict both the velocity and pollutant concentration fields in an urban street canyon
The modular architecture allows for flexible customization and future extensions, such as incorporating data assimilation or transfer learning
The framework offers a computationally efficient alternative to high-fidelity CFD simulations, enabling real-time forecasting and extensive parametric studies for urban planning and air quality management

Limitations & Uncertainties

The framework succeeds only if the SPOD modes span the range of flow parameters of interest, so more configurations should be incorporated in the training dataset
Training the neural networks is computationally expensive, although this cost can potentially be reduced through techniques like transfer learning
Validation is limited to a single street canyon configuration; further testing on diverse urban geometries is required to assess the generalization capabilities

What Comes Next

Explore methods to incorporate additional physical constraints and domain knowledge into the neural network architectures
Investigate strategies for online model adaptation and data assimilation to improve long-term prediction accuracy
Extend the framework to handle time-varying boundary conditions and unsteady scenarios

Source

End-to-end data-driven prediction of urban airflow and pollutant dispersion
Preprint, arXiv cs.LG, 3/19/2026