Curious Now

Reconstructing Carbon Monoxide Reanalysis with Machine Learning

Climate, Computing

Key takeaway

Researchers used machine learning to reconstruct a global carbon monoxide reanalysis after the loss of a key satellite data source, keeping the record consistent for studying air pollution and its changes over time.

Quick Explainer

To compensate for the loss of satellite data used in the CAMS global carbon monoxide reanalysis, the researchers developed a machine learning approach that could closely emulate the impact of the missing observational data. The key idea was to train a neural network model to predict the reanalysis values from the output of a control simulation without data assimilation, effectively learning the relationship between the two. This allowed the machine learning model to reconstruct the reanalysis dataset in a consistent manner, capturing seasonal patterns and regional variations even after the observational data source was lost.

Deep Dive

Technical Deep Dive: Reconstructing Carbon Monoxide Reanalysis with Machine Learning

Overview

This study developed a machine learning (ML) approach to reconstruct total column carbon monoxide (TCCO) from a control model simulation, compensating for the loss of observational data from the MOPITT satellite in 2025. The ML model closely reproduced the CAMS global reanalysis (EAC4), which assimilates MOPITT observations, and reduced the systematic positive bias present in the control simulation.

Problem & Context

  • The Copernicus Atmosphere Monitoring Service (CAMS) provides a global atmospheric composition reanalysis (EAC4) that assimilates satellite observations, including TCCO from the MOPITT instrument.
  • The termination of MOPITT operations in 2025 led to a shift in the CO fields of the subsequent EAC4 reanalysis, preventing its direct use for analyzing TCCO anomalies.
  • To address this discontinuity, the researchers developed an ML-based approach to emulate the impact of the MOPITT data assimilation in EAC4.

Methodology

  • The ML model was trained to predict monthly-mean TCCO fields from a control simulation without data assimilation, learning the relationship between the control run and the reanalysis.
  • Four standard ML methods were evaluated: linear regression, gradient boosting, random forest, and a neural network.
  • The neural network model trained on 2014-2018 data performed best, with an R² of 0.98 and RMSE of 0.07 mol/cm² on the 2019-2020 validation period.
  • An ablation study found that the control simulation TCCO, geographic coordinates, and month were the most important input features.
  • An additional normalization approach based on local anomalies ($ML_{ano}$) was also tested, but did not outperform the standard ML model.
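
The setup described in the bullets above can be illustrated with a minimal sketch. It is not the study's implementation: it uses synthetic data, arbitrary units, and an ordinary least-squares fit (linear regression being one of the four method families evaluated) in place of the neural network. The inputs follow the features the ablation study highlights (control TCCO, coordinates, and month); every function name, variable name, and numerical value below is a hypothetical choice made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_features(control_tcco, lat, lon, month):
    """Stack predictors: intercept, control TCCO, coordinates, cyclic month."""
    lat_r, lon_r = np.deg2rad(lat), np.deg2rad(lon)
    ang = 2 * np.pi * (month - 1) / 12  # month encoded on a circle
    return np.column_stack([
        np.ones_like(control_tcco),
        control_tcco,
        np.sin(lat_r), np.cos(lon_r), np.sin(lon_r),
        np.sin(ang), np.cos(ang),
    ])

# Synthetic stand-in data (assumption: not real CAMS or MOPITT values).
n = 5000
lat = rng.uniform(-90, 90, n)
lon = rng.uniform(-180, 180, n)
month = rng.integers(1, 13, n)
signal = (2.0 + 0.5 * np.sin(np.deg2rad(lat))
          + 0.3 * np.cos(2 * np.pi * (month - 1) / 12)
          + rng.normal(0, 0.3, n))
# Control run: the signal plus a latitude-dependent positive bias and noise.
control = signal + 0.4 + 0.1 * np.sin(np.deg2rad(lat)) + rng.normal(0, 0.05, n)
# "Reanalysis" target: the signal plus small noise.
reanalysis = signal + rng.normal(0, 0.02, n)

X = build_features(control, lat, lon, month)
train, test = slice(0, 4000), slice(4000, None)

# Fit on the training samples, then correct the held-out control values.
coef, *_ = np.linalg.lstsq(X[train], reanalysis[train], rcond=None)
corrected = X[test] @ coef

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

rmse_control = rmse(control[test], reanalysis[test])
rmse_ml = rmse(corrected, reanalysis[test])
print(f"RMSE control: {rmse_control:.3f}  RMSE corrected: {rmse_ml:.3f}")
```

In this toy case the fitted model removes most of the imposed bias, so the corrected RMSE on the held-out samples is well below that of the raw control values. A real application would replace the random split with a split by year, as in the study's training and validation periods.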

Results

  • The ML-corrected TCCO closely matched the EAC4 reanalysis throughout the 2004-2025 period, substantially reducing the positive bias of the control simulation.
  • Seasonality and regional differences between the control run and reanalysis were effectively captured by the ML model, as shown by improved R² and RMSE metrics.
  • For anomaly mapping, however, the $ML_{ano}$ approach outperformed the standard ML correction: its sign alignment score was 0.84, compared with 0.77 for the standard ML model and 0.74 for the control run.
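
For readers who want to see how the metrics in these bullets could be computed, here is a small sketch. R² and RMSE follow their standard definitions. The sign alignment score is not defined in this summary, so the version below (the fraction of grid cells where the estimated anomaly has the same sign as the reference anomaly) is an assumption, as are the function names and the toy values.

```python
import numpy as np

def rmse(pred, ref):
    """Root-mean-square error between two fields."""
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def r2(pred, ref):
    """Coefficient of determination of pred against ref."""
    ss_res = np.sum((ref - pred) ** 2)
    ss_tot = np.sum((ref - np.mean(ref)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def monthly_anomaly(field, climatology):
    """Anomaly of a monthly field relative to that month's climatology."""
    return field - climatology

def sign_alignment(anom_pred, anom_ref):
    """Assumed definition: share of cells whose anomaly signs agree."""
    return float(np.mean(np.sign(anom_pred) == np.sign(anom_ref)))

# Toy 2x3 grid with made-up values (arbitrary units).
climatology = np.full((2, 3), 2.0)
ref_field = climatology + np.array([[0.2, -0.1, 0.3], [-0.4, 0.1, -0.2]])
est_field = climatology + np.array([[0.1, 0.2, 0.4], [-0.3, 0.05, 0.1]])

ref_anom = monthly_anomaly(ref_field, climatology)
est_anom = monthly_anomaly(est_field, climatology)

score = sign_alignment(est_anom, ref_anom)
print(f"sign alignment: {score:.2f}, RMSE: {rmse(est_field, ref_field):.3f}")
```

In this toy grid four of the six cells agree in sign, so the score is 4/6. A score near 1 would mean the estimated anomaly map places positive and negative anomalies in nearly the same locations as the reference.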

Interpretation

  • The results demonstrate the potential of ML to emulate the impact of observational data assimilation, maintaining the temporal consistency of atmospheric composition datasets in the face of observational gaps.
  • While this study focused on monthly mean TCCO, the approach could be extended to other trace gases and aerosols within the CAMS framework.
  • Future work could explore more advanced ML architectures and extend the methodology to higher temporal resolutions and longer forecast horizons.

Limitations & Uncertainties

  • The study only evaluated the ML model's performance for one year after the training period, and did not assess its ability to generalize to longer timescales.
  • The impact of using different normalization techniques on anomaly mapping skill requires further investigation.
  • The ML model was trained and evaluated using the same control simulation and reanalysis data, so its ability to generalize to other models or observational datasets is unknown.

What Comes Next

  • Extending the ML approach to other atmospheric composition variables and exploring more sophisticated ML architectures could improve its versatility and performance.
  • Assessing the model's ability to reliably reproduce long-term trends and extreme events would be an important next step.
  • Applying the method to backward-extend the EAC4 reanalysis into periods lacking satellite observations could enhance the temporal consistency of this important dataset.
