Story
Learning-Based Planning for Improving Science Return of Earth Observation Satellites
Key takeaway
Researchers developed an AI-powered planning system that helps Earth observation satellites decide where to point their sensors along a fixed orbit, potentially increasing the scientific value of the data collected about our planet.
Quick Explainer
This work presents two learning-based methods to improve the science return of Earth observation satellites. The core idea is to formulate the satellite targeting problem as a Markov Decision Process (MDP) and apply reinforcement learning (Q-learning) or imitation learning (behavioral cloning) to learn effective targeting strategies. The key components are a compact state representation, the use of dynamic programming to efficiently explore the state space during training, and models trained either to learn a policy directly or to mimic an expert planning algorithm. Compared to prior heuristic approaches, the learning-based methods focus more effectively on high-value scientific measurements by making better use of sensor information and accounting for practical constraints such as power consumption.
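To make the MDP framing concrete, here is a minimal Python sketch of one plausible encoding of the targeting problem. The state fields, action set, and scene-class codes are illustrative assumptions, not the paper's exact formulation.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical compact state for illustration only; the paper's exact
# state fields are not reproduced here.
@dataclass(frozen=True)
class TargetingState:
    pointing_bin: int           # current cross-track pointing bin of the primary sensor
    battery_level: int          # discretized remaining power budget
    lookahead: Tuple[int, ...]  # discretized scene classes seen by the lookahead sensor

# Actions: slew the primary sensor left/right by up to two bins, or hold (0).
ACTIONS = (-2, -1, 0, 1, 2)

def reward(scene_class: int, scenario: str) -> float:
    """Scenario-dependent science reward (illustrative values only)."""
    if scenario == "cloud_avoidance":
        return 1.0 if scene_class == 0 else 0.0  # class 0 = clear sky (assumed)
    if scenario == "storm_hunting":
        return 1.0 if scene_class == 3 else 0.0  # class 3 = convective core (assumed)
    raise ValueError(f"unknown scenario: {scenario}")
```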
Deep Dive
Overview
This work presents two learning-based planning methods, reinforcement learning (Q-learning) and imitation learning (behavioral cloning), to improve the science return of Earth observation satellites by intelligently targeting scientific measurements. The methods build on prior work using dynamic programming for this task, and are evaluated against existing heuristic targeting approaches.
Problem & Context
- Earth observation satellites face several constraints on data collection:
  - They are confined to predetermined orbital paths
  - Their sensors have restricted fields of view
  - Operating and repositioning sensors requires significant power
- Within these constraints, satellites should prioritize the most informative measurements available to them
- The objective is to develop efficient targeting strategies for sampling the satellite's surroundings while respecting practical constraints such as power consumption
Methodology
- Formulate the dynamic targeting problem as a Markov Decision Process (MDP)
- Explore two learning-based approaches (each sketched in a code example after this list):
  - Reinforcement learning (Q-learning):
    - Use a variation of Q-learning with a compact state representation
    - Leverage dynamic programming to efficiently visit a large number of states during training
  - Imitation learning (behavioral cloning):
    - Use the dynamic programming "oracle" as the expert to imitate
    - Train a neural network to map states to actions via supervised learning on expert demonstrations
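As a rough illustration of the reinforcement-learning branch, here is a generic tabular Q-learning loop. The `env` interface, hyperparameters, and episode structure are assumptions for this sketch; the paper's variant additionally uses a compact state representation and dynamic programming to drive state visitation, which is not reproduced here.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # assumed hyperparameters

def q_learning(env, actions, episodes=10_000):
    """Tabular Q-learning. `env` is assumed to expose reset() -> state
    and step(state, action) -> (next_state, reward, done)."""
    Q = defaultdict(float)  # (state, action) -> estimated value

    def best_action(state):
        return max(actions, key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy exploration over the discrete action set.
            action = random.choice(actions) if random.random() < EPSILON \
                     else best_action(state)
            next_state, r, done = env.step(state, action)
            # One-step temporal-difference update toward the Bellman target.
            target = r + GAMMA * max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += ALPHA * (target - Q[(state, action)])
            state = next_state
    return Q
```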
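The behavioral-cloning branch reduces to supervised classification over expert demonstrations from the dynamic programming oracle. The PyTorch sketch below assumes a fixed state dimension, a discrete action set, and a small feed-forward architecture; none of these choices are confirmed by the source.

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 16, 5  # hypothetical dimensions

# Small policy network mapping state vectors to action logits.
policy = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)

def train_bc(states: torch.Tensor, expert_actions: torch.Tensor, epochs=50):
    """states: (N, STATE_DIM) floats; expert_actions: (N,) integer labels
    produced by the oracle. Standard cross-entropy behavioral cloning."""
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(policy(states), expert_actions)
        loss.backward()
        opt.step()
    return policy

def act(state: torch.Tensor) -> int:
    """Greedy action at execution time."""
    with torch.no_grad():
        return int(policy(state.unsqueeze(0)).argmax(dim=1))
```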
Data & Experimental Setup
- Simulation framework models an Earth observation satellite with a primary radar sensor and a lookahead sensor
- Real-world data from the MODIS and GPM missions are used to categorize cloud and storm types, each assigned a different scientific reward
- Performance is evaluated in two scenarios (a toy simulation step is sketched after this list):
  - Cloud avoidance: maximize observations of clear sky
  - Storm hunting: maximize observations of convective storm cores
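A minimal sketch of how one such simulation step might look, assuming a 2-D scene grid of discrete classes (e.g., derived from MODIS cloud masks or GPM storm labels along the ground track), a fixed lookahead horizon, and a cross-track swath; all dimensions here are hypothetical.

```python
import numpy as np

LOOKAHEAD_ROWS, SWATH = 5, 9  # assumed rows of advance notice, cross-track bins

def observe_lookahead(scene: np.ndarray, row: int) -> np.ndarray:
    """Return the (LOOKAHEAD_ROWS, SWATH) patch the lookahead sensor
    sees ahead of the primary footprint at along-track position `row`."""
    return scene[row + 1 : row + 1 + LOOKAHEAD_ROWS, :SWATH]

def measure(scene: np.ndarray, row: int, pointing_bin: int) -> int:
    """Scene class actually sampled by the primary radar this step."""
    return int(scene[row, pointing_bin])
```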
Results
- Both learning-based approaches outperform existing heuristic dynamic-targeting methods:
  - Q-learning achieves 98.67% and 94.66% of the optimal reward in cloud avoidance and storm hunting, respectively
  - Behavioral cloning achieves 95.84% and 91.27% of the optimal reward
- The learning-based methods sample high-reward targets more often than the heuristic approaches
- Q-learning makes better use of the lookahead sensor information than behavioral cloning
- All methods run in real time, with the learning-based approaches slightly faster
Limitations & Uncertainties
- Evaluation is simulation-based; the methods have not been tested on real satellites
- Simplifying assumptions about sensor characteristics and power constraints
- Potential for overfitting when training on limited real-world data
What Comes Next
- Incorporate more realistic satellite factors and instrument constraints
- Use full image inputs instead of manually engineered state vectors
- Explore other reinforcement learning methods like deep Q-networks and proximal policy optimization
- Deploy and test the algorithms on different satellite platforms