Story

Physics-informed offline reinforcement learning eliminates catastrophic fuel waste in maritime routing

Artificial Intelligence, Materials & Engineering

Key takeaway

A new machine learning system can plan more fuel-efficient maritime shipping routes, helping reduce greenhouse gas emissions from international transportation.

Quick Explainer

PIER uses physics-informed machine learning and offline reinforcement learning to learn fuel-efficient and safety-aware maritime routing policies, without requiring online simulation. It constructs a detailed state representation that captures how ocean conditions affect vessel performance, and generates a diverse training dataset by combining expert demonstrations with stochastic behavioral rollouts. PIER's distinctive approach is the decoupling of safety constraints into a post-hoc shield, which enforces hard navigational limits without distorting the learning objective. The key benefit of PIER is not large average fuel savings but the elimination of catastrophic fuel-wasting events: voyages consuming over 1.5× the median become 9 times less frequent.

Deep Dive

Technical Deep Dive: Physics-informed Offline Reinforcement Learning for Maritime Routing

Overview

This paper presents a novel offline reinforcement learning (RL) framework, PIER, that learns fuel-efficient and safety-aware routing policies from physics-calibrated environments grounded in historical vessel tracking data and ocean reanalysis products. Unlike prior RL approaches, PIER requires no online simulation or interaction, instead leveraging passive historical data to construct an AIS-calibrated environment model.

The key contributions are:

Physics-informed state representation that captures the mechanisms by which ocean conditions affect vessel performance, including speed loss, hull fatigue exposure, and energy expenditure.
Demonstration-augmented offline dataset that mixes expert teacher trajectories with stochastic behavioral roll-outs to ensure broad state-action coverage.
Decoupled post-hoc safety shield that enforces hard navigational constraints without distorting the learning objective.
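
The second contribution, mixing teacher trajectories with stochastic roll-outs, can be sketched roughly as follows. This is a minimal toy illustration, not the paper's pipeline: the one-dimensional environment, the cubic fuel cost, the names `env_step`, `teacher_policy`, `behavior_policy`, and `build_offline_dataset`, and the `teacher_fraction` parameter are all assumptions introduced here.

```python
import random

ACTIONS = (1, 2, 3)  # hypothetical speed settings: distance units covered per step


def env_step(remaining, action):
    """Toy transition: move toward the port; fuel cost grows with the cube of speed."""
    next_remaining = max(0, remaining - action)
    reward = -float(action ** 3)
    return next_remaining, reward, next_remaining == 0


def teacher_policy(remaining):
    """Stand-in for an expert demonstration: always steam at moderate speed."""
    return 2


def behavior_policy(remaining, rng):
    """Stochastic roll-out policy: pick any speed uniformly at random."""
    return rng.choice(ACTIONS)


def build_offline_dataset(start_states, teacher_fraction=0.5, horizon=20, seed=0):
    """Collect transitions from teacher episodes first, then stochastic episodes."""
    rng = random.Random(seed)
    n_teacher = round(teacher_fraction * len(start_states))
    dataset = []
    for i, state in enumerate(start_states):
        source = "teacher" if i < n_teacher else "behavior"
        for _ in range(horizon):
            if source == "teacher":
                action = teacher_policy(state)
            else:
                action = behavior_policy(state, rng)
            next_state, reward, done = env_step(state, action)
            # Each record keeps its source so coverage can be inspected later.
            dataset.append((state, action, reward, next_state, done, source))
            state = next_state
            if done:
                break
    return dataset


dataset = build_offline_dataset(start_states=[10, 10, 10, 10])
```

The teacher episodes contribute only the moderate-speed action, while the stochastic episodes visit the other speeds, which is the state-action coverage argument in miniature.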

Validated on one full year (2023) of Automatic Identification System (AIS) data across seven Gulf of Mexico routes, PIER reduces estimated mean emissions by 10% relative to great-circle routing. Crucially, its primary value lies in eliminating catastrophic fuel waste events, reducing the frequency of voyages consuming over 1.5× the median by 9-fold.

Problem & Context

Maritime shipping is a major source of greenhouse gas emissions, producing approximately 3% of global emissions. The International Maritime Organization (IMO) has set targets of at least 40% reduction in carbon intensity by 2030 and net-zero emissions by or around 2050. Weather routing, or adjusting vessel speed and heading based on ocean conditions, is one of the lowest-cost decarbonization levers available today.

However, the current state of practice lags behind what is technically possible. Most commercial weather routing tools rely on heuristic methods that do not learn from historical outcomes or adapt to changing weather patterns. Academic RL approaches have shown promise but require online simulators that do not exist for realistic ocean environments.

Methodology

PIER addresses this gap by combining physics-informed machine learning with offline RL. It constructs a physics-informed state representation by fusing AIS vessel kinematics with wave, wind, and current reanalysis data to calibrate a speed-loss model and compute operationally meaningful features. It then generates a diverse offline training dataset by mixing teacher demonstrations encoding domain knowledge with stochastic behavioral roll-outs. Finally, it decouples safety constraints into a post-hoc shield that enforces hard navigational limits without distorting the learning objective.
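
A post-hoc shield of the kind described above can be sketched as a filter applied to the policy's ranked action proposals at decision time. The grid layout, the wave-height limit, and the names `make_safety_check` and `shield` are hypothetical choices made for this illustration; the article does not specify the paper's actual constraint set.

```python
from typing import Callable, Iterable, List, Tuple

Move = Tuple[int, int]  # (row step, column step) on a routing grid


def make_safety_check(land: List[List[bool]], wave_m: List[List[float]],
                      pos: Tuple[int, int],
                      max_wave_m: float = 4.0) -> Callable[[Move], bool]:
    """Build a hard-constraint test for moves from `pos` (limit value is assumed)."""
    rows, cols = len(land), len(land[0])

    def is_safe(move: Move) -> bool:
        r, c = pos[0] + move[0], pos[1] + move[1]
        if not (0 <= r < rows and 0 <= c < cols):
            return False  # leaving the grid is never allowed
        return (not land[r][c]) and wave_m[r][c] <= max_wave_m

    return is_safe


def shield(ranked_moves: Iterable[Move], is_safe: Callable[[Move], bool],
           fallback: Move = (0, 0)) -> Move:
    """Return the policy's highest-ranked move that passes the hard constraints."""
    for move in ranked_moves:
        if is_safe(move):
            return move
    return fallback  # hold position if every proposal is rejected


# Toy 2x2 grid: cell (0, 1) is land, cell (1, 0) has 5.5 m waves.
land = [[False, True], [False, False]]
wave_m = [[1.0, 0.0], [5.5, 2.0]]
check = make_safety_check(land, wave_m, pos=(0, 0))
chosen = shield([(0, 1), (1, 0), (1, 1)], check)
```

Because the filter runs only at decision time, the reward used during training needs no penalty terms for constraint violations, which is the sense in which the learning objective is left undistorted.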

Data & Experimental Setup

The evaluation is conducted on a full year (2023) of AIS data across seven Gulf of Mexico routes, covering all four meteorological seasons including hurricane season. Environmental data is obtained from the Copernicus Marine Service and NOAA CoastWatch. The seven routes are classified into three tiers by distance: Tier 1 cross-Gulf (>400 nm), Tier 2 intermediate (100-400 nm), and Tier 3 coastal (<100 nm).
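
The tier boundaries can be written as a small helper. The function name `route_tier` is hypothetical, and assigning routes of exactly 100 nm or 400 nm to Tier 2 is an assumption; the article gives only the ranges.

```python
def route_tier(distance_nm: float) -> int:
    """Map a route length in nautical miles to the three distance tiers."""
    if distance_nm < 0:
        raise ValueError("distance must be non-negative")
    if distance_nm > 400:
        return 1  # cross-Gulf
    if distance_nm >= 100:
        return 2  # intermediate (boundary values assumed to fall here)
    return 3      # coastal
```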

Results

PIER achieves an 83.3% arrival rate, exceeding great-circle routing (78.0%), while delivering 8% faster mean transit time (45.6 h versus 49.8 h) and 5% lower wave exposure (53.2 versus 55.8). Crucially, its performance is consistent across seasons, maintaining 88-96% arrival rates.

The primary value of PIER is not average fuel savings, but variance reduction. Across 1,132 arrived voyages, PIER achieves 10% mean savings, but the median savings are only 0.6%. The gap between mean and median reflects PIER's ability to eliminate catastrophic fuel-wasting events: under great-circle routing, 4.8% of voyages consume over 1.5× the median, compared to only 0.5% under PIER, a 9-fold reduction.
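
To see how a large mean saving can coexist with a near-zero median saving, consider the following sketch on synthetic numbers. The values are invented for illustration and are not data from the paper. The helper names are hypothetical, and measuring each method against its own median is an assumption, since the article does not say which median defines the 1.5× threshold.

```python
from statistics import mean, median


def tail_frequency(fuel, factor=1.5):
    """Fraction of voyages whose fuel use exceeds factor x the median of the same sample."""
    threshold = factor * median(fuel)
    return sum(f > threshold for f in fuel) / len(fuel)


def relative_saving(baseline, candidate):
    """Saving of `candidate` relative to `baseline`, as a fraction."""
    return 1.0 - candidate / baseline


# Synthetic numbers for illustration only (not data from the paper).
great_circle = [100.0] * 19 + [300.0]  # one catastrophic voyage in twenty
learned = [99.5] * 20                  # marginally better typical voyage, no outlier

mean_saving = relative_saving(mean(great_circle), mean(learned))        # about 0.095
median_saving = relative_saving(median(great_circle), median(learned))  # about 0.005
```

In this toy sample a single outlier voyage accounts for almost all of the mean saving, while the typical voyage barely changes, which mirrors the pattern reported above.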

PIER's advantage grows sharply in the distribution tail. At the 95th percentile, it saves 6.4% relative to great-circle. At the maximum, it reduces single-voyage emissions by 70%. This variance reduction translates to a 3.5-fold lower standard deviation in per-voyage fuel consumption.

Interpretation

The central finding is that PIER's primary operational value is not average fuel savings, but consistent performance and elimination of tail-risk events. Fleet operators care more about predictable fuel budgets and compliance with emissions regulations than incremental steady-state optimization. PIER's physics-informed state representation allows it to detect and avoid the conditions that trap vessels on inefficient headings, maintaining reliable transit times and fuel consumption across all seasons.

Unlike classical path optimization (A*), PIER's performance is also forecast-independent. A*'s wave protection degrades 4.5× under realistic forecast uncertainty, while PIER maintains constant performance using only local observations, a decisive advantage for operational deployment.

Limitations & Uncertainties

Several limitations are noted. The speed-loss model has modest explanatory power (R² = 0.02), introducing inherent uncertainty in fuel estimates. PIER was validated exclusively on Gulf of Mexico routes, and may face different challenges on transoceanic routes. The 83% overall arrival rate reflects the inclusion of short coastal corridors where grid resolution limits routing alternatives. Direct comparison to commercial routing tools is not possible due to lack of access.

Monte Carlo and non-parametric bootstrap analyses confirm that the tail-risk reduction finding is robust to speed-loss model uncertainty. The confidence interval for the variance ratio lies entirely above parity, and even at the lower bound, tail-risk events remain 2.2× less frequent for PIER than for great-circle routing.
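
A non-parametric bootstrap interval for a variance ratio can be sketched as below. This illustrates the general technique on synthetic data and is not the paper's analysis: the function names, the independent resampling of the two samples, the simple order-statistic percentile interval, and all numeric settings are assumptions.

```python
import random


def sample_variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def bootstrap_variance_ratio_ci(baseline, candidate, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for var(baseline) / var(candidate)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        # Resample each group with replacement (independently: an assumption).
        b = rng.choices(baseline, k=len(baseline))
        c = rng.choices(candidate, k=len(candidate))
        ratios.append(sample_variance(b) / sample_variance(c))
    ratios.sort()
    # Approximate percentile bounds taken from the sorted bootstrap ratios.
    lo = ratios[int((alpha / 2) * n_boot)]
    hi = ratios[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Synthetic per-voyage fuel figures for illustration only.
data_rng = random.Random(1)
great_circle = [data_rng.gauss(100, 35) for _ in range(200)]
learned = [data_rng.gauss(100, 10) for _ in range(200)]

ci_low, ci_high = bootstrap_variance_ratio_ci(great_circle, learned)
```

An interval whose lower bound stays above 1 is what "entirely above parity" means here: the baseline's variance is larger than the candidate's across the resamples.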

What Comes Next

The most critical next step is operational validation, deploying PIER as a decision-support tool and comparing recommended routes against operator choices and measured fuel consumption. Further extensions include higher-resolution grids for coastal routes, multi-vessel coordination, and online fine-tuning to adapt to changing climate patterns.

More broadly, PIER establishes a general recipe for deploying offline RL in safety-critical physical domains lacking simulators: use domain physics to construct informative state features, augment expert demonstrations with diverse roll-outs, and enforce safety through post-hoc constraints. This pattern transfers to wildfire evacuation routing, aircraft trajectory optimization, and autonomous navigation in unmapped terrain.

Source

Physics-informed offline reinforcement learning eliminates catastrophic fuel waste in maritime routing
Preprint, arXiv cs.RO, 3/19/2026