
GrandTour: A Legged Robotics Dataset in the Wild for Multi-Modal Perception and State Estimation

Artificial Intelligence · Materials & Engineering

Key takeaway

Researchers released a large multi-modal dataset collected by a quadrupedal robot across diverse real-world environments, with high-precision ground truth, to help develop legged robots that can perceive their surroundings and navigate complex terrain, a key step towards deploying autonomous robots in the real world.


Quick Explainer

The GrandTour dataset provides a comprehensive, multi-modal collection of real-world data captured by a legged robot navigating diverse environments. It combines high-precision sensor data, including LiDAR, cameras, and inertial measurements, along with accurate ground truth poses. This dataset enables researchers to develop and benchmark advanced algorithms for core tasks like state estimation, SLAM, and terrain-adaptive locomotion. By exposing the challenges posed by legged robot motion and varying field conditions, the GrandTour dataset represents a significant advancement over prior efforts, which have primarily focused on wheeled or aerial platforms. The dataset's open availability and supporting tools facilitate broader community engagement in the development of robust autonomy solutions for legged robots.

Deep Dive

Technical Deep Dive: GrandTour - A Legged Robotics Dataset in the Wild

Overview

The GrandTour dataset is a large-scale, open-access dataset for multi-modal perception and state estimation on legged robots. It contains data collected across 49 diverse missions spanning indoor, urban, and natural environments, captured using a comprehensive sensor suite on an ANYbotics ANYmal-D quadrupedal robot.

Key highlights of the GrandTour dataset:

  • Largest legged robotics dataset to date, exceeding previous efforts in scale and diversity.
  • Synchronized sensor data from multiple LiDARs, RGB cameras, depth cameras, IMUs, and proprioceptive sensors.
  • High-precision ground truth poses from a combination of RTK-GNSS, total station tracking, and sensor fusion.
  • Enables benchmarking of state estimation, SLAM, and multi-modal perception algorithms under realistic legged robot conditions.
  • Supports research in areas like terrain-adaptive locomotion, contact-aware perception, and sim-to-real transfer.
  • Openly available online with tools and documentation to facilitate community use.

Problem & Context

Legged robots offer unique mobility advantages over wheeled or aerial platforms, enabling traversal of unstructured, complex environments. However, this comes with new challenges for robust autonomy:

  • Locomotion-induced disturbances to sensor measurements (e.g., intermittent contacts, foot slippage).
  • Degraded sensing in harsh field conditions (poor illumination, occlusions, textureless surfaces).
  • Need for tight integration of exteroceptive (cameras, LiDAR) and proprioceptive (IMU, joint encoders) modalities.

To address these challenges, the research community requires comprehensive, multi-modal datasets that capture the real-world conditions faced by legged robots. Prior datasets have primarily focused on wheeled, aerial, or handheld platforms, leaving a critical gap for legged systems.

Methodology

The GrandTour dataset was collected using an ANYbotics ANYmal-D quadrupedal robot equipped with a custom-designed multi-sensor payload, the "Boxi". This payload integrates:

  • 2 high-performance LiDARs (Livox, Hesai)
  • 10 RGB cameras with varying characteristics (global shutter, rolling shutter, high dynamic range)
  • A stereo depth camera (ZED2i)
  • 7 IMUs of varying quality
  • Proprioceptive sensors (joint encoders, contact sensors)

The robot was deployed across 49 diverse missions in indoor, urban, and natural environments, including challenging terrain, adverse weather, and dynamic obstacles. During each mission, the robot's motion was tracked using a combination of RTK-GNSS and a Leica Geosystems total station, providing centimeter-level ground truth poses.

Extensive calibration procedures were employed to ensure accurate spatial and temporal alignment of the sensor suite. These include methods for:

  • Camera intrinsic and extrinsic calibration
  • LiDAR-camera extrinsic calibration
  • Camera-IMU calibration
  • Prism-camera calibration
  • Aligning the Boxi payload to the ANYmal base
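The calibration steps above each produce a rigid-body transform between two frames, and downstream use reduces to chaining those transforms. The sketch below illustrates the idea with 4x4 homogeneous matrices; the frame names and numeric offsets are hypothetical, not values from the dataset.

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical calibration results (identity rotations, offsets in metres).
T_base_boxi = make_transform(np.eye(3), np.array([0.30, 0.00, 0.25]))  # Boxi payload in robot base frame
T_boxi_cam = make_transform(np.eye(3), np.array([0.05, 0.02, 0.10]))   # a camera in the Boxi frame

# Chain the calibrations: the camera pose expressed in the robot base frame.
T_base_cam = T_base_boxi @ T_boxi_cam

# A point observed in the camera frame, mapped into the base frame.
p_cam = np.array([1.0, 0.0, 2.0, 1.0])  # homogeneous coordinates
p_base = T_base_cam @ p_cam
```

With identity rotations the chained transform simply accumulates the translations, but the same composition handles arbitrary rotations, which is why calibration pipelines store full 4x4 (or SE(3)) transforms rather than offsets alone.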

Data & Experimental Setup

The GrandTour dataset contains the following key elements:

  • Raw sensor data (LiDAR, cameras, IMUs, joint encoders, GNSS/INS)
  • Derived outputs (motion-compensated point clouds, state estimates, occupancy maps)
  • High-precision ground truth poses from RTK-GNSS and total station tracking
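Motion-compensated (deskewed) point clouds correct for the fact that a spinning LiDAR captures each point at a slightly different time while the robot moves. A minimal, translation-only sketch of the idea is shown below; a full implementation would also interpolate orientation, and all values here are hypothetical.

```python
import numpy as np

def deskew_translation_only(points, timestamps, t0, t1, p0, p1):
    """
    Simplified motion compensation: re-express each LiDAR point in the sensor
    frame at scan end time t1, assuming the sensor translates linearly from
    p0 (at t0) to p1 (at t1). Rotation is ignored in this sketch; a full
    pipeline would also interpolate (slerp) the sensor orientation.
    """
    alpha = (timestamps - t0) / (t1 - t0)             # per-point progress in [0, 1]
    pos_at_capture = p0 + alpha[:, None] * (p1 - p0)  # interpolated sensor position per point
    # Shift each point by the sensor motion between its capture time and scan end.
    return points + (pos_at_capture - p1)

# Hypothetical scan: 3 points captured over a 0.1 s sweep while the robot moves 0.2 m in +x.
points = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [-5.0, 0.0, 0.0]])
stamps = np.array([0.00, 0.05, 0.10])
deskewed = deskew_translation_only(points, stamps, 0.0, 0.1,
                                   p0=np.zeros(3), p1=np.array([0.2, 0.0, 0.0]))
```

Points captured early in the sweep receive the largest correction, while the last point is already in the end-of-scan frame and is left unchanged.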

The dataset is available in two primary formats:

  1. Hugging Face: Zarr and JPEG files, decoupled from ROS, enabling easy access for computer vision and machine learning researchers.
  2. ROS Bags: Structured for robotics use cases, with separate bags for each sensor stream and compatibility with ROS2.

To facilitate dataset exploration and use, the GrandTour release includes:

  • Visualizations and preview videos for each mission
  • Tools for data loading, synchronization, and processing
  • Detailed documentation on sensor specifications, calibration, and data formats
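Synchronization tooling of the kind mentioned above typically matches measurements across streams by timestamp. The sketch below shows one common approach, nearest-timestamp association with a rejection tolerance; it is a minimal illustration, not the dataset's actual tooling, and the sample rates are hypothetical.

```python
import numpy as np

def nearest_sync(ref_stamps, other_stamps, tol=0.01):
    """
    For each reference timestamp, return the index of the closest timestamp
    in another (sorted, ascending) stream; pairs further apart than `tol`
    seconds are rejected and marked -1.
    """
    idx = np.searchsorted(other_stamps, ref_stamps)
    idx = np.clip(idx, 1, len(other_stamps) - 1)
    left, right = other_stamps[idx - 1], other_stamps[idx]
    choose_left = (ref_stamps - left) < (right - ref_stamps)
    matched = np.where(choose_left, idx - 1, idx)
    err = np.abs(other_stamps[matched] - ref_stamps)
    return np.where(err <= tol, matched, -1)

# Hypothetical streams: a 10 Hz camera matched against 20 Hz IMU samples.
cam = np.array([0.00, 0.10, 0.20])
imu = np.array([0.00, 0.05, 0.10, 0.15, 0.21])
matches = nearest_sync(cam, imu, tol=0.02)
```

Hardware-triggered streams usually match exactly; the tolerance matters for free-running sensors, where a rejected pair (-1) signals a dropped frame rather than a merely offset one.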

Results

The GrandTour dataset has already been used to evaluate a wide range of state estimation and SLAM algorithms, including:

  • LiDAR odometry (Traj-LO, DLO, I²EKF-LO)
  • LiDAR-inertial odometry (Coco-LIC, FAST-LIVO2)
  • Multi-LiDAR odometry (CTE-MLO, MA-LIO, RESPLE-MLIO)
  • Visual-inertial odometry (Voxel-SVIO, ENVIO, VINS-Fusion)

The benchmark results expose the mission-dependent strengths and weaknesses of these methods, highlighting the challenges posed by legged robot motion and diverse field conditions. Key findings include:

  • LiDAR-inertial methods generally outperform pure LiDAR odometry and visual-inertial approaches in terms of accuracy and drift.
  • Multi-LiDAR sensing can improve robustness, but the benefits are method-dependent.
  • All approaches exhibit sensitivity to parameter tuning and environmental conditions, emphasizing the need for adaptive, generalizable techniques.
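Benchmarks of this kind typically score each method by absolute trajectory error (ATE): the estimated trajectory is rigidly aligned to ground truth, then the residual position error is reported. A minimal sketch using the standard Kabsch/Umeyama alignment is shown below; the trajectories are synthetic, not from the dataset.

```python
import numpy as np

def ate_rmse(est, gt):
    """
    Absolute trajectory error: rigidly align the estimated positions to the
    ground truth (rotation + translation via the Kabsch method), then return
    the RMSE of the residual position errors.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g                 # centre both trajectories
    U, _, Vt = np.linalg.svd(E.T @ G)            # cross-covariance SVD
    S = np.eye(3)
    S[2, 2] = np.sign(np.linalg.det(U @ Vt))     # guard against reflections
    R = Vt.T @ S @ U.T                           # best rotation mapping est -> gt
    aligned = (R @ E.T).T + mu_g
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))

# Synthetic check: a trajectory rotated 90 degrees about z and shifted
# should align back onto the ground truth with near-zero error.
gt = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 1]])
Rz = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])
est = (Rz @ gt.T).T + np.array([5.0, -2.0, 1.0])
rmse = ate_rmse(est, gt)  # ~0 after alignment
```

Because alignment removes any global rigid offset, ATE isolates drift and shape error, which is why it is the usual headline metric when comparing odometry and SLAM systems.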

In addition to state estimation, the GrandTour dataset has enabled research in areas like:

  • Multi-modal perception (e.g., depth estimation, semantic segmentation, physical property estimation)
  • Locomotion and navigation (e.g., terrain analysis, path planning, visual navigation)
  • Simulation-to-real transfer (e.g., neural radiance fields, photorealistic rendering)

Interpretation

The GrandTour dataset represents a significant advancement in the availability of comprehensive, real-world data for legged robotics research. By providing high-quality, multi-modal sensor data alongside accurate ground truth, the dataset enables rigorous evaluation and development of algorithms for core perception and autonomy tasks.

The benchmark results highlight the continued challenges in achieving robust, generalizable solutions for state estimation and SLAM in the face of legged robot dynamics and diverse environmental conditions. While current methods show promising performance, there is a clear need for further advancements in areas like:

  • Adaptive sensor fusion to handle varying sensor availability and degradation
  • Robust initialization and re-localization to handle extended periods of sensor dropout
  • Explicit modeling of the complex interactions between locomotion and perception

Beyond state estimation, the dataset's utility extends to a wide range of downstream perception and planning applications, opening up new research directions in areas like terrain-adaptive locomotion, contact-aware navigation, and sim-to-real transfer for legged robots.

Limitations & Uncertainties

While the GrandTour dataset represents a substantial advancement, some limitations and uncertainties remain:

  • Geographic scope: The dataset is currently limited to environments in Switzerland, which may introduce regional biases in architectural styles, vegetation, and lighting conditions.
  • Environmental revisits: The dataset does not contain repeated missions under varying conditions (e.g., different seasons, weather patterns, times of day).
  • Sensor modalities: The current sensor suite lacks emerging technologies like radar, FMCW LiDAR, and event cameras.

Additionally, while the dataset provides a robust benchmark for state estimation and SLAM, its utility for other perception tasks (e.g., object detection, semantic segmentation) has not been as extensively explored.

What Comes Next

The GrandTour team plans to expand the dataset in several directions:

  1. Incorporate additional missions featuring more extreme perception challenges, such as volcanic terrains and extended-duration deployments.
  2. Upgrade the sensor suite to include higher-resolution solid-state LiDARs, FMCW LiDARs, radars, and event cameras.
  3. Expand the geographic and environmental diversity of the dataset, including revisits under varying conditions.

These systematic expansions will further strengthen the GrandTour dataset as a valuable benchmark and resource for the robotics and computer vision research communities, bridging the gap between controlled experiments and real-world deployment scenarios for legged robots.
