Curious Now

Story

Reverso: Efficient Time Series Foundation Models for Zero-shot Forecasting

Computing · Artificial Intelligence

Key takeaway

New AI models can forecast future data without being trained on that specific data, making forecasting faster and more flexible for many real-world applications.

Quick Explainer

Reverso is a family of efficient time series foundation models that can perform strong zero-shot forecasting across diverse domains. Its key innovation is a hybrid architecture interleaving long convolutions and linear recurrent neural network layers, combined with targeted data augmentation and inference techniques. This recipe produces Reverso models that are orders of magnitude smaller than previous large-scale time series foundation models, yet achieve comparable forecasting performance. Reverso's small size makes it efficient to train and deploy, pushing the performance-efficiency frontier for time series foundation models.

Deep Dive

Technical Deep Dive: Reverso - Efficient Time Series Foundation Models for Zero-shot Forecasting

Overview

Reverso is a family of efficient time series foundation models (TSFMs) that can perform strong zero-shot forecasting across diverse time series domains, while being orders of magnitude smaller than previous large-scale TSFM architectures.

Key points:

  • Reverso models use a simple hybrid architecture interleaving long convolutions and linear RNN (DeltaNet) layers, combined with targeted data augmentation and inference strategies.
  • This recipe results in Reverso models ranging from 0.2M to 2.6M parameters that significantly outperform much larger TSFM baselines on the Gift-Eval zero-shot forecasting benchmark.
  • Reverso's small size makes it efficient to train and deploy, pushing the performance-efficiency frontier for time series foundation models.

Methodology

Architecture

  • Reverso's architecture consists of stacked blocks, each with a sequence mixing module (long convolutions, DeltaNet) followed by an MLP channel mixing module.
  • The final output is produced using an attention-based decoder that aggregates the contextualized representation.
  • Compared to large transformer-based TSFMs, Reverso's hybrid design with convolutional and linear RNN layers achieves comparable performance with over 100x fewer parameters.
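The interleaved design described above can be sketched in NumPy. This is an illustrative toy, not the paper's actual implementation: the shapes, weight scalings, four-block stack, and ReLU MLP are assumptions, and the attention-based decoder is omitted. The linear RNN follows the published DeltaNet update rule with L2-normalized keys.

```python
import numpy as np

rng = np.random.default_rng(0)

def causal_long_conv(x, kernels):
    """Depthwise causal convolution with kernels as long as the input."""
    T, D = x.shape
    y = np.empty_like(x)
    for d in range(D):
        y[:, d] = np.convolve(x[:, d], kernels[:, d])[:T]
    return y

def delta_rule(q, k, v, beta):
    """Linear RNN with the DeltaNet update:
    S_t = S_{t-1} - beta_t (S_{t-1} k_t - v_t) k_t^T,  o_t = S_t q_t."""
    T, D = q.shape
    S = np.zeros((D, D))
    out = np.empty((T, D))
    for t in range(T):
        S = S - beta[t] * np.outer(S @ k[t] - v[t], k[t])
        out[t] = S @ q[t]
    return out

def mlp(x, W1, W2):
    """Channel-mixing MLP (ReLU here; the real activation is an assumption)."""
    return np.maximum(x @ W1, 0.0) @ W2

def hybrid_block(x, p, kind):
    # Sequence mixing (long conv or DeltaNet), with a residual connection.
    if kind == "conv":
        h = x + causal_long_conv(x, p["kernels"])
    else:
        q, k, v = x @ p["Wq"], x @ p["Wk"], x @ p["Wv"]
        # L2-normalize keys per step, as in DeltaNet, to keep the update stable.
        k = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-8)
        beta = 1.0 / (1.0 + np.exp(-(x @ p["wb"])))  # per-step write strength
        h = x + delta_rule(q, k, v, beta)
    # Channel mixing, also residual.
    return h + mlp(h, p["W1"], p["W2"])

# Toy stack alternating the two mixer types, mirroring the interleaving.
T, D = 64, 8
x = rng.standard_normal((T, D))
for kind in ["conv", "delta", "conv", "delta"]:
    p = {
        "kernels": rng.standard_normal((T, D)) / T,
        "Wq": rng.standard_normal((D, D)) / np.sqrt(D),
        "Wk": rng.standard_normal((D, D)) / np.sqrt(D),
        "Wv": rng.standard_normal((D, D)) / np.sqrt(D),
        "wb": rng.standard_normal(D) / np.sqrt(D),
        "W1": rng.standard_normal((D, 4 * D)) / np.sqrt(D),
        "W2": rng.standard_normal((4 * D, D)) / np.sqrt(4 * D),
    }
    x = hybrid_block(x, p, kind)
print(x.shape)  # (64, 8)
```

Both mixers run in (near-)linear time in sequence length, which is what lets the model stay small while handling long contexts.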

Data and Synthetic Generation

  • Reverso is trained on the diverse Gift-Eval pretraining dataset, with a focus on balancing representation across different time series domains.
  • Data augmentation techniques are applied, including downsampling, amplitude modulation, flipping, censoring, and mixup.
  • Synthetic data is generated using Gaussian processes with a variety of kernels, as well as spike and trapezoidal patterns.
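The listed augmentations and the GP-based synthesis can be sketched as follows. All parameter choices (gain shapes, quantile cutoffs, the RBF kernel and its lengthscale) are illustrative assumptions; the paper uses a variety of kernels plus spike and trapezoidal patterns not shown here.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- augmentations (names and parameters are illustrative) ---

def downsample(x, factor):
    """Keep every `factor`-th sample."""
    return x[::factor]

def amplitude_modulate(x):
    """Multiply by a smooth random gain curve."""
    gain = 1.0 + 0.3 * np.sin(np.linspace(0, 2 * np.pi, len(x)) * rng.uniform(0.5, 2))
    return x * gain

def flip(x):
    """Sign flip (time reversal is another common variant)."""
    return -x

def censor(x, lo_q=0.05, hi_q=0.95):
    """Clip values to empirical quantiles, mimicking sensor saturation."""
    lo, hi = np.quantile(x, [lo_q, hi_q])
    return np.clip(x, lo, hi)

def mixup(x1, x2, alpha=0.4):
    """Convex combination of two series of equal length."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2

# --- synthetic series from a Gaussian process (RBF kernel shown here) ---

def gp_sample(T=256, lengthscale=20.0):
    t = np.arange(T, dtype=float)
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / lengthscale) ** 2)
    return rng.multivariate_normal(np.zeros(T), K + 1e-6 * np.eye(T))

x1, x2 = gp_sample(), gp_sample()
aug = censor(amplitude_modulate(mixup(x1, x2)))
print(aug.shape)  # (256,)
```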

Inference

  • Reverso uses several inference-time techniques to improve performance:
    • Flip equivariance: Averaging predictions on original and flipped inputs
    • Downsampling: Dynamically adjusting the input sequence length based on estimated seasonality
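These two inference-time tricks can be sketched as wrappers around any forecaster. The period estimator (autocorrelation peak), the target context length, and the stand-in seasonal-naive model are assumptions for illustration; the paper's exact estimators are not reproduced here.

```python
import numpy as np

def estimate_period(x, max_lag=None):
    """Dominant seasonality via the autocorrelation peak (one common
    heuristic; the paper's exact estimator is not specified here)."""
    x = x - x.mean()
    max_lag = max_lag or len(x) // 2
    ac = [np.dot(x[:-lag], x[lag:]) for lag in range(1, max_lag)]
    return int(np.argmax(ac)) + 1

def flip_equivariant_forecast(model, context):
    """Average the forecast on the original input with the un-flipped
    forecast on the sign-flipped input."""
    return 0.5 * (model(context) - model(-context))

def seasonal_downsample(context, target_len=512):
    """Shrink a long context by a factor that divides the estimated
    period, so whole seasonal cycles survive the subsampling."""
    if len(context) <= target_len:
        return context
    factor = int(np.ceil(len(context) / target_len))
    period = estimate_period(context)
    while factor > 1 and period % factor != 0:
        factor -= 1
    return context[::factor]

def naive_model(context, horizon=24):
    """Stand-in forecaster: repeat the last estimated seasonal cycle."""
    p = estimate_period(context)
    reps = -(-horizon // p)  # ceiling division
    return np.tile(context[-p:], reps)[:horizon]

x = np.sin(2 * np.pi * np.arange(2048) / 24)  # hourly-like cycle, period 24
print(estimate_period(x))            # 24
print(len(seasonal_downsample(x)))   # 512 (factor 4, which divides 24)
```

For a forecaster that is already odd (like the seasonal-naive stand-in above), the flip-averaged prediction coincides with the plain one; for a learned model the two passes differ and averaging reduces variance.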

Results

Zero-shot Forecasting

  • On the diverse Gift-Eval benchmark, Reverso models outperform much larger TSFM baselines across short, medium, and long forecast horizons.
  • Reverso-2.6M achieves a MASE of 0.711, compared to 0.763 for the 1.5B-parameter Xihe-Max model.
  • Reverso models also demonstrate strong zero-shot transfer performance on the LTSF dataset, outperforming baselines like Sundial and Timer-XL despite having over 100x fewer parameters.
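For context on the headline numbers, MASE (mean absolute scaled error) divides the forecast's MAE by the in-sample MAE of a seasonal-naive forecast, so values below 1 beat the naive baseline. The sketch below uses this standard definition; Gift-Eval's exact aggregation across datasets may differ.

```python
import numpy as np

def mase(y_true, y_pred, y_train, m=1):
    """Mean Absolute Scaled Error: forecast MAE divided by the in-sample
    MAE of the seasonal-naive forecast with period m."""
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y_true - y_pred)) / scale

# Toy check on a trending seasonal series: a perfect forecast scores 0,
# and the seasonal-naive forecast itself scores 1 by construction.
t = np.arange(264)
y = 0.01 * t + np.sin(2 * np.pi * t / 24)
y_train, y_true = y[:240], y[240:]
naive = y_train[-24:]
print(mase(y_true, y_true, y_train, m=24))           # 0.0
print(round(mase(y_true, naive, y_train, m=24), 6))  # 1.0
```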

Ablations

  • Hybrid sequence mixing layers (long convolutions + DeltaNet) are critical to Reverso's performance, outperforming pure attention or linear RNN models.
  • Data augmentation and synthetic data generation also provide significant boosts to performance.
  • Inference techniques like downsampling and flip equivariance further improve forecasting accuracy.

Limitations and Future Work

  • Reverso is primarily focused on univariate time series forecasting. Extensions to multivariate time series could be investigated.
  • While Reverso performs well on long-horizon forecasting, there is still a performance gap compared to larger TSFMs on shorter sequences.
  • Future work could explore incorporating distributional forecasting capabilities beyond point predictions.

Conclusion

Reverso demonstrates that large, expensive time series foundation models are not strictly necessary to achieve strong zero-shot forecasting performance. Its simple hybrid architecture and targeted data and inference strategies push the performance-efficiency frontier, opening up the possibility of practical, high-performing TSFM deployments.
