Story
Reverso: Efficient Time Series Foundation Models for Zero-shot Forecasting
Key takeaway
New, compact AI models can forecast time series they were never trained on, making accurate forecasting cheaper and more flexible for many real-world applications.
Quick Explainer
Reverso is a family of efficient time series foundation models that can perform strong zero-shot forecasting across diverse domains. Its key innovation is a hybrid architecture interleaving long convolutions and linear recurrent neural network layers, combined with targeted data augmentation and inference techniques. This recipe produces Reverso models that are orders of magnitude smaller than previous large-scale time series foundation models, yet achieve comparable forecasting performance. Reverso's small size makes it efficient to train and deploy, pushing the performance-efficiency frontier for time series foundation models.
Deep Dive
Overview
Reverso is a family of efficient time series foundation models (TSFMs) that can perform strong zero-shot forecasting across diverse time series domains, while being orders of magnitude smaller than previous large-scale TSFM architectures.
Key points:
- Reverso models use a simple hybrid architecture interleaving long convolutions and linear RNN (DeltaNet) layers, combined with targeted data augmentation and inference strategies.
- This recipe results in Reverso models ranging from 0.2M to 2.6M parameters that significantly outperform much larger TSFM baselines on the Gift-Eval zero-shot forecasting benchmark.
- Because the models are this small, they are cheap to train and deploy, pushing the performance-efficiency frontier for TSFMs.
Methodology
Architecture
- Reverso's architecture consists of stacked blocks, each with a sequence mixing module (long convolutions, DeltaNet) followed by an MLP channel mixing module.
- The final output is produced using an attention-based decoder that aggregates the contextualized representation.
- Compared to large transformer-based TSFMs, Reverso's hybrid design with convolutional and linear RNN layers achieves comparable performance with over 100x fewer parameters.
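The two sequence-mixing primitives can be sketched in plain NumPy. This is a minimal single-channel illustration, not the paper's implementation: the function names and shapes here are assumptions, and the delta-rule update is the simplified recurrence behind DeltaNet-style linear RNNs.

```python
import numpy as np

def causal_long_conv(x, kernel):
    """Causal convolution: y[t] = sum_k kernel[k] * x[t-k],
    so the output at step t depends only on current and past inputs."""
    K = len(kernel)
    padded = np.concatenate([np.zeros(K - 1), x])
    return np.array([padded[t:t + K] @ kernel[::-1] for t in range(len(x))])

def delta_rule_step(S, q, k, v, beta):
    """One delta-rule update (the core of DeltaNet-style linear RNNs):
    the state matrix S is corrected toward the new key/value pair,
    then queried to produce this step's output."""
    S = S + beta * np.outer(v - S @ k, k)  # rank-1 corrective update
    return S, S @ q                        # new state, output for this step
```

A Reverso-style block would interleave such sequence-mixing layers with an MLP that mixes hidden channels; the attention-based decoder then pools the final representations into a forecast.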
Data and Synthetic Generation
- Reverso is trained on the diverse Gift-Eval pretraining dataset, with a focus on balancing representation across time series domains.
- Data augmentation techniques are applied, including downsampling, amplitude modulation, flipping, censoring, and mixup.
- Synthetic data is generated using Gaussian processes with a variety of kernels, as well as spike and trapezoidal patterns.
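The augmentations and the GP-based generator can be sketched with NumPy. The exact parameterizations are not specified in the summary, so this is a hedged illustration: "flipping" is read here as sign negation and "censoring" as clipping to a value range, and the spike/trapezoidal generators are omitted.

```python
import numpy as np

def flip(x):
    return -x                      # sign flip (one reading of "flipping")

def amplitude_modulate(x, scale):
    return scale * x               # rescale the amplitude

def downsample(x, factor):
    return x[::factor]             # keep every `factor`-th observation

def censor(x, lo, hi):
    return np.clip(x, lo, hi)      # clip values to [lo, hi]

def mixup(x1, x2, lam):
    return lam * x1 + (1.0 - lam) * x2   # convex blend of two series

def gp_sample(T, length_scale=10.0, seed=0):
    """Synthetic series drawn from a Gaussian process with an RBF kernel."""
    rng = np.random.default_rng(seed)
    t = np.arange(T, dtype=float)[:, None]
    K = np.exp(-((t - t.T) ** 2) / (2.0 * length_scale ** 2))
    return rng.multivariate_normal(np.zeros(T), K + 1e-6 * np.eye(T))
```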
Inference
- Reverso uses several inference-time techniques to improve performance:
- Flip equivariance: averaging the forecast on the original input with the re-flipped forecast on the flipped input
- Downsampling: Dynamically adjusting the input sequence length based on estimated seasonality
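Both inference-time techniques can be sketched as follows, assuming the forecaster is simply a callable mapping a context array to a forecast array (this interface, and the autocorrelation heuristic for seasonality, are illustrative assumptions, not the paper's method):

```python
import numpy as np

def flip_equivariant_forecast(model, context):
    """Forecast the original series, forecast the sign-flipped series,
    flip that second forecast back, then average the two."""
    flipped_back = -model(-context)
    return 0.5 * (model(context) + flipped_back)

def estimate_period(x):
    """Dominant period via the autocorrelation peak (illustrative heuristic)."""
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[len(x):]  # lags 1 .. T-1
    return int(np.argmax(acf)) + 1

def seasonal_downsample(x, max_period=24):
    """Downsample so the estimated seasonal period fits within max_period."""
    factor = max(1, estimate_period(x) // max_period)
    return x[::factor]
```

For any sign-equivariant forecaster the flip average is a no-op; the averaging only changes (and, per the paper's ablations, improves) predictions where the model breaks that symmetry.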
Results
Zero-shot Forecasting
- On the diverse Gift-Eval benchmark, Reverso models outperform much larger TSFM baselines across short, medium, and long forecast horizons.
- Reverso-2.6M achieves a MASE of 0.711, compared to 0.763 for the 1.5B parameter Xihe-Max model.
- Reverso models also demonstrate strong zero-shot transfer on the LTSF benchmark, outperforming baselines such as Sundial and Timer-XL despite having over 100x fewer parameters.
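For reference, MASE (mean absolute scaled error, the metric quoted above) scales the forecast's mean absolute error by the in-sample error of a seasonal-naive forecast, so values below 1.0 beat the naive baseline and lower is better. A minimal sketch:

```python
import numpy as np

def mase(y_true, y_pred, y_train, season=1):
    """Mean Absolute Scaled Error: forecast MAE divided by the in-sample
    MAE of a seasonal-naive forecast on the training series."""
    mae = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    y_train = np.asarray(y_train)
    scale = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return mae / scale
```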
Ablations
- Hybrid sequence mixing layers (long convolutions + DeltaNet) are critical to Reverso's performance, outperforming pure attention or linear RNN models.
- Data augmentation and synthetic data generation also provide significant boosts to performance.
- Inference techniques like downsampling and flip equivariance further improve forecasting accuracy.
Limitations and Future Work
- Reverso is primarily focused on univariate time series forecasting. Extensions to multivariate time series could be investigated.
- While Reverso performs well on long-horizon forecasting, there is still a performance gap compared to larger TSFMs on shorter sequences.
- Future work could explore incorporating distributional forecasting capabilities beyond point predictions.
Conclusion
Reverso demonstrates that large, expensive time series foundation models are not strictly necessary to achieve strong zero-shot forecasting performance. Its simple hybrid architecture and targeted data and inference strategies push the performance-efficiency frontier, opening up the possibility of practical, high-performing TSFM deployments.
