Curious Now

A Controlled Comparison of Deep Learning Architectures for Multi-Horizon Financial Forecasting: Evidence from 918 Experiments

Computing · Math & Economics

Key takeaway

Researchers tested different deep learning models for predicting asset prices, including crypto, forex, and equity indices, over multiple time horizons. This could help investors and traders make better financial decisions.

Quick Explainer

This study undertook a comprehensive comparison of deep learning architectures for forecasting financial time series. The key idea was to systematically evaluate a range of popular model families, including Transformers, MLPs, CNNs, and RNNs, across a diverse set of assets and forecast horizons. The researchers employed a controlled experimental protocol, with fixed-seed hyperparameter optimization and multi-seed final training, to isolate the impact of architectural choices. The results revealed a clear hierarchy: two top-performing models, ConvTCN and PatchTST, consistently outperformed the others. ConvTCN's combination of large-kernel convolutions and multi-stage downsampling, along with PatchTST's patch-based self-attention, provided the most transferable temporal representations for this challenging forecasting task.

Deep Dive

Deep Learning for Financial Time-Series Forecasting: A Controlled, Multi-Horizon Comparison

Overview

This document summarizes the key findings from a comprehensive benchmark of deep learning architectures for multi-horizon financial time-series forecasting. The study:

  • Compares 9 architectures spanning 4 model families (Transformer, MLP, CNN, RNN) across 12 assets, 3 asset classes, and 2 forecasting horizons
  • Employs a controlled experimental protocol with fixed-seed hyperparameter optimization, multi-seed final training, and rank-based statistical validation
  • Establishes that architectural choice dominates performance, explaining 99.90% of variance versus only 0.01% for random seed
  • Identifies a clear three-tier hierarchy, with 2 top models (ConvTCN, PatchTST) consistently outperforming the rest
  • Shows that top architectures maintain their relative rankings across short and long horizons despite 2-2.5x error amplification
  • Finds that MSE-optimized models lack directional forecasting skill, producing predictions no better than a coin flip

Problem & Context

  • Multi-horizon price forecasting is central to finance, but financial time series exhibit complex properties that make accurate prediction challenging
  • Deep learning architectures for sequence modeling have proliferated rapidly, raising the question of which models are best suited for financial data
  • Prior benchmarks suffer from methodological limitations, including uncontrolled hyperparameter tuning, single-seed evaluations, narrow asset coverage, and lack of statistical validation

Methodology

  • Employs a 5-stage experimental pipeline: fixed-seed Bayesian hyperparameter optimization, configuration freezing, multi-seed final training, metric aggregation, and statistical validation
  • Evaluates 9 architectures (4 Transformer, 2 MLP/linear, 2 CNN, 1 RNN) across 12 assets in 3 classes (crypto, forex, equity indices)
  • Computes RMSE, MAE, and directional accuracy on held-out test sets, aggregating across 3 seeds
  • Conducts formal statistical tests, including Friedman, Holm-Wilcoxon pairwise comparisons, Spearman rank correlations, variance decomposition, and Jonckheere-Terpstra trend analysis
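The rank-based validation stage can be sketched as below with SciPy. The error matrix here is synthetic and purely illustrative, not results from the study; the Holm step-down over Wilcoxon p-values mirrors the paper's "Holm-Wilcoxon pairwise comparisons".

```python
import numpy as np
from scipy import stats

# Illustrative RMSE values: rows = 12 assets, columns = 9 architectures.
# These numbers are made up for demonstration, not taken from the paper.
rng = np.random.default_rng(0)
errors = rng.uniform(0.8, 1.2, size=(12, 9))
errors[:, 0] -= 0.3  # pretend architecture 0 is clearly best

# Friedman test: do architectures differ in rank across assets?
stat, p = stats.friedmanchisquare(*errors.T)
print(f"Friedman chi2={stat:.2f}, p={p:.4f}")

# Holm-corrected Wilcoxon signed-rank tests: best model vs. each rival.
pvals = [stats.wilcoxon(errors[:, 0], errors[:, j]).pvalue for j in range(1, 9)]
order = np.argsort(pvals)
rejected = []
for rank, idx in enumerate(order):
    alpha = 0.05 / (len(pvals) - rank)  # Holm step-down threshold
    if pvals[idx] > alpha:
        break
    rejected.append(idx + 1)
print("rivals significantly worse:", rejected)
```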

Data & Experimental Setup

  • Uses H1 (hourly) OHLCV data for 12 assets (4 per class), with 70/15/15 chronological train/val/test splits
  • Enforces category-level hyperparameter tuning and frozen configurations to prevent asset-level overfitting
  • Adopts a direct multi-step forecasting strategy, predicting all horizon steps simultaneously
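A minimal sketch of the chronological splitting and direct multi-step windowing described above, using a toy series and placeholder lookback/horizon values (the paper's exact settings are not assumed):

```python
import numpy as np

def chronological_splits(n, train=0.70, val=0.15):
    """Index ranges for a 70/15/15 chronological train/val/test split."""
    i_train = int(n * train)
    i_val = int(n * (train + val))
    return slice(0, i_train), slice(i_train, i_val), slice(i_val, n)

def make_windows(series, lookback, horizon):
    """Direct multi-step windows: each input of `lookback` steps is paired
    with all `horizon` future steps at once (no recursive feeding)."""
    X, Y = [], []
    for t in range(lookback, len(series) - horizon + 1):
        X.append(series[t - lookback:t])
        Y.append(series[t:t + horizon])
    return np.array(X), np.array(Y)

prices = np.linspace(100, 110, 1000)   # toy hourly close series
tr, va, te = chronological_splits(len(prices))
X, Y = make_windows(prices[tr], lookback=48, horizon=12)
print(X.shape, Y.shape)                # (641, 48) (641, 12)
```

Splitting chronologically before windowing keeps the validation and test periods strictly after the training period, avoiding look-ahead leakage.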

Results

  • Identifies a clear three-tier performance hierarchy, with ConvTCN and PatchTST significantly outperforming the rest
  • Shows that top models maintain their relative rankings across short and long horizons despite 2-2.5x error amplification
  • Finds that architecture choice explains 99.90% of forecast variance, dwarfing the 0.01% contribution from random seed
  • Demonstrates that MSE-optimized models lack directional forecasting skill, with accuracy no better than 50%
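The coin-flip finding can be illustrated by computing directional accuracy for an uninformative forecast on synthetic data (the numbers below are illustrative, not the paper's):

```python
import numpy as np

def directional_accuracy(last_price, y_true, y_pred):
    """Fraction of horizon steps where the forecast and the realized
    move (relative to the last observed price) share the same sign."""
    true_dir = np.sign(y_true - last_price[:, None])
    pred_dir = np.sign(y_pred - last_price[:, None])
    return float(np.mean(true_dir == pred_dir))

rng = np.random.default_rng(1)
last = np.full(500, 100.0)
y_true = last[:, None] + rng.normal(0, 1, size=(500, 12))
y_pred = last[:, None] + rng.normal(0, 1, size=(500, 12))  # forecast with no skill
acc = directional_accuracy(last, y_true, y_pred)
print(f"directional accuracy ≈ {acc:.3f}")  # close to 0.5, i.e. a coin flip
```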

Interpretation

  • The combination of large-kernel convolutions and multi-stage downsampling in ConvTCN provides the most transferable temporal representations
  • Patch-based self-attention in PatchTST offers an effective compromise between local and global modeling
  • Architectural inductive bias matters more than raw model capacity for this task
  • Practitioners should prioritize architecture selection over hyperparameter tuning or ensemble techniques
  • Directional trading requires explicit loss functions beyond standard MSE regression
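As a hedged illustration of what an "explicit loss function beyond standard MSE" might look like (this is a hypothetical construction, not the paper's proposal), one can augment MSE with a hinge penalty on sign disagreement between the predicted and realized moves:

```python
import numpy as np

def directional_mse(y_true, y_pred, last_price, lam=0.5):
    """MSE plus a hinge penalty on sign disagreement between predicted
    and realized moves. `lam` trades level accuracy against direction.
    Hypothetical illustration, not a loss from the paper."""
    mse = np.mean((y_true - y_pred) ** 2)
    true_move = y_true - last_price[:, None]
    pred_move = y_pred - last_price[:, None]
    sign_penalty = np.mean(np.maximum(0.0, -true_move * pred_move))
    return mse + lam * sign_penalty

last = np.array([100.0, 100.0, 100.0])
y_true = last[:, None] + np.array([[1.0], [0.5], [-1.0]])
wrong_way = 2 * last[:, None] - y_true   # same magnitudes, opposite direction
loss_same = directional_mse(y_true, y_true, last)   # 0.0: perfect forecast
loss_flip = directional_mse(y_true, wrong_way, last)
print(loss_same, loss_flip)
```

The hinge term is zero whenever the forecast points the right way, so a model can still be trained end-to-end with gradient methods while being pushed toward correct directions.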

Limitations & Uncertainties

  • Hyperparameter search budget is limited to 5 trials per model
  • Statistical power is constrained for pairwise comparisons within the middle performance tier
  • Feature set is restricted to raw OHLCV, excluding technical indicators or other exogenous data
  • Temporal scope is limited to hourly frequency; generalization to higher or lower frequencies is untested
  • Asset universe, while representative, does not include commodities, fixed income, or emerging markets

What Comes Next

  • Compute standardized effect sizes (Cohen's d) to complement the statistical significance tests
  • Increase hyperparameter search budget and seed count to strengthen ranking evidence for middle-tier models
  • Explore directional loss functions and post-processing to improve forecasting skill beyond MSE regression
  • Expand horizon coverage to map architecture-specific scaling behavior more precisely
  • Assess asset-specific model selection to identify further niche advantages beyond the observed altcoin specialization
  • Investigate heterogeneous ensemble methods combining the strengths of top-performing architectures
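The effect-size computation proposed in the first item can be sketched as a paired Cohen's d over per-asset errors; the RMSE values below are invented for illustration:

```python
import numpy as np

def cohens_d_paired(a, b):
    """Standardized effect size for paired samples: mean difference
    divided by the standard deviation of the differences."""
    diff = np.asarray(a) - np.asarray(b)
    return float(diff.mean() / diff.std(ddof=1))

# Illustrative per-asset RMSEs for two architectures (not from the paper).
rmse_a = np.array([0.91, 0.88, 0.95, 0.90, 0.87, 0.93])
rmse_b = np.array([1.02, 0.99, 1.04, 1.01, 0.97, 1.05])
d = cohens_d_paired(rmse_a, rmse_b)
print(f"d = {d:.2f}")  # negative: architecture A has lower error
```

Unlike a p-value, this quantifies how large the gap between two architectures is, which matters for the middle tier where significance alone is inconclusive.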
