Published Mar 2, 2026

Proceedings of ICLR 2026 · 1st ICLR Workshop on Time Series in the Age of Large Models

Retrieval Mechanisms Surpass Long-Context Scaling in Time Series Forecasting

Department of Information Technology, Dr. B.R. Ambedkar National Institute of Technology Jalandhar

Presented at the ICLR 2026 TSALM Workshop in Rio de Janeiro, Brazil (Apr 26-27, 2026). Received a $2,025 ICLR travel grant.

TL;DR

Long contexts hurt time series forecasting by adding noise (inverse scaling, over 68% worse at 3,000 steps), while selective retrieval (RAFT) beats them with lower MSE (0.379 vs 0.647) and far less compute. Future TSFMs should build in retrieval instead.

Abstract

Time Series Foundation Models (TSFMs) have borrowed the long-context paradigm from natural language processing on the premise that feeding more history into the model improves forecast quality. In stochastic domains, however, distant history is often high-frequency noise rather than signal. This work tests whether the premise actually holds by running continuous-context architectures (including PatchTST) on the ETTh1 benchmark. The results contradict it: a clear inverse scaling law emerges, with forecasting error rising as context grows. A 3,000-step window degrades performance by over 68%, evidence that attention mechanisms are poor at ignoring irrelevant historical volatility. Retrieval-Augmented Forecasting (RAFT) is evaluated as an alternative. RAFT achieves a mean squared error (MSE) of 0.379 with a fixed 720-step window and selective retrieval, well below the 0.647 MSE of the best long-context configuration, despite requiring far less computation. The retrieval step injects only the most relevant historical segments as dynamic exogenous variables, giving the model a context-informed inductive bias it cannot construct on its own from raw sequences. Foundation models should therefore shift architecturally toward selective retrieval.

What this paper argues

The core claim is simple: in noisy time-series settings, more context is not automatically better. Long windows can bury the useful signal under irrelevant volatility, and attention alone does not reliably separate the two.
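The effect is easy to reproduce in miniature. The sketch below is a toy illustration, not the paper's experiment: it uses a naive window-mean forecaster on a synthetic series whose level shifts partway through, so distant history belongs to a different regime and actively misleads the model. The series, context lengths, and forecaster are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic series whose level shifts halfway through:
# the distant past belongs to a different regime.
series = np.concatenate([
    5.0 + 0.5 * rng.standard_normal(1600),  # old regime
    1.0 + 0.5 * rng.standard_normal(1600),  # current regime
])

def window_mean_forecast(history: np.ndarray, horizon: int) -> np.ndarray:
    """Naive forecaster: predict the mean of the context window."""
    return np.full(horizon, history.mean())

horizon = 96
target = series[-horizon:]
mse_by_context = {}
for context in (96, 720, 3000):
    hist = series[-horizon - context : -horizon]
    preds = window_mean_forecast(hist, horizon)
    mse_by_context[context] = float(np.mean((preds - target) ** 2))

for context, mse in mse_by_context.items():
    print(f"context={context:5d}  MSE={mse:.3f}")
```

Here the 3,000-step context performs worst, because its window averages over the stale regime; a learned attention model faces the same problem when it cannot reliably down-weight irrelevant history.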

Main takeaway

Selective retrieval worked better than brute-force context scaling. Instead of asking the model to scan everything, RAFT injects the most relevant historical segments and gives the forecaster a much cleaner inductive bias.
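The retrieval idea can be sketched in a few lines. This is a minimal, hypothetical version, not RAFT's actual implementation: it scores candidate past windows against the most recent window with a simple Euclidean similarity and returns the top-k, which a forecaster could then consume as extra exogenous channels. The helper `retrieve_top_k` and the similarity choice are illustrative assumptions.

```python
import numpy as np

def retrieve_top_k(history: np.ndarray, query_len: int = 48, k: int = 3) -> np.ndarray:
    """Return the k past windows most similar to the most recent window.

    history: 1-D array of past observations. The (k, query_len) result
    can be stacked alongside the query window as extra input channels.
    """
    query = history[-query_len:]
    # Candidate windows end before the query starts, so there is no overlap.
    n_candidates = len(history) - 2 * query_len + 1
    candidates = np.stack(
        [history[i : i + query_len] for i in range(n_candidates)]
    )
    # Negative Euclidean distance as a simple similarity score.
    scores = -np.linalg.norm(candidates - query, axis=1)
    top = np.argsort(scores)[-k:][::-1]  # best match first
    return candidates[top]

rng = np.random.default_rng(0)
series = np.sin(np.arange(2000) * 2 * np.pi / 24) + 0.1 * rng.standard_normal(2000)
segments = retrieve_top_k(series, query_len=48, k=3)
print(segments.shape)
```

The point of the design is that the forecaster only ever sees a short, fixed-length window plus a handful of deliberately chosen analogues, instead of being asked to attend over thousands of mostly irrelevant steps.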

Presentation

I presented this work at the ICLR 2026 TSALM Workshop in Rio de Janeiro. It was also the first CORE A* acceptance in the department's history, students and faculty included.

Cite

@inproceedings{ahuja2026retrieval,
  title={Retrieval Mechanisms Surpass Long-Context Scaling in Time Series Forecasting},
  author={Rishi Ahuja and Kumar Prateek and Simranjit Singh and Dr Vijay Kumar},
  booktitle={1st ICLR Workshop on Time Series in the Age of Large Models},
  year={2026},
  url={https://openreview.net/forum?id=Qj96MlCmZw}
}
