Pick a benchmark task from the searchable sidebar (filter by name, domain, or path). The task header describes the setup (univariate, multivariate, or covariate) with catalog context, frequency, and horizon when available, plus optional source links. Choose up to 5 models from the Model Overlay chips, which are ranked by leaderboard win rate. On live warehouse tasks (not the offline demo), pick the evaluation window and, when the series has multiple columns, the target variate. Toggle Show uncertainty to switch between point forecasts and p10–p90 prediction bands where the display data includes them. Figure 1 plots forecast trajectories against the holdout target from GCS; Figure 2 compares the same models on this task with a per-metric bar chart.
Figure 1. Forecast trajectories for selected models on this task. The orange line is the actual target from GCS display data (context + holdout). Shaded regions use p10–p90 intervals when present in the warehouse.
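If you want to reproduce a Figure 1-style view outside the dashboard, the sketch below shows the general shape of the plot: the actual target in orange, a point forecast line, and a shaded p10–p90 band. The GCS path and the column names (timestamp, target, model_a_p10/_p50/_p90) are assumptions for illustration, not the dashboard's actual display-data schema.

```python
# Minimal sketch: one model's forecast against the holdout target with a
# p10-p90 band. Path and column names are hypothetical; reading gs:// paths
# with pandas requires gcsfs, or point it at a local file instead.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_parquet("gs://your-bucket/display_data/task_example.parquet")  # hypothetical path
df["timestamp"] = pd.to_datetime(df["timestamp"])

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df["timestamp"], df["target"], color="tab:orange", label="actual (context + holdout)")
ax.plot(df["timestamp"], df["model_a_p50"], color="tab:blue", label="model_a point forecast")

# Shade the p10-p90 interval only where both quantiles are present.
band = df.dropna(subset=["model_a_p10", "model_a_p90"])
ax.fill_between(band["timestamp"], band["model_a_p10"], band["model_a_p90"],
                color="tab:blue", alpha=0.2, label="model_a p10-p90")

ax.set_xlabel("time")
ax.set_ylabel("target variate")
ax.legend()
plt.tight_layout()
plt.show()
```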
Figure 2. Per-metric comparison of selected models on the current task.
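For reference, a per-metric comparison like Figure 2 is essentially a grouped bar chart of metric scores by model. The sketch below uses placeholder metric names and values, not leaderboard data.

```python
# Minimal sketch of a per-metric comparison: grouped bars, one group per metric,
# one bar per selected model. Metric names and scores are illustrative only.
import numpy as np
import matplotlib.pyplot as plt

metrics = ["MASE", "CRPS", "sMAPE"]   # assumed metric set
models = {                            # assumed scores per model
    "model_a": [0.82, 0.41, 0.12],
    "model_b": [0.95, 0.47, 0.15],
}

x = np.arange(len(metrics))
width = 0.8 / len(models)

fig, ax = plt.subplots(figsize=(8, 4))
for i, (name, scores) in enumerate(models.items()):
    ax.bar(x + i * width, scores, width, label=name)

ax.set_xticks(x + width * (len(models) - 1) / 2)
ax.set_xticklabels(metrics)
ax.set_ylabel("score (lower is better for these metrics)")
ax.legend()
plt.tight_layout()
plt.show()
```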
Wall-clock runtimes come from warehouse telemetry and worker timestamps on the latest evaluation slice, scoped by the task category control (All tasks, Univariate, Multivariate, or Covariate). Figure 3 plots each model's mean win rate (pooled leaderboard metrics) against total wall time in seconds (summed over tasks in that filter). When you pick a single category instead of All tasks, the per-task table renders as a heatmap of wall time by model, with column totals (see the sketch at the end of this section). The final table (Table 2) lists the Google Cloud Batch VM shape (machine type, vCPUs, RAM) and which models in this slice map to each tier.
Figure 3. One point per model: mean win rate (%, leaderboard metrics) versus total wall time in seconds (summed over tasks in the category filter), log-scaled x-axis. Dashed lines mark cohort medians; the shaded quadrant is faster-than-median wall time and higher-than-median win rate. Hover for model name and values.
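A minimal sketch of Figure 3's layout follows, assuming you already have per-model win rates and total wall times in hand. The values are placeholders, and the median lines and quadrant shading only approximate the dashboard's styling.

```python
# Minimal sketch: win rate vs total wall time, log x-axis, cohort medians as
# dashed lines, and the faster-and-better quadrant shaded. Numbers are
# placeholders, not leaderboard or telemetry data.
import numpy as np
import matplotlib.pyplot as plt

models = ["model_a", "model_b", "model_c", "model_d"]
wall_time_s = np.array([120.0, 900.0, 45.0, 3600.0])  # total seconds over tasks in the filter
win_rate = np.array([61.0, 48.0, 55.0, 70.0])          # mean win rate, %

med_t, med_w = np.median(wall_time_s), np.median(win_rate)
x_lo, x_hi = wall_time_s.min() / 2, wall_time_s.max() * 2

fig, ax = plt.subplots(figsize=(7, 5))
ax.set_xscale("log")
ax.set_xlim(x_lo, x_hi)
ax.set_ylim(0, 100)

# Shade the quadrant that is faster than the median and above the median win rate.
ax.fill_between([x_lo, med_t], med_w, 100, color="green", alpha=0.08)
ax.axvline(med_t, linestyle="--", color="gray")
ax.axhline(med_w, linestyle="--", color="gray")

ax.scatter(wall_time_s, win_rate)
for name, t, w in zip(models, wall_time_s, win_rate):
    ax.annotate(name, (t, w), textcoords="offset points", xytext=(5, 5))

ax.set_xlabel("total wall time (s, log scale)")
ax.set_ylabel("mean win rate (%)")
plt.tight_layout()
plt.show()
```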
Table 2. Google Cloud Batch worker VM shape (machine type, vCPUs, RAM) and models from the current leaderboard slice assigned to that tier.
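For the per-task wall-time heatmap mentioned above, the underlying data is essentially a task-by-model pivot with column totals. The sketch below shows one way to assemble it with pandas; the task names, models, and timings are made up, not warehouse telemetry.

```python
# Minimal sketch of the table behind the per-task wall-time heatmap:
# pivot task x model wall times, then append a totals row per model column.
import pandas as pd

runs = pd.DataFrame({
    "task":  ["m4_hourly", "m4_hourly", "traffic", "traffic"],
    "model": ["model_a",   "model_b",   "model_a", "model_b"],
    "wall_time_s": [42.0, 310.0, 18.5, 120.0],
})

# One row per task, one column per model, summed wall time in seconds.
wall = runs.pivot_table(index="task", columns="model",
                        values="wall_time_s", aggfunc="sum")

# Column totals, matching the totals shown under the heatmap.
wall.loc["total"] = wall.sum(axis=0)
print(wall)
```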