TempusBench runs rolling-window evaluations for all tasks. Use the searchable sidebar to pick a task, then choose an evaluation window and, when the series has multiple targets, the target variate. The Models drop-down menu lists every model compatible with the selected task type. Figure 1 plots forecasts against ground truth, and Figure 2 is a per-metric heatmap of the selected models on this task (darker blue is better within each metric column).
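The page does not spell out how evaluation windows are laid out. The sketch below shows one common rolling-origin scheme, assuming each window forecasts a fixed `horizon` and consecutive forecast origins step back from the end of the series by a fixed `stride`. `Window`, `rolling_windows`, and both parameters are hypothetical names for illustration, not TempusBench's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Window:
    train_end: int  # exclusive: the model conditions on series[:train_end]
    test_end: int   # exclusive: it is scored on series[train_end:test_end]


def rolling_windows(n_obs: int, horizon: int, n_windows: int,
                    stride: Optional[int] = None) -> List[Window]:
    """Return up to `n_windows` forecast origins ending at the last observation.

    Hypothetical helper: each window forecasts `horizon` steps, and consecutive
    origins are `stride` steps apart (default: `horizon`, i.e. the test
    segments do not overlap).
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    step = horizon if stride is None else stride
    if step <= 0:
        raise ValueError("stride must be positive")
    windows = []
    for k in range(n_windows):
        test_end = n_obs - k * step
        train_end = test_end - horizon
        if train_end <= 0:  # no history left to condition on
            break
        windows.append(Window(train_end, test_end))
    return windows[::-1]  # oldest window first


series = list(range(100))
for w in rolling_windows(len(series), horizon=10, n_windows=3):
    context, target = series[:w.train_end], series[w.train_end:w.test_end]
    print(w, len(context), len(target))
```

Under these assumptions, picking an evaluation window in the UI corresponds to picking one `Window`; the model sees only the context slice and is scored on the target slice.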
Figure 1. Forecast trajectories vs. ground truth (shown in orange) for the selected models on the selected task. Shaded regions show p10–p90 intervals and are drawn for probabilistic models only.
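A p10–p90 band like the one in Figure 1 can be derived from sampled forecast paths. The sketch assumes a probabilistic model returns a list of sample paths and that quantiles use linear interpolation (the same convention as NumPy's default `quantile` method). Models that emit quantiles directly would skip this step, and `quantile` and `p10_p90_band` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of `values` at level q in [0, 1]."""
    if not values:
        raise ValueError("empty input")
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def p10_p90_band(samples: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """samples[i][t] is sample path i at horizon step t.

    Returns (lower, upper): the 10th and 90th percentiles at each step,
    i.e. the edges of the shaded region.
    """
    horizon = len(samples[0])
    lower = [quantile([path[t] for path in samples], 0.10) for t in range(horizon)]
    upper = [quantile([path[t] for path in samples], 0.90) for t in range(horizon)]
    return lower, upper
```

Point forecasters produce a single path, so no band is drawn for them.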
Figure 2. Per-metric heatmap for the selected models on the selected task. Darker blue cells indicate better performance within each metric column.
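Because "darker is better" is judged within each metric column, raw values must first be put on a common scale. One way to do that is per-column min-max scaling, flipping lower-is-better metrics so that 1.0 always means best. The normalization TempusBench actually uses is not stated. `column_scores` and the metric names in the example (`MASE` lower-is-better, a hypothetical `skill` higher-is-better) are illustrative assumptions.

```python
from typing import Dict


def column_scores(table: Dict[str, Dict[str, float]],
                  higher_is_better: Dict[str, bool]) -> Dict[str, Dict[str, float]]:
    """table[model][metric] -> score in [0, 1] per metric column; 1.0 = best (darkest)."""
    models = list(table)
    scores: Dict[str, Dict[str, float]] = {m: {} for m in models}
    for metric, hib in higher_is_better.items():
        column = [table[m][metric] for m in models]
        lo, hi = min(column), max(column)
        for m in models:
            if hi == lo:
                s = 1.0  # every model ties on this metric
            else:
                s = (table[m][metric] - lo) / (hi - lo)
                if not hib:
                    s = 1.0 - s  # a lower raw value earns a higher score
            scores[m][metric] = s
    return scores


table = {
    "model_a": {"MASE": 0.8, "skill": 0.9},
    "model_b": {"MASE": 1.2, "skill": 0.7},
    "model_c": {"MASE": 1.0, "skill": 0.8},
}
print(column_scores(table, {"MASE": False, "skill": True}))
```

Scaling per column keeps a metric with a large numeric range from dominating the color map.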
For each model, wall-clock runtime is aggregated over all tasks of the selected type. Figure 3 plots average win rate against total wall time per model, with time on a log scale. Hover over a point to see the model and its values. Marker size indicates the VM tier used to run the model (larger means a higher tier). Selecting an individual task type shows a per-task heatmap.
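The page does not define "win rate." The sketch below assumes a pairwise definition: on each task, a model earns 1 for every rival it beats on a lower-is-better score, 0.5 for a tie, and 0 for a loss. That sum is divided by the number of rivals and then averaged over tasks. Total wall time is taken as the plain sum of per-task runtimes. `win_rate_and_time` and its input layout are hypothetical.

```python
import math
from typing import Dict, Tuple


def win_rate_and_time(errors: Dict[str, Dict[str, float]],
                      seconds: Dict[str, Dict[str, float]]
                      ) -> Dict[str, Tuple[float, float]]:
    """errors[task][model] is a lower-is-better score; seconds[task][model] is
    that run's wall-clock time. Returns model -> (avg win rate, total seconds)."""
    models = sorted({m for per_task in errors.values() for m in per_task})
    win_sum = {m: 0.0 for m in models}
    n_tasks = {m: 0 for m in models}
    for per_model in errors.values():
        for m, e in per_model.items():
            rivals = [r for r in per_model if r != m]
            if not rivals:
                continue
            # Against each rival on this task: win = 1, tie = 0.5, loss = 0.
            score = sum(1.0 if e < per_model[r] else 0.5 if e == per_model[r] else 0.0
                        for r in rivals)
            win_sum[m] += score / len(rivals)
            n_tasks[m] += 1
    result = {}
    for m in models:
        rate = win_sum[m] / n_tasks[m] if n_tasks[m] else math.nan
        total = sum(per_model.get(m, 0.0) for per_model in seconds.values())
        result[m] = (rate, total)
    return result
```

For the log-scaled x-axis, a plotting layer would place each point at `math.log10(total)` (or simply set a log axis), so models whose runtimes differ by orders of magnitude remain readable side by side.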
Figure 3. Win rate vs. total wall time. Dot size indicates the Google Cloud Batch VM tier (larger = more vCPU/RAM). Hover over a point for model, machine type, and metrics.
How VM tiers are selected. Models range from lightweight statistical baselines to large foundation models, so rather than provisioning custom hardware per model, TempusBench standardizes on a few Google Cloud Batch machine sizes. Peak CPU and memory usage from evaluation runs are grouped into clusters, and each tier is sized to the maximum CPU and memory needed by any model in its cluster, so every member fits. All tier inputs are calibrated on the largest benchmark task, the most demanding workload in the suite. Because that task anchors capacity, smaller tasks stay within the same envelope. The table below lists the machine type, vCPU count, and RAM for each tier.
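The sizing rule above can be sketched as follows. Per-model peaks, measured on the largest task, are reduced to a per-cluster requirement by taking the maximum vCPU and maximum RAM. Each cluster then maps to the smallest catalog machine that covers both. The clustering step itself is assumed to have been done already, since its method is not stated. The three-machine catalog uses real Compute Engine shapes but is a hypothetical example, not TempusBench's tier list. `size_tiers` and the model and cluster names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Machine(NamedTuple):
    name: str
    vcpus: int
    ram_gb: float


# Hypothetical catalog for illustration; not the benchmark's actual tiers.
CATALOG = [
    Machine("e2-standard-4", 4, 16),
    Machine("n2-standard-8", 8, 32),
    Machine("n2-highmem-16", 16, 128),
]


def size_tiers(peaks: Dict[str, Tuple[float, float]],
               cluster_of: Dict[str, str],
               catalog: List[Machine] = CATALOG) -> Dict[str, Machine]:
    """peaks[model] = (peak vCPU, peak RAM in GB) measured on the largest task;
    cluster_of[model] = cluster label. Returns cluster label -> machine."""
    need_cpu: Dict[str, float] = defaultdict(float)
    need_ram: Dict[str, float] = defaultdict(float)
    for model, (cpu, ram) in peaks.items():
        c = cluster_of[model]
        # The tier must fit the most demanding member of its cluster.
        need_cpu[c] = max(need_cpu[c], cpu)
        need_ram[c] = max(need_ram[c], ram)
    tiers = {}
    for c in need_cpu:
        fits = [m for m in catalog if m.vcpus >= need_cpu[c] and m.ram_gb >= need_ram[c]]
        if not fits:
            raise ValueError(f"no catalog machine fits cluster {c!r}")
        tiers[c] = min(fits, key=lambda m: (m.vcpus, m.ram_gb))
    return tiers


peaks = {"model_a": (1, 2), "model_b": (2, 6), "model_c": (6, 20), "model_d": (12, 90)}
cluster_of = {"model_a": "small", "model_b": "small", "model_c": "mid", "model_d": "large"}
tiers = size_tiers(peaks, cluster_of)
print({model: tiers[c].name for model, c in cluster_of.items()})
```

Taking the per-cluster maximum rather than, say, the mean trades some idle capacity for the guarantee that no model in the tier runs out of memory.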
Table 2. Google Cloud Batch worker VM shapes (machine type, vCPUs, RAM) and the models assigned to each tier. The leading column shows one dot per row whose size and fill match the Figure 3 marker for that tier.