Forecast Model Accuracy (URMA benchmark)

This dashboard benchmarks NOAA forecast models against the URMA analysis for North America. Pick a city to see local behavior. You can switch the variable between 2 m temperature, 2 m dew point, 10 m wind speed, 10 m wind gust, and specific humidity. Models compared side by side: NBM, GFS, HRRR, and RRFS.

What you are seeing

Raw series: model lines should track URMA closely. The wider the gap between a model line and URMA, the lower that model's fidelity.
Error series: forecast minus URMA. Values above zero mean the forecast is too high or too warm; values below zero mean it is too low or too cold.
Absolute error series: size of the miss without sign. Lower and steadier is better.
Bias: average signed error. Values near zero indicate good calibration.
Mean and max error: the mean reflects overall quality, while spikes in the max flag hard hours.

Key metric definitions

MAE (Mean Absolute Error): the average of |forecast − URMA| over the window. Lower is better.

Bias: the average of forecast − URMA. Positive means the model runs high, negative low.

Max error: the largest miss in the window. Useful to spot outliers and regime changes.
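
The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the dashboard's actual code: the function name summarize_errors and the sample values are hypothetical, and "largest miss" is assumed to mean the largest absolute error.

```python
from statistics import fmean

def summarize_errors(forecast, urma):
    """Return MAE, bias, and max error for paired forecast/URMA values."""
    if len(forecast) != len(urma) or not forecast:
        raise ValueError("need two non-empty series of equal length")
    # Signed error follows the dashboard convention: forecast minus URMA.
    errors = [f - u for f, u in zip(forecast, urma)]
    abs_errors = [abs(e) for e in errors]
    return {
        "mae": fmean(abs_errors),      # average size of the miss
        "bias": fmean(errors),         # average signed error
        "max_error": max(abs_errors),  # assumed: largest absolute miss
    }

# Hypothetical 2 m temperature values in degrees Celsius.
forecast = [10.0, 12.0, 15.0, 11.0]
urma = [9.0, 12.0, 13.0, 13.0]
stats = summarize_errors(forecast, urma)
print(stats)  # {'mae': 1.25, 'bias': 0.25, 'max_error': 2.0}
```

In this sample the bias (0.25) is much smaller than the MAE (1.25) because a +2 miss and a −2 miss cancel in the signed average. That is why the two metrics are worth reading together.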

How to read the panels

Consistency: a model with low MAE and a bias near zero is dependable across conditions.
Regimes: watch for spikes during fronts, sunrise transitions, convection, or gust events. If all models jump at the same time, the situation itself was hard to forecast, not just one model.
Variable sensitivity: wind gust and specific humidity tend to have heavier tails than 2 m temperature. Expect a few big misses.
Local effects: terrain, coastline, and urban heat can create repeatable biases. A stable bias can be corrected downstream.
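
As one illustration of correcting a stable bias downstream, the sketch below estimates the mean signed error over a past window and subtracts it from new forecasts. This is only one simple approach and assumes the bias really is steady. The function names and values are hypothetical.

```python
from statistics import fmean

def estimate_bias(forecast, urma):
    """Mean of forecast minus URMA over a past window."""
    return fmean([f - u for f, u in zip(forecast, urma)])

def apply_bias_correction(forecast, bias):
    """Subtract a previously estimated bias from each new forecast value."""
    return [f - bias for f in forecast]

# Hypothetical history for one city: the model runs 1 degree warm.
past_forecast = [21.0, 22.0, 23.0, 24.0]
past_urma = [20.0, 21.0, 22.0, 23.0]
bias = estimate_bias(past_forecast, past_urma)

# Apply the estimated bias to new, not yet verified forecasts.
new_forecast = [25.0, 26.0]
corrected = apply_bias_correction(new_forecast, bias)
print(bias, corrected)  # 1.0 [24.0, 25.0]
```

A constant offset like this helps only when the bias is stable. If the bias changes with time of day or weather regime, it would need to be estimated separately for each condition.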

Data sources and docs: NBM · HRRR · GFS · RRFS · URMA. Requests are served directly by the GribStream API with consistent units and a fixed time normalization so comparisons are fair. All panels use model runs as of 72 hours ago and compare them with URMA actuals up to 24 hours ago.
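
The time window can be sketched with plain datetime arithmetic. This reflects one reading of the sentence above, namely that forecasts are taken as they stood 72 hours before now and verified against URMA up to 24 hours before now. It is an assumption, not the documented GribStream behavior, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def verification_window(now):
    """Return (forecast_as_of, urma_end) for a given reference time."""
    forecast_as_of = now - timedelta(hours=72)  # model runs as of this time
    urma_end = now - timedelta(hours=24)        # last URMA hour compared
    return forecast_as_of, urma_end

# Fixed reference time in UTC so the example is reproducible.
now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
forecast_as_of, urma_end = verification_window(now)
print(forecast_as_of.isoformat())  # 2025-01-07T12:00:00+00:00
print(urma_end.isoformat())        # 2025-01-09T12:00:00+00:00
```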