Forecast Model Accuracy (URMA benchmark)
This dashboard benchmarks NOAA forecast models against the URMA analysis for North America.
Pick a city to see local behavior. You can switch the variable between 2 m temperature, 2 m dew point,
10 m wind speed, 10 m wind gust, and specific humidity.
Models compared side by side: NBM, GFS, HRRR, and RRFS.
What you are seeing
- Raw series: model lines should track URMA closely. Separation means lower fidelity.
- Error series: forecast minus URMA. Above zero is high or warm, below zero is low or cold.
- Absolute error series: size of the miss without sign. Lower and steadier is better.
- Bias: average signed error. Values near zero indicate good calibration.
- Mean and max error: the mean reflects overall quality; spikes in the max flag hard hours.
Key metric definitions
MAE (Mean Absolute Error): the average of |forecast − URMA| over the window. Lower is better.
Bias: the average of forecast − URMA over the window. Positive means the model runs high, negative low.
Max error: the largest miss in the window. Useful to spot outliers and regime changes.
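As a rough illustration of the three definitions above, the sketch below computes MAE, bias, and max error from paired forecast and URMA values. The function name `error_metrics` and the sample numbers are hypothetical, and "largest miss" is assumed here to mean the largest absolute error; the dashboard's own computation may differ in detail.

```python
from statistics import mean

def error_metrics(forecast, urma):
    """Return MAE, bias, and max error for paired forecast/URMA values."""
    if len(forecast) != len(urma) or not forecast:
        raise ValueError("need two non-empty series of equal length")
    # Signed error: forecast minus URMA (positive means the model runs high).
    errors = [f - u for f, u in zip(forecast, urma)]
    abs_errors = [abs(e) for e in errors]
    return {
        "mae": mean(abs_errors),       # average size of the miss
        "bias": mean(errors),          # average signed error
        "max_error": max(abs_errors),  # assumption: largest absolute miss
    }

# Hypothetical 2 m temperature values in degrees C, for illustration only.
forecast = [10.0, 12.0, 15.0, 11.0]
urma = [9.0, 13.0, 13.0, 11.0]
metrics = error_metrics(forecast, urma)
print(metrics)
```

Note that the bias can be small even when the MAE is not, because positive and negative misses cancel in the signed average.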
How to read the panels
- Consistency: a model with low MAE and a bias near zero is dependable across conditions.
- Regimes: watch for spikes during fronts, sunrise transitions, convection, or gust events. If all models jump, the situation was tough.
- Variable sensitivity: wind gust and specific humidity tend to have heavier tails than 2 m temperature. Expect a few big misses.
- Local effects: terrain, coastline, and urban heat can create repeatable biases. Stable bias can be corrected downstream.
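The idea that a stable bias can be corrected downstream can be sketched as follows: estimate the mean signed error over a past window, then subtract it from new forecasts. This is a minimal sketch of one simple approach (a constant offset), not a description of how any of the models or the dashboard handle bias; the function names and values are hypothetical.

```python
from statistics import mean

def estimate_bias(past_forecast, past_urma):
    """Mean of forecast minus URMA over a past window."""
    return mean(f - u for f, u in zip(past_forecast, past_urma))

def apply_bias_correction(forecast, bias):
    """Subtract a previously estimated constant bias from each forecast value."""
    return [f - bias for f in forecast]

# Hypothetical values: the model has run about 1 degree warm at this location.
past_forecast = [21.0, 22.0, 24.0]
past_urma = [20.0, 21.0, 23.0]
bias = estimate_bias(past_forecast, past_urma)

new_forecast = [25.0, 26.0]
corrected = apply_bias_correction(new_forecast, bias)
print(bias, corrected)
```

A constant offset only helps when the bias really is stable; a bias that changes with time of day or weather regime would need a correction conditioned on those factors.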
Data sources and docs: NBM · HRRR · GFS · RRFS · URMA.
Requests are served directly by the GribStream API with consistent units and a fixed time normalization so comparisons are fair. Forecasts are taken from model runs as of 72 hours ago and compared with URMA actuals up to 24 hours ago.
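The comparison window described above can be sketched with plain time offsets. This assumes the two limits are simple offsets from the current UTC time; `comparison_window` is a hypothetical helper for illustration and is not part of the GribStream API.

```python
from datetime import datetime, timedelta, timezone

def comparison_window(now):
    """Return (as_of, verify_until) as offsets from `now`.

    as_of: only model runs available at this time are used (72 hours ago).
    verify_until: URMA values are compared up to this time (24 hours ago).
    """
    as_of = now - timedelta(hours=72)
    verify_until = now - timedelta(hours=24)
    return as_of, verify_until

# Hypothetical reference time, fixed so the example is reproducible.
now = datetime(2024, 1, 4, 12, 0, tzinfo=timezone.utc)
as_of, verify_until = comparison_window(now)
print(as_of.isoformat(), verify_until.isoformat())
```

Under this reading, each comparison spans 48 hours of URMA data.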