Forecast Model Accuracy (URMA benchmark)
This dashboard benchmarks NOAA forecast models against the URMA analysis for North America.
Pick a city to see local behavior. You can switch the variable between 2 m temperature and 10 m wind speed.
Models compared side by side: NBM, GFS, HRRR, RRFS 2D fields, NAM CONUS Nest, and AIGFS Surface.
What you are seeing
- Raw series: model lines should track URMA closely. Separation means lower fidelity.
- Error series: forecast minus URMA. Above zero means the model ran high or warm; below zero, low or cold.
- Absolute error series: size of the miss without sign. Lower and steadier is better.
- Bias: average signed error. Values near zero indicate good calibration.
- Mean and max error: the mean reflects overall quality; spikes flag hard hours.
Key metric definitions
MAE (Mean Absolute Error): the average of |forecast − URMA| over the window. Lower is better.
Bias: the average of forecast − URMA. Positive means the model runs high, negative low.
Max error: the largest miss in the window. Useful to spot outliers and regime changes.
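The three metrics above reduce to a few lines of arithmetic. A minimal sketch in Python; the values are invented for illustration and do not come from the dashboard:

```python
# Paired hourly values for one city (hypothetical numbers, e.g. 2 m temperature in degC)
forecast = [21.4, 20.8, 19.9, 18.5]  # model values
urma     = [20.9, 21.0, 20.3, 19.4]  # URMA analysis at the same hours

# Signed error at each hour: forecast - URMA
errors = [f - a for f, a in zip(forecast, urma)]

mae       = sum(abs(e) for e in errors) / len(errors)  # Mean Absolute Error
bias      = sum(errors) / len(errors)                  # average signed error
max_error = max(abs(e) for e in errors)                # largest miss, ignoring sign

print(mae, bias, max_error)  # → 0.5 -0.25 0.9
```

Note that MAE and bias answer different questions: a model can have a bias near zero while still missing badly in both directions, which shows up in MAE and max error.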
How to read the panels
- Consistency: a model with low MAE and a bias near zero is dependable across conditions.
- Regimes: watch for spikes during fronts, sunrise transitions, convection, or gust events. If all models jump, the situation was tough.
- Variable sensitivity: 10 m wind speed tends to have heavier tails than 2 m temperature. Expect a few big misses.
- Local effects: terrain, coastline, and urban heat can create repeatable biases. Stable bias can be corrected downstream.
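Because a stable local bias is predictable, it can be removed after the fact. A minimal sketch of such a downstream correction, assuming a recent window of paired forecast/URMA values; the numbers and variable names are illustrative only:

```python
# Recent paired values used to estimate the local bias (hypothetical data)
train_forecast = [10.2, 11.1, 9.8, 10.5]
train_urma     = [9.7, 10.4, 9.2, 10.0]

# Mean signed error over the training window
bias = sum(f - a for f, a in zip(train_forecast, train_urma)) / len(train_forecast)

# Subtract the estimated bias from a new forecast value
new_forecast = 12.0
corrected = new_forecast - bias

print(round(corrected, 3))  # → 11.425
```

This only helps when the bias is genuinely stable; regime-dependent errors (the spikes noted above) need a more conditional correction.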
Data sources and docs: NBM · HRRR · GFS · RRFS 2D fields · NAM CONUS Nest · AIGFS Surface · URMA.
Requests use the GribStream shared-parameter catalog, so each model is queried for the same physical signal with consistent output units. Forecast values are selected as they were known at the start of the 36-hour window and compared against the subsequent URMA analysis, which is available through 24 hours ago.