GribStream

Frequently Asked Questions

Account, Quota, and Limits

How is API quota calculated exactly?

Credits are charged based on what the API actually returns, not on the requested clock hours alone.

Credits = returned_valid_times * parameters * ceil(coordinates / 500)

returned_valid_times is the number of valid forecast times that the API actually returns per location. For ensemble queries, each member is its own time series, so each member's valid times are counted separately.

  • Sub-hourly datasets are charged by returned sub-hourly valid times.
  • Sparse-horizon datasets are charged only for the times they return.

Example (NBM-like sparse horizons): if your response returns about 100 valid times, with 1 parameter, for 3,500 coordinates, credits are:

100 * 1 * ceil(3500 / 500) = 700 credits

Cache hits are billed at 10% of normal credits.
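
As a rough illustration, the formula and the cache discount can be sketched in Python. The function name and arguments are hypothetical, and how fractional credits are rounded on cache hits is not documented here, so the sketch returns an unrounded number.

```python
import math

def estimate_credits(returned_valid_times: int, parameters: int,
                     coordinates: int, cache_hit: bool = False) -> float:
    """Estimate credits for one request from the documented formula."""
    # Coordinates are billed in blocks of 500, rounded up.
    blocks = math.ceil(coordinates / 500)
    credits = returned_valid_times * parameters * blocks
    # Cache hits are billed at 10% of normal credits (rounding is an open assumption).
    return credits / 10 if cache_hit else float(credits)

# The NBM-like example: 100 valid times, 1 parameter, 3,500 coordinates.
normal = estimate_credits(100, 1, 3500)
cached = estimate_credits(100, 1, 3500, cache_hit=True)
print(normal, cached)
```

Treat the result as an estimate only; the token dashboard remains the authoritative record of usage.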

You can verify usage and limits in your token dashboard.

Can I front-load quota for backfills?

Yes. If you need to backfill aggressively, we can configure temporary quota/throughput adjustments so you can consume more quota in a short period.

Please contact us first at info@gribstream.com so we can size and schedule the run without impacting shared capacity.

Include these details to speed up approval:

  • dataset(s) and variables
  • time range and target completion time
  • coordinates/grid size and expected request rate
  • whether the load is one-time or recurring

Why did I get 429 Too Many Requests, and what does Retry-After mean?

A 429 can happen for two common reasons:

  • Quota exhausted: your token reached its current daily credit limit.
  • Burst throttling: too many requests in a short interval.

Retry-After tells you when to retry:

  • For quota exhaustion, it is the number of seconds until the next UTC daily reset.
  • For burst throttling, it is usually a short cooldown.

Client best practice: honor Retry-After, apply exponential backoff with jitter, and avoid hot-loop retries.
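
One way to combine these practices is to compute each wait from the Retry-After value plus a jittered exponential backoff. The sketch below is only an illustration: the function name, the `base` and `cap` defaults, and the injectable `rng` parameter are assumptions, and it treats Retry-After as a number of seconds, as described above.

```python
import random
from typing import Callable, Optional

def retry_delay(attempt: int, retry_after: Optional[float] = None,
                base: float = 1.0, cap: float = 60.0,
                rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Exponential backoff with jitter: a random fraction of min(cap, base * 2**attempt).
    backoff = rng() * min(cap, base * (2 ** attempt))
    if retry_after is not None:
        # Never retry sooner than the server asked; the jitter is added on top
        # so that many clients do not all wake at the same instant.
        return retry_after + backoff
    return backoff

# With Retry-After: 30, the wait is at least 30 seconds.
print(retry_delay(attempt=2, retry_after=30))
```

Sleeping for the returned number of seconds between attempts keeps the client out of a hot loop.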

Why did my IP get temporarily blocked after repeated 401/429 traffic?

This usually happens when a client keeps retrying denied requests at a high rate.

  • High-frequency retry loops (often scraping scripts without backoff) can keep hammering the API after every 401 or 429.
  • To protect shared capacity, automated firewall rules can temporarily block the source IP.

Typical blocks last around 10 minutes to 1 hour, depending on the traffic pattern and severity.

How to avoid it: stop retrying repeated 401 responses until auth is fixed, honor Retry-After on 429, and use exponential backoff with jitter plus retry caps.
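
The decision of whether to retry at all can be kept separate from the wait time. The following is a minimal sketch of such a policy; the function name, the return labels, and the default cap of 5 attempts are illustrative assumptions rather than GribStream recommendations.

```python
def next_action(status: int, attempt: int, max_attempts: int = 5) -> str:
    """Return "done", "retry", or "stop" for one HTTP response (attempt is 0-based)."""
    if 200 <= status < 300:
        return "done"
    if status == 401:
        # Retrying cannot succeed until the token or auth header is fixed.
        return "stop"
    if status == 429 and attempt + 1 < max_attempts:
        # Wait for Retry-After (plus backoff) before the next attempt.
        return "retry"
    # Retry cap reached, or an error this sketch does not retry.
    return "stop"

print(next_action(429, attempt=0), next_action(401, attempt=0))
```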

When do daily quotas reset?

Daily quotas reset at 00:00 UTC.

The exact countdown is shown in your token dashboard.
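
If a client needs the countdown locally, for example to schedule a job after the reset, it can be computed from the current time. This is a small sketch with a hypothetical function name; it assumes only the 00:00 UTC reset stated above.

```python
from datetime import datetime, timedelta, timezone

def seconds_until_reset(now: datetime) -> float:
    """Seconds from `now` (timezone-aware) until the next 00:00 UTC."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return (next_midnight - now_utc).total_seconds()

print(seconds_until_reset(datetime.now(timezone.utc)))
```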

Query Semantics and Selection

What is the difference between /timeseries and /runs?

Both endpoints query the same model data, but they answer different questions.

/timeseries (alias: /history)

  • Returns one best value per valid time (shortest eligible lead time under your filters).
  • Best for product curves, dashboards, feature generation, and model-run-based backtests.
  • Supports asOf model-run cutoffs.

/runs (alias: /forecasts)

  • Returns all matching run/horizon values inside your range.
  • Best for run-to-run drift analysis, cycle comparisons, and research workflows.
  • Uses forecastedFrom/forecastedUntil (not asOf).

How row counts differ

  • /timeseries: roughly valid_times * coordinates * members
  • /runs: roughly runs * horizons * coordinates * members

If you need one operational value per timestamp, use /timeseries. If you need every model cycle contribution, use /runs.
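
To make the relationship concrete, the sketch below reduces /runs-style rows to one row per valid time by keeping the shortest lead time. It illustrates the selection rule only and is not GribStream's server-side implementation; the sample rows and the `tmp` column are made up, while forecasted_at and forecasted_time are the column names used in the responses.

```python
from datetime import datetime

def parse(ts: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2024-09-10T06:00:00Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def best_per_valid_time(rows):
    """Keep, for each (forecasted_time, lat, lon), the row with the shortest lead time."""
    best = {}
    for row in rows:
        key = (row["forecasted_time"], row["lat"], row["lon"])
        lead = parse(row["forecasted_time"]) - parse(row["forecasted_at"])
        if key not in best or lead < best[key][0]:
            best[key] = (lead, row)
    return [row for _, row in best.values()]

# Hypothetical rows: two model runs forecasting the same valid time.
runs_rows = [
    {"forecasted_at": "2024-09-10T00:00:00Z", "forecasted_time": "2024-09-10T06:00:00Z",
     "lat": 40.73, "lon": -73.94, "tmp": 291.2},
    {"forecasted_at": "2024-09-10T03:00:00Z", "forecasted_time": "2024-09-10T06:00:00Z",
     "lat": 40.73, "lon": -73.94, "tmp": 290.8},
]
timeseries_like = best_per_valid_time(runs_rows)
print(timeseries_like)
```

Here the 03:00Z run wins because its lead time to the 06:00Z valid time is shorter.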

Should I use fromTime/untilTime or timesList?

Use fromTime/untilTime for dense, continuous windows. Use timesList when you already know exact timestamps and need sparse extraction.

  • Range selector: best for hourly/sub-hourly curves over a continuous period.
  • timesList: best for event timestamps, specific cycles, or sampled dates.

In practice, timesList usually reduces over-fetching and credit usage because only the listed valid times are returned.
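
The two selectors might look like this in a request body. This is a sketch under assumptions: the exact shape of timesList (shown here as a list of ISO 8601 strings) should be checked against the API reference, and the coordinates and variables are placeholders.

```python
import json

# Dense window: every valid time between the two bounds.
range_body = {
    "fromTime": "2024-09-10T00:00:00Z",
    "untilTime": "2024-09-11T00:00:00Z",
}

# Sparse extraction: only the listed valid times (list-of-strings shape is assumed).
list_body = {
    "timesList": ["2024-09-10T06:00:00Z", "2024-09-10T18:00:00Z"],
}

# Fields shared by both requests.
shared = {
    "coordinates": [{"lat": 40.7306, "lon": -73.9352, "name": "New York City"}],
    "variables": [{"name": "TMP", "level": "2 m above ground", "info": ""}],
}

range_payload = json.dumps({**range_body, **shared})
list_payload = json.dumps({**list_body, **shared})
print(list_payload)
```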

What does asOf do, and when should I use it?

asOf is a model-run-time cutoff for /timeseries: only rows whose forecasted_at is at or before that timestamp are eligible.

Think of it as "as of the model run timestamp", not "as of the moment GribStream had indexed or exposed the data".

Use asOf when you need model-run-based backtesting, i.e. results that exclude later model cycles from the query.

If asOf is omitted, GribStream uses the latest available runs. /runs does not use asOf.

Does asOf reproduce the exact live API availability time?

No. asOf uses the model run time, not the exact wall-clock time when a run first became available through GribStream.

For example, a 12Z GFS run can be eligible for asOf: "12:30Z" because its forecasted_at is 12:00Z, even if that run was not yet live in the API at 12:30Z.

If you need to approximate live API availability in a backtest, the standard public workflow is to apply a conservative availability buffer before setting asOf. Start from the historical decision time, subtract an estimate of the usual publication/indexing lag, and use that earlier timestamp as the asOf cutoff.

The buffer should be based on the usual lag between the nominal model cycle and when the provider publishes the relevant forecast horizons to public object/blob storage, with extra margin for occasional upstream delays and rare GribStream processing delays. For NOAA feeds such as GFS, files are uploaded in forecast-horizon order, so a workflow that only uses the first 48 forecast hours can often use a smaller correction than one that needs the full run.
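
The buffer arithmetic can be sketched as follows. The function name is hypothetical and the 5-hour lag is a placeholder, not a published figure; choose a value from your own observations of the dataset and horizons you use.

```python
from datetime import datetime, timedelta, timezone

def buffered_as_of(decision_time: datetime, lag: timedelta) -> str:
    """Shift a historical decision time back by an availability buffer."""
    cutoff = decision_time.astimezone(timezone.utc) - lag
    return cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")

# Placeholder 5-hour buffer applied to a 12:30Z decision time.
as_of = buffered_as_of(datetime(2024, 9, 10, 12, 30, tzinfo=timezone.utc),
                       timedelta(hours=5))
print(as_of)
```

With this placeholder buffer the cutoff becomes 07:30Z, so a run with forecasted_at 12:00Z is excluded and a 06:00Z run is the latest eligible one.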

GribStream does not yet publish per-dataset or per-horizon lag guidance on the model pages; we expect to handle that in a separate iteration.

You can request includeMetadata: ["index_updated_at"] to see the latest ClickHouse index timestamp among the selected index rows used for each result row. Because of upstream corrections and occasional re-indexing, it should not be treated as a stable "first available at" audit timestamp. Experimental index-time workflows may be available by request for customers who need this distinction, but they are not part of the stable public API contract.

Why are rows not sorted by forecasted_at or forecasted_time?

Responses are streamed in a throughput-optimized order, so chronological ordering is not guaranteed.

If you need deterministic order, sort client-side after download:

  • /timeseries: sort by forecasted_time, then forecasted_at.
  • /runs: sort by forecasted_time, then run/horizon fields relevant to your workflow.
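
For a CSV response, the client-side sort might look like the sketch below. The sample rows and the `value` column are made up; forecasted_time and forecasted_at are the response column names.

```python
import csv
import io

def sort_rows(csv_text: str):
    """Parse a CSV response and sort by forecasted_time, then forecasted_at."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # ISO 8601 UTC timestamps in one uniform format sort correctly as strings.
    rows.sort(key=lambda r: (r["forecasted_time"], r["forecasted_at"]))
    return rows

sample = """forecasted_at,forecasted_time,lat,lon,value
2024-09-10T03:00:00Z,2024-09-10T04:00:00Z,40.731,-73.935,2.5
2024-09-09T23:00:00Z,2024-09-10T00:00:00Z,40.731,-73.935,3.9
2024-09-10T00:00:00Z,2024-09-10T01:00:00Z,40.731,-73.935,2.9
"""
ordered = sort_rows(sample)
print([r["forecasted_time"] for r in ordered])
```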

How do variable selectors work (name, level, info)?

A variable selector is a JSON object like:

{ "name": "TMP", "level": "2 m above ground", "info": "" }

  • name: required parameter code (for example TMP, UGRD).
  • level: required vertical/physical level.
  • info: optional disambiguator; required when multiple fields share the same name + level.
  • alias: optional output-column rename; does not change field selection.

On each model page, the weather-variable dropdown/browser shows the exact JSON selector for each field and provides copy actions, so you can paste selectors directly into requests.
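
When selectors are built in code rather than pasted, a small helper can enforce the required fields. This is a sketch with a hypothetical helper name; the field names follow the list above.

```python
def selector(name: str, level: str, info: str = "", alias: str = "") -> dict:
    """Build a variable selector; name and level are required."""
    if not name or not level:
        raise ValueError("name and level are required")
    sel = {"name": name, "level": level, "info": info}
    if alias:
        sel["alias"] = alias  # renames the output column only
    return sel

variables = [
    selector("TMP", "2 m above ground", alias="tempK"),
    selector("UGRD", "1000 mb", alias="uwind"),
]
print(variables)
```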

Why does my grid request return no points for this dataset?

This usually means the requested grid does not intersect the dataset domain.

Common causes:

  • Bounds are outside the model coverage (for example requesting CONUS-only data over another region).
  • Latitude/longitude bounds are inverted or inconsistent.
  • Step/bounds combination results in no usable grid points after domain filtering.

What to do:

  • Check the model page for domain and native grid details.
  • Start with a known in-domain bounding box and moderate step.
  • Confirm using a single coordinate in the same area before scaling to grid mode.
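
A client-side sanity check can catch the common causes before a request is sent. The sketch below is illustrative only: it treats the dataset domain as a simple latitude/longitude box, which real model grids often are not, and the parameter names and the box values are assumptions rather than API fields or published coverage.

```python
def axis_points(lo, hi, step):
    """Evenly spaced values from lo to hi inclusive (small tolerance for float error)."""
    n = int((hi - lo) / step + 1e-9) + 1
    return [lo + i * step for i in range(n)]

def grid_points_in_domain(min_lat, max_lat, min_lon, max_lon, step, domain):
    """Count requested grid points inside a rectangular (lat/lon box) domain."""
    if min_lat > max_lat or min_lon > max_lon:
        raise ValueError("bounds are inverted")
    if step <= 0:
        raise ValueError("step must be positive")
    d_min_lat, d_max_lat, d_min_lon, d_max_lon = domain
    lats = [y for y in axis_points(min_lat, max_lat, step) if d_min_lat <= y <= d_max_lat]
    lons = [x for x in axis_points(min_lon, max_lon, step) if d_min_lon <= x <= d_max_lon]
    return len(lats) * len(lons)

# Illustrative box only; real model domains are listed on each model page.
ROUGH_BOX = (21.0, 53.0, -135.0, -60.0)
inside = grid_points_in_domain(40.0, 41.0, -75.0, -74.0, 0.5, ROUGH_BOX)
outside = grid_points_in_domain(48.0, 49.0, 2.0, 3.0, 0.5, ROUGH_BOX)
print(inside, outside)
```

A count of zero, or a ValueError for inverted bounds, points to one of the causes listed above.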

How do ensemble members work, and what is the default?

For ensemble datasets, use the members array to choose which member forecasts to return.

  • If members is omitted, GribStream returns the first available member only (typically control member 0).
  • Adding members increases rows and credits roughly linearly.
  • To discover member IDs for a dataset, call /api/v2/catalog/datasets/{dataset} and inspect members.

For non-ensemble datasets, members is not used.

Why can best-eligible time series show jumps at cycle boundaries?

In /timeseries, each valid time uses the shortest eligible lead-time value under your filters. Around cycle boundaries, the selected source run can change, which may create step-like jumps.

Simple ways to reduce this:

  • Use /runs and keep a fixed run for the whole curve when you need run-consistent behavior.
  • Where available, query only horizon 0 with minLeadTime: "0h" and maxLeadTime: "0h".
  • For smoother intrahour visualization, linearly interpolate between consecutive horizon-0 points (for example between hourly anchors).
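
The interpolation in the last point can be sketched as below. The function name and the sample temperatures are hypothetical; the anchors stand for two consecutive horizon-0 values one hour apart.

```python
from datetime import datetime, timedelta, timezone

def interpolate(t0, v0, t1, v1, t):
    """Linearly interpolate a value at time t between anchors (t0, v0) and (t1, v1)."""
    if not t0 <= t <= t1 or t0 == t1:
        raise ValueError("t must lie within a non-empty [t0, t1] interval")
    frac = (t - t0) / (t1 - t0)  # timedelta / timedelta gives a float
    return v0 + frac * (v1 - v0)

t0 = datetime(2024, 9, 10, 0, 0, tzinfo=timezone.utc)
t1 = t0 + timedelta(hours=1)
quarter_past = interpolate(t0, 290.0, t1, 292.0, t0 + timedelta(minutes=15))
print(quarter_past)
```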

Backfills and Performance

How should I structure large backfills safely and cheaply?

For large backfills, optimize for stable throughput and low overhead per returned point.

  • Split work so each request takes around 10 to 15 seconds in your environment.
  • Request as many coordinates as possible per request while staying in that 10-15 second target.
  • For sparse timestamps (both /timeseries and /runs), use timesList instead of broad range selectors.
  • Keep variables focused; extra variables multiply output volume and credits.
  • Dry-run a small slice first, then scale with the same request shape.
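
Because credits scale with ceil(coordinates / 500), one reasonable approach is to batch coordinates in multiples of 500 so that only the final batch can contain a partially filled block. The sketch below assumes this; the helper name and the default batch size of 2,000 are placeholders to be tuned against the 10-15 second target.

```python
def chunk_coordinates(coords, batch_size=2000):
    """Split coordinates into batches whose size is a multiple of 500."""
    if batch_size <= 0 or batch_size % 500 != 0:
        raise ValueError("batch_size should be a positive multiple of 500")
    return [coords[i:i + batch_size] for i in range(0, len(coords), batch_size)]

# Hypothetical list of 4,200 points.
coords = [{"lat": 30.0 + i * 0.01, "lon": -100.0} for i in range(4200)]
batches = chunk_coordinates(coords)
print([len(b) for b in batches])
```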

If you need temporary high-throughput backfills, contact us so we can plan a front-loaded window safely.

How does caching affect pricing and performance?

Cache hits are billed at 10% of normal credits and are usually much faster.

Cache tends to help most on data queried repeatedly, for example:

  • actual-like/lowest-horizon pulls requested many times
  • recent runs/time windows that clients poll repeatedly

Important behavior:

  • Cache keys include the selected coordinate set, so changing coordinates usually produces a different key.
  • Very large coordinate requests are not cache-eligible (currently more than 10,000 coordinates).
  • Older, high-horizon combinations are less likely to be cacheable than recent/short-horizon queries.

What request headers are recommended for production?

Recommended baseline headers:

  • Authorization: Bearer <token>
  • Content-Type: application/json
  • Accept: text/csv or application/json or application/ndjson
  • Accept-Encoding: gzip (strongly recommended)

For large responses, make sure your client accepts and decompresses gzipped payloads. This typically lowers bandwidth and improves end-to-end response time.
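
A minimal sketch of both halves, building the baseline headers and decompressing a gzipped body, is shown below. The helper names are hypothetical. Some HTTP clients (for example the requests library) decompress gzip transparently, while Python's built-in urllib.request does not, so the explicit step matters mainly for lower-level clients.

```python
import gzip

def build_headers(token: str, accept: str = "text/csv") -> dict:
    """Baseline request headers; `accept` selects the response format."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": accept,
        "Accept-Encoding": "gzip",
    }

def decode_body(raw: bytes, content_encoding: str = "") -> str:
    """Decompress the body when the Content-Encoding response header says gzip."""
    if content_encoding.lower() == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")

headers = build_headers("API_TOKEN", accept="application/ndjson")
print(headers["Accept"])
```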

How do I choose between expressions/filters in the API and post-processing?

Use API expressions/filters when you can reduce data volume early:

  • simple derived fields (wind magnitude, unit conversions, thresholds)
  • event filters (return only rows matching conditions)
  • queries where transfer cost dominates

Prefer post-processing when logic is heavier:

  • stateful or multi-step transforms
  • cross-dataset joins and external enrichment
  • complex model-specific pipelines

Practical default: filter/derive in the API first, then run advanced analytics downstream.

Backtesting and Data Strategy

Can I use GribStream for historical weather data and forecast backtesting in one API?

Yes. GribStream supports both workflows in the same API.

  • Use /timeseries with asOf to reconstruct the best forecast under a model-run-time cutoff.
  • If exact live availability timing matters, subtract a conservative buffer based on provider publication lag for the horizons you use before setting asOf.
  • Use forecast datasets as model inputs and pair them with analysis/actual datasets for evaluation targets.
  • Align by valid time and coordinate, and use timesList when evaluation timestamps are sparse.

This pattern works well for ML validation, risk backtests, and operational forecast-quality monitoring.
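
The alignment step can be sketched as a join on valid time and coordinate followed by an error metric. Everything in this example is illustrative: the function name, the `value` column, and the sample numbers are assumptions, and mean absolute error is just one possible metric.

```python
def mean_absolute_error(forecast_rows, actual_rows, value_key="value"):
    """Join rows on (forecasted_time, lat, lon) and return the mean absolute error."""
    actual = {(r["forecasted_time"], r["lat"], r["lon"]): r[value_key]
              for r in actual_rows}
    errors = []
    for r in forecast_rows:
        key = (r["forecasted_time"], r["lat"], r["lon"])
        if key in actual:  # skip forecast rows with no matching actual
            errors.append(abs(r[value_key] - actual[key]))
    if not errors:
        raise ValueError("no overlapping rows")
    return sum(errors) / len(errors)

# Hypothetical forecast and analysis rows for one location.
forecast = [
    {"forecasted_time": "2024-09-10T00:00:00Z", "lat": 40.731, "lon": -73.935, "value": 291.0},
    {"forecasted_time": "2024-09-10T01:00:00Z", "lat": 40.731, "lon": -73.935, "value": 289.5},
]
actuals = [
    {"forecasted_time": "2024-09-10T00:00:00Z", "lat": 40.731, "lon": -73.935, "value": 290.0},
    {"forecasted_time": "2024-09-10T01:00:00Z", "lat": 40.731, "lon": -73.935, "value": 290.5},
]
mae = mean_absolute_error(forecast, actuals)
print(mae)
```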

How is GribStream different from downloading raw GRIB2 files from NOAA/ECMWF directly?

Both approaches are valid; they optimize for different tradeoffs.

  • Raw GRIB2 direct: maximum low-level control, but you own ingest, indexing, decoding, storage, retries, and availability engineering.
  • GribStream API: request only needed variables/locations/times in ready formats, with much lower operational overhead and faster integration.

Common strategy: use the API for product and analytics pipelines, and use raw archives for specialized research that needs full-file access.

Derived Metrics

How can I convert vectors like wind into magnitude (speed) and direction (angle)?

Some weather models encode GRIB file parameters such as wind in vector form via their components, usually u and v.

You can convert the wind vector components u and v into a wind speed (magnitude) and a wind direction (angle) using these formulas:

import math

# u, v = wind vector components
speed = math.sqrt(u*u + v*v)
direction = (270 - math.atan2(v, u) * 180 / math.pi) % 360

Explanation

Magnitude (Speed)

speed = math.sqrt(u*u + v*v)

U Component (u): Zonal (west → east).

V Component (v): Meridional (south → north).

Direction (Angle)

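Rotate the atan2 result to the meteorological convention (the direction the wind blows from, measured clockwise from north) and wrap it into the range 0° to 360° (exclusive).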

direction = (270 - math.atan2(v, u) * 180 / math.pi) % 360
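
As a quick check of the convention, the two formulas can be wrapped in a function and tried on simple cases. The function name is hypothetical; math.hypot is used as an equivalent of the square-root form.

```python
import math

def wind_speed_direction(u: float, v: float):
    """Return (speed, direction in degrees) from wind components u and v."""
    speed = math.hypot(u, v)  # same as sqrt(u*u + v*v)
    direction = (270 - math.degrees(math.atan2(v, u))) % 360
    return speed, direction

# Wind blowing toward the east (u > 0, v = 0) comes from the west.
print(wind_speed_direction(1.0, 0.0))
# Wind blowing toward the south (u = 0, v < 0) comes from the north.
print(wind_speed_direction(0.0, -1.0))
```

A westerly wind should come out near 270° and a northerly wind near 0° (or, with floating-point error, just under 360°).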

Example with GribStream Expressions

curl -X POST 'https://gribstream.com/api/v2/hrrr/timeseries' \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer [API_TOKEN]" \
  -d '{
    "fromTime": "2024-09-10T00:00:00Z",
    "untilTime": "2024-09-10T10:00:00Z",
    "minLeadTime": "1h",
    "maxLeadTime": "50h",
    "coordinates": [{ "lat": 40.7306, "lon": -73.9352, "name": "New York City" }],
    "variables": [
      { "name": "UGRD", "level": "1000 mb", "alias": "uwind" },
      { "name": "VGRD", "level": "1000 mb", "alias": "vwind" }
    ],
    "expressions": [
      { "expression": "func.Hypot(uwind, vwind)", "alias": "wind_magnitude" },
      { "expression": "int(270 - func.Atan2(vwind, uwind) * 180 / 3.14159) % 360", "alias": "wind_direction" }
    ]
  }'

Result:

forecasted_at,forecasted_time,lat,lon,name,wind,uwind,wind_magnitude,wind_direction,vwind
2024-09-09T23:00:00Z,2024-09-10T00:00:00Z,40.731,-73.935,New York City,3.2632,6.1795,7.3362,237,3.9539
2024-09-10T03:00:00Z,2024-09-10T04:00:00Z,40.731,-73.935,New York City,2.2788,6.4841,6.9633,248,2.5385
2024-09-10T00:00:00Z,2024-09-10T01:00:00Z,40.731,-73.935,New York City,2.8097,8.0572,8.5765,249,2.9389
2024-09-10T02:00:00Z,2024-09-10T03:00:00Z,40.731,-73.935,New York City,2.4618,7.1213,7.3375,256,1.7680
2024-09-10T07:00:00Z,2024-09-10T08:00:00Z,40.731,-73.935,New York City,2.6267,6.5438,6.5864,263,0.7486
2024-09-10T05:00:00Z,2024-09-10T06:00:00Z,40.731,-73.935,New York City,2.3317,6.8368,6.8375,269,0.1001
2024-09-10T06:00:00Z,2024-09-10T07:00:00Z,40.731,-73.935,New York City,2.3682,7.3567,7.5201,258,1.5594
2024-09-10T04:00:00Z,2024-09-10T05:00:00Z,40.731,-73.935,New York City,2.6711,6.4087,7.1157,244,3.0921
2024-09-10T01:00:00Z,2024-09-10T02:00:00Z,40.731,-73.935,New York City,3.0152,8.1815,8.3435,258,1.6363
2024-09-10T08:00:00Z,2024-09-10T09:00:00Z,40.731,-73.935,New York City,2.1275,5.9424,5.9822,276,-0.6889
    
How can I calculate dew point temperature from temperature in Kelvin and relative humidity?

Use the Magnus‑Tetens approximation. Convert temperature from Kelvin to Celsius and apply the formula.

import math

# T = temperature (K), RH = relative humidity (%)
T_C = T - 273.15

a = 17.27
b = 237.7

gamma = (a * T_C) / (b + T_C) + math.log(RH / 100)
dew_point_C = (b * gamma) / (a - gamma)

Example with GribStream Expressions

Compute dew point directly in the API response:

curl -X POST 'https://gribstream.com/api/v2/gfs/timeseries' \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer [API_TOKEN]" \
  -d '{
    "fromTime": "2024-12-01T00:00:00Z",
    "untilTime": "2024-12-01T06:00:00Z",
    "minLeadTime": "0h",
    "maxLeadTime": "12h",
    "coordinates": [{ "lat": 47.6, "lon": -122.33, "name": "Seattle" }],
    "variables": [
      { "name": "TMP", "level": "2 m above ground", "alias": "tempK" },
      { "name": "RH",  "level": "2 m above ground", "alias": "rh" }
    ],
    "expressions": [
      { "expression": "tempK - 273.15",                                         "alias": "tempC" },
      { "expression": "(17.27 * tempC) / (237.7 + tempC) + func.Log(rh / 100)", "alias": "gamma" },
      { "expression": "(237.7 * gamma) / (17.27 - gamma)",                      "alias": "dew_point_C" },
      { "expression": "dew_point_C + 273.15",                                   "alias": "dew_point_K" }
    ]
  }'

Result:

forecasted_at,forecasted_time,lat,lon,name,rh,tempC,gamma,dew_point_C,dew_point_K,tempK
2024-12-01T00:00:00Z,2024-12-01T02:00:00Z,47.600,-122.330,Seattle,75.4000,4.9410,0.0693,0.9579,274.1079,278.0910
2024-12-01T00:00:00Z,2024-12-01T00:00:00Z,47.600,-122.330,Seattle,69.7000,5.7435,0.0465,0.6415,273.7915,278.8935
2024-12-01T00:00:00Z,2024-12-01T04:00:00Z,47.600,-122.330,Seattle,80.0000,4.4245,0.0924,1.2792,274.4292,277.5745
2024-12-01T00:00:00Z,2024-12-01T03:00:00Z,47.600,-122.330,Seattle,77.9000,4.6903,0.0844,1.1678,274.3178,277.8403
2024-12-01T00:00:00Z,2024-12-01T01:00:00Z,47.600,-122.330,Seattle,73.6000,5.0681,0.0540,0.7457,273.8957,278.2181
2024-12-01T00:00:00Z,2024-12-01T05:00:00Z,47.600,-122.330,Seattle,82.3000,4.0860,0.0971,1.3433,274.4933,277.2360