Temporal-History => Help us understand this error! => Task queue range ID was 240 when it was should have been 239

Hey folks!

A few days back we started seeing these errors in temporal-history:

  1. Task queue range ID was 240 when it was should have been 239
  2. query directly though matching on non-sticky failed

This led to a spike in schedule_to_start_latency for the history app.

We fixed this by restarting temporal-frontend

Do you guys know what this error means and what we can do to prevent this from happening again?

Task queue range ID was 240 when it was should have been 239

This can be a transient error that happens during a restart or redeploy, for example. If so, you can ignore it. See similar forum post here.
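
To make the "transient during restart" point a bit more concrete, here is a toy sketch of how a range ID can act as a lease on a task queue. This is my own simplified model and not Temporal's code: the class and method names (`TaskQueueStore`, `take_lease`, `write_task`, `ConditionFailedError`) are hypothetical, and it assumes that each new owner of the task queue bumps the range ID and that writes are accepted only from the holder of the latest value.

```python
# Toy model of range-ID fencing. All names are hypothetical; this is not Temporal's code.

class ConditionFailedError(Exception):
    """Raised when a writer no longer holds the latest lease."""


class TaskQueueStore:
    """Stands in for the persisted task queue record."""

    def __init__(self, range_id=0):
        self.range_id = range_id
        self.tasks = []

    def take_lease(self):
        # A new owner bumps the range ID and remembers the value it received.
        self.range_id += 1
        return self.range_id

    def write_task(self, owner_range_id, task):
        # Writes are conditional on the caller still holding the latest lease.
        if owner_range_id != self.range_id:
            raise ConditionFailedError(
                f"Task queue range ID was {self.range_id} "
                f"when it was should have been {owner_range_id}"
            )
        self.tasks.append(task)


store = TaskQueueStore(range_id=238)
old_owner = store.take_lease()  # old owner holds 239
new_owner = store.take_lease()  # ownership moves (e.g. a restart), new owner holds 240

try:
    store.write_task(old_owner, "task-from-stale-owner")
    message = None
except ConditionFailedError as err:
    message = str(err)

store.write_task(new_owner, "task-from-current-owner")
print(message)
```

In this model the stale owner fails once and the new owner carries on, which is why a short burst of the message around a restart would be harmless.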

query directly though matching on non-sticky failed

This error is logged here. Could you provide the entire error?
Also, can you confirm that the task queue used by the query still has worker(s) polling it? If it does, can you check whether the workers are logging any errors? If it does not, that could be the cause of the issue (you need workers running to handle the query request).

The above doesn't seem to be a transient error, as it kept getting logged for ~2 hours.

Error logs