Mass Workflow bursts cause occasional ContextDeadlineExceeded errors

tihomir · October 28, 2022, 2:35pm

There are a few instances where I want to create 10,000+ new workflows at once.

Each client request is a gRPC call to your frontend service. The frontend enforces per-namespace RPS limits (frontend.namespaceRps in your dynamic config); see this post for configuration info.

Temporal also has service-level RPS limits:
frontend.rps - frontend overall RPS limit
history.rps - history RPS limit
matching.rps - matching RPS limit

as well as persistence QPS limits you can set:

frontend.persistenceMaxQPS - frontend persistence max QPS
history.persistenceMaxQPS - history persistence max QPS
matching.persistenceMaxQPS - matching persistence max QPS
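The persistence limits apply to database operations rather than API calls, and a workflow start generally causes more than one persistence operation. The sketch below assumes a hypothetical opsPerStart multiplier (measure the real ratio from your persistence metrics instead of trusting any number here) and checks whether a target start rate fits under a service's persistenceMaxQPS, plus the highest start rate that would.

```go
package main

import "fmt"

// persistenceHeadroom estimates the persistence QPS needed for a target start
// rate and the highest start rate that stays under maxQPS.
// opsPerStart is a hypothetical, workload-dependent multiplier and must be > 0.
func persistenceHeadroom(startsPerSec, opsPerStart, maxQPS float64) (needed, maxStarts float64, ok bool) {
	needed = startsPerSec * opsPerStart
	maxStarts = maxQPS / opsPerStart
	return needed, maxStarts, needed <= maxQPS
}

func main() {
	// Hypothetical numbers: 500 starts/sec, 4 persistence ops per start, limit of 3000 QPS.
	needed, maxStarts, ok := persistenceHeadroom(500, 4, 3000)
	fmt.Printf("needed=%.0f QPS, max sustainable=%.0f starts/sec, fits=%v\n", needed, maxStarts, ok)
}
```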

I would check your frontend logs for messages like "namespace rps exceeded".

In your server metrics, check:

sum(rate(service_errors_resource_exhausted{}[1m])) by (resource_exhausted_cause)

You should also watch your persistence latencies (server metrics):

histogram_quantile(0.95, sum(rate(persistence_latency_bucket{}[1m])) by (operation, le))
