Mass Workflow bursts cause occasional ContextDeadlineExceeded errors

There are a few instances where I want to create 10,000+ new workflows at once.

Each client request is a gRPC call to your frontend service. The frontend has per-namespace rps limits (frontend.namespaceRPS in your dynamic config); see this post for configuration info.

Temporal also has per-service rps limits:
frontend.rps - frontend overall rps limit
history.rps - history rps limit
matching.rps - matching rps limit

as well as persistence qps limits you can set:

frontend.persistenceMaxQPS - frontend persistence max qps
history.persistenceMaxQPS - history persistence max qps
matching.persistenceMaxQPS - matching persistence max qps
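
If the burst turns out to be hitting one of these limits, one option besides raising them is to pace the starts on the client side. Below is a minimal sketch of that idea in plain Python, not an official SDK feature. The names start_paced, start_workflow and max_rps are hypothetical: start_workflow stands for whatever call you already use to start a single workflow, and max_rps is a rate you choose to stay under your configured per-namespace limit.

```python
import time
from typing import Callable, Iterable, List


def start_paced(
    workflow_ids: Iterable[str],
    start_workflow: Callable[[str], None],  # hypothetical: your own start call
    max_rps: float,                         # assumption: chosen below your rps limit
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Call start_workflow once per id, spacing calls at least 1/max_rps apart."""
    if max_rps <= 0:
        raise ValueError("max_rps must be positive")

    interval = 1.0 / max_rps
    next_slot = clock()  # earliest time the next request may be sent
    started: List[str] = []

    for wf_id in workflow_ids:
        now = clock()
        if now < next_slot:
            # We are ahead of schedule: wait until the next slot opens.
            sleep(next_slot - now)
        # Schedule the following slot relative to when this request goes out,
        # so a slow call never leads to a catch-up burst afterwards.
        next_slot = max(now, next_slot) + interval
        start_workflow(wf_id)
        started.append(wf_id)

    return started


if __name__ == "__main__":
    # Stand-in for a real start call; it only prints the id.
    ids = [f"order-{i}" for i in range(5)]
    start_paced(ids, lambda wf_id: print("starting", wf_id), max_rps=10)
```

This sends requests one at a time from a single client. If you start workflows from several processes, each one needs its own share of the budget, since the limit applies to the namespace as a whole.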

I would check your frontend logs for messages like namespace rps exceeded.

For server metrics check:

sum(rate(service_errors_resource_exhausted{}[1m])) by (resource_exhausted_cause)

You should also watch your persistence latencies (server metrics):

histogram_quantile(0.95, sum(rate(persistence_latency_bucket{}[1m])) by (operation, le))
