Seeing high latencies between two subsequent activity task executions

Yimin_Chen · July 15, 2022, 4:13am

You should find out why you are still getting the resource exhausted errors. Check the resource_exhausted_cause tag on that metric to see whether the cause is the RPS limit, the concurrent limit, or system overload. If it is the concurrent limit, you may need to increase frontend.namespaceCount. If it is system overload, you need to increase the persistence rate limit: [frontend|history|matching].persistenceMaxQPS.
You probably need to increase frontend.namespaceRPS as well, but I guess you have already done so.
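
As a rough sketch, the concurrent-limit and persistence settings above could be raised in the server's dynamic config file along these lines. The numbers are placeholders chosen for illustration, not recommendations; size them against what your persistence store can actually sustain.

```yaml
# Dynamic config sketch. All values are illustrative assumptions.

# Raise this if resource_exhausted_cause points at the concurrent limit.
frontend.namespaceCount:
  - value: 2000
    constraints: {}

# Raise these if resource_exhausted_cause points at system overload.
frontend.persistenceMaxQPS:
  - value: 3000
    constraints: {}
history.persistenceMaxQPS:
  - value: 6000
    constraints: {}
matching.persistenceMaxQPS:
  - value: 3000
    constraints: {}
```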

StickyCacheEviction indicates that your worker's sticky cache might not be big enough. You can either increase the worker count or increase the sticky cache size.

poojabhutada · July 19, 2022, 5:49am

@Yimin_Chen Can you please explain the difference between the attributes below:
FrontendRPS: "frontend.rps",
FrontendMaxNamespaceRPSPerInstance: "frontend.namespaceRPS",
FrontendMaxNamespaceCountPerInstance: "frontend.namespaceCount",
FrontendGlobalNamespaceRPS: "frontend.globalNamespacerps",

We have a single namespace, "default", and tried increasing "frontend.rps" up to 48K, but we are still getting resource exhausted errors with the cause reported as the RPS limit. So we need clarification on the "namespaceRPS" attribute as well.
Also, what are the maximum values that these attributes support?
We also see service_errors_entity_not_found in the dashboard. Which config should be checked for this kind of error?

Yimin_Chen · July 19, 2022, 4:30pm

frontend.rps / history.rps / matching.rps set the RPS limit per service pod.
frontend.namespaceRPS sets the per-namespace RPS limit.
There is no maximum value for those configs.
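
For the single "default" namespace case described above, a minimal sketch of setting both limits together might look like this. It assumes a request has to pass both the per-pod limit and the per-namespace limit, so raising frontend.rps alone would leave the namespace limit in place. The attribute name FrontendMaxNamespaceRPSPerInstance also suggests that the namespace limit is applied per frontend instance. The values are illustrative only.

```yaml
# Dynamic config sketch. Values are illustrative assumptions.

# Per-pod RPS limit for the frontend service.
frontend.rps:
  - value: 48000
    constraints: {}

# Per-namespace RPS limit, overridden here only for the "default" namespace.
frontend.namespaceRPS:
  - value: 20000
    constraints:
      namespace: "default"
```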

service_errors_entity_not_found is an expected error. It means a workflow (or another entity, such as an activity) cannot be found. This can happen if some tasks (timer/transfer/visibility tasks) are still processed after the workflow has been deleted (for example, due to retention), or if you try to send a signal to a nonexistent workflow.
