Change RPS limit for operator service

rustemroad · September 11, 2024, 3:10pm

Hi Temporal community!

I would like to ask for advice on an issue we have run into.
Our application uses the Go SDK to work with Temporal.

The issue occurs in our integration tests, where we start Temporal in a Docker container and configure it for use by other applications.
One of the configuration steps is adding the required search attributes to the namespace.
From time to time we get the following error: "Failed to add search attributes to store postgres12: Unable to save search attributes: service rate limit exceeded." We can’t reproduce it consistently.

Could you please advise which limit I can adjust to avoid this issue?

Thank you very much in advance!

tihomir · September 11, 2024, 9:27pm

service rate limit exceeded

can you look at your resource exhausted graph:

sum(rate(service_errors_resource_exhausted{}[1m])) by (operation, resource_exhausted_cause)

and see what the resource exhausted cause is. The visibility API has much lower RPS limits, and you could consider increasing them via dynamic config:
frontend.namespaceRPS.visibility
(default 10)

rustemroad · September 12, 2024, 6:47am

Hi @tihomir !

Thank you very much for the reply. We had a rate limit issue in our prod environment related to frontend.namespaceRPS.visibility, and adjusting it helped at that time.

In our integration tests, the dynamic config currently has the following content:

system.forceSearchAttributesCacheRefreshOnRead:
  - value: true # Dev setup only. Please don't turn this on in production. We need it to use custom search attributes in the test suite right after they are created.
    constraints: {}
frontend.globalRPS:
  - value: 2400
    constraints: { }

We tried setting frontend.namespaceRPS.visibility there, with no luck. I assume that’s because we create search attributes through the Operator Service, where a different rate limit might apply.

Any ideas around it? Thank you!

rustemroad · September 12, 2024, 11:34am

Just to add some clarity: we don’t have Prometheus set up as part of our CI pipelines. That’s why we are looking for another way to identify which rate limit is being hit.

rustemroad · September 12, 2024, 9:18pm

It looks like we managed to handle this particular rate limit via dynamic configuration:

frontend.rps.namespaceReplicationInducingAPIs:
  - value: 1200
    constraints: { }

However, the flakiness is still there, now with a different error: Failed to add search attributes to store postgres12: Unable to save search attributes: UpdateNamespace: conditional update error: expect: 49, actual: 56.

It might be useful to share more details about our test setup:

We start a test container with Temporal once, using the Docker image temporalio/auto-setup:1.24.2.0.
Tests run in parallel with t.Parallel() and communicate with the shared running Temporal server.
In each test, a new random namespace is created and we add all required search attributes to the newly created namespace.

The idea behind creating a new namespace for each test is to keep the test logic as isolated as possible. This might be the wrong way to test with Temporal, so any advice is much appreciated.

Thank you very much in advance.
