Improve Performance on production stuff

Andrex_17 · October 19, 2022, 7:50am

Hi all,

With Temporal we run a workflow split into 3 phases (Java SDK):

Loading phase: extracts all the data to process from the Oracle source and saves it to a microservice with MongoDB
Engine phase: reads the loaded data from MongoDB and runs the processing (read/write operations, huge dataset)
Extractor phase: reads the results and copies them back to Oracle, to close the circle

From a technical point of view, the Engine phase starts only when the Loading phase has finished. The Engine is the part we are investigating for performance.
At the moment we have a Helm deployment of Temporal (default values, just 1 replica) on k8s (frontend, web, worker, history and matching).
Our Temporal parameters are:

maxConcurrentWorkflowTaskExecutionSize: 1000
maxConcurrentActivityExecutionSize: 1000
maxConcurrentActivityTaskPollers: 10
maxWorkflowThreadCount: 180000

The Engine works in parallel: it processes the data in groups and reads its configuration directly from a yml file. We run a lot of activities in parallel this way (Async, 100 in parallel; it needs a lot of memory to do that).
When I change the parameters (for example, 500 in parallel instead of 100), the problem I hit is context deadline exceeded.
I have read a lot of forum topics containing your answers, but I didn’t find a solution for my case.
Another important thing is that I don’t see any correct values in Grafana. I followed the guide on the blog, but I couldn’t get that working either.
I need technical help improving Temporal’s performance so that we can use it for every product in our company.
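
To make the "100 in parallel" fan-out above concrete, here is a minimal sketch of bounding it by fixed-size batches: start one batch, wait for all of it, then start the next. It is plain Java only, not our actual workflow code. The class and method names (BatchedFanOut, partition, runInBatches) are hypothetical, and CompletableFuture is only a stand-in; inside a Temporal workflow the same shape would use the SDK's Async.function and Promise.allOf, not CompletableFuture.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical helper: illustrates batching only, contains no Temporal calls.
public final class BatchedFanOut {
    private BatchedFanOut() {}

    /** Splits items into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /**
     * Applies task to every item, submitting at most batchSize tasks at a time
     * and waiting for the whole batch before starting the next one.
     * Results are returned in input order.
     */
    public static <T, R> List<R> runInBatches(List<T> items, int batchSize, Function<T, R> task) {
        List<R> results = new ArrayList<>();
        for (List<T> batch : partition(items, batchSize)) {
            List<CompletableFuture<R>> inFlight = new ArrayList<>();
            for (T item : batch) {
                inFlight.add(CompletableFuture.supplyAsync(() -> task.apply(item)));
            }
            // Block until every task of this batch has completed.
            for (CompletableFuture<R> future : inFlight) {
                results.add(future.join());
            }
        }
        return results;
    }
}
```

With 250 groups and a batch size of 100 this gives three batches (100, 100, 50), so the number of activities pending at once never goes above the batch size.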

Thx a lot guys.
A.

tihomir · October 19, 2022, 4:47pm

For the context deadline exceeded error, are you getting that in your worker code, client, maybe somewhere else? Can you show the full error?

What persistence store are you using? Are you using the default 512 numHistoryShards in values.yaml? Are you changing any default configs in the helm chart?

I’m not sure that having a single replica, especially of the history and matching services, would give you a great setup for performance testing. I would probably go with 5 history, 3 frontend, 3 matching and 2 frontend (you would need to configure ingress if it’s >1) and go up from there. It would be good to know the resources you are setting for the pods too.

I think to start looking at improving performance you need to set up SDK and server metrics. Can you give more info on the Grafana issue you are having? What guide did you follow?
