We will use Temporal in an upcoming project, but we would like to know whether we are considering the right setup. Our use case is the following:
We are using Go.
15 microservices in total (1 worker per microservice; each worker hosts that microservice's workflows and activities; on average each workflow uses 5-7 activities).
We have a gateway that hosts the starter and invokes the workflows.
For the whole system, we expect to run about 100 workflows/sec for at most 10 minutes of continuous load.
We don't want to use Cassandra because we are really limited on hardware resources, so we are considering Temporal on Kubernetes with MySQL.
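For concreteness, here is roughly what one of our workers looks like — a minimal sketch against the Go SDK, where GetInfoWorkflow, GetInfoActivity, and the task-queue name are hypothetical stand-ins:

package main

import (
	"context"
	"log"
	"time"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
	"go.temporal.io/sdk/workflow"
)

// GetInfoActivity is a hypothetical stand-in: the real activity reads a
// field from this microservice's database and returns a small string.
func GetInfoActivity(ctx context.Context) (string, error) {
	return "some-small-string", nil
}

// GetInfoWorkflow is a hypothetical workflow that calls the single activity.
func GetInfoWorkflow(ctx workflow.Context) (string, error) {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 10 * time.Second,
	})
	var info string
	err := workflow.ExecuteActivity(ctx, GetInfoActivity).Get(ctx, &info)
	return info, err
}

func main() {
	// Connect to the Temporal frontend (defaults to 127.0.0.1:7233).
	c, err := client.NewClient(client.Options{})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	// One worker per microservice, hosting its workflows and activities.
	w := worker.New(c, "my-service-task-queue", worker.Options{})
	w.RegisterWorkflow(GetInfoWorkflow)
	w.RegisterActivity(GetInfoActivity)

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("worker stopped:", err)
	}
}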
My questions:
Do we need to change any settings besides the defaults from the Helm chart with MySQL? If so, which ones? We are not reaching the 100 workflows/sec …
Make sure you are using the latest Helm chart: v1.2.2 (as of Nov 4th, 2020).
Can you please provide the Helm command used to install Temporal?
Can you please provide the hardware configurations? E.g. MySQL: CPU / mem / IOPS, and the number of Temporal frontend / matching / history services with their corresponding hardware configs? (The number of frontend / matching / history services is controlled by --set server.replicaCount=<number here>.)
During the load test, what are the CPU utilization / load and memory usage (DB & Temporal services)?
First of all, since we are targeting edge deployments, we want to run on the lowest possible commodity hardware. For now we are testing on a VM and, based on performance, we can increase resources until we reach the goal.
We are testing a VMware guest running Ubuntu Server 20.04. Now back to your questions.
1- Make sure you are using the latest Helm chart: v1.2.2 (as of Nov 4th, 2020).
Done.
2- Can you please provide the Helm command used to install Temporal?
# Single replica of each Temporal service; Prometheus, Grafana,
# Elasticsearch, and Kafka are disabled to keep the footprint small:
helm install -f values/values.mysql.yaml temporal \
  --set server.replicaCount=1 \
  --set prometheus.enabled=false \
  --set grafana.enabled=false \
  --set elasticsearch.enabled=false \
  --set kafka.enabled=false \
  . --timeout 900s
3- Can you please provide the hardware configurations? E.g. MySQL: CPU / mem / IOPS, and the number of Temporal frontend / matching / history services with their corresponding hardware configs? (The number of frontend / matching / history services is controlled by --set server.replicaCount=<number here>.)
We are running our own MySQL 5.7 inside the same Ubuntu 20.04 VM.
For the tests we are using k3d.
1 frontend, 1 matching, 1 history, etc.
The CPU is:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 45 bits physical, 48 bits virtual
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 61
Model name: Intel® Core™ i7-5557U CPU @ 3.10GHz
Stepping: 4
CPU MHz: 3100.000
BogoMIPS: 6200.00
Memory: 4 GB (only 1.57 GB consumed with all services up).
CPU usage without load is about 5-10%.
4- During the load test, what are the CPU utilization / load and memory usage (DB & Temporal services)?
We are using k6 for the load tests, ranging between 20 and 30 VUs for 10 minutes.
Our app is running outside the VM (on the same network).
Under load, memory looks OK with no major changes; CPU, however, peaks at 80% to 95% during the tests.
Obviously, with this basic setup we are not expecting to reach 100 workflows/sec, but we are getting only 3 to 5 workflows/sec, which we believe could be improved.
In this initial test I'm calling only one workflow, which in turn calls a single activity; that activity reads a field from that microservice's database, and the information returned is a small string.
My test harness is a small, basic HTTP server listening on port 8080 that forwards the HTTP request to the workflow, which in turn calls the activity and gets the info from the DB, nothing else.
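Roughly, the harness looks like this (a sketch; the /info route is made up, and the workflow and task-queue names are the hypothetical ones from the worker sketch above):

package main

import (
	"log"
	"net/http"

	"go.temporal.io/sdk/client"
)

func main() {
	c, err := client.NewClient(client.Options{})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	http.HandleFunc("/info", func(w http.ResponseWriter, r *http.Request) {
		// Start the workflow and block until it completes, returning
		// the small string that the activity read from the DB.
		run, err := c.ExecuteWorkflow(r.Context(), client.StartWorkflowOptions{
			TaskQueue: "my-service-task-queue",
		}, "GetInfoWorkflow")
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		var info string
		if err := run.Get(r.Context(), &info); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Write([]byte(info))
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}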
I took samples at 4 stages of the same load test, this time using up to 30 VUs; the files are attached here as GIFs.
BTW: the workflow rate this time was 6.202091 workflows/sec.
From the Go SDK source:
// Set to 2 pollers for now, can adjust later if needed. The typical RTT (round-trip time) is below 1ms within data
// center. And the poll API latency is about 5ms. With 2 poller, we could achieve around 300~400 RPS.
If we can achieve 300~400 RPS with just 2 pollers, I'm wondering why those two settings need to be bumped to a significantly larger number (40 in this case) for more throughput. And what is the difference between changing MaxConcurrentWorkflowTaskExecutionSize and changing MaxConcurrentWorkflowTaskPollers?
Well, my understanding is that those parameters should be adjusted depending on your environment, i.e. where you are running Temporal: in a datacenter or on your local machine? I also learned that Temporal is really sensitive to IOPS.
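For what it's worth, both knobs live on worker.Options in the Go SDK. As I understand it, MaxConcurrentWorkflowTaskPollers controls how many goroutines long-poll the service for new workflow tasks (so the fetch rate is roughly the number of pollers divided by the poll latency), while MaxConcurrentWorkflowTaskExecutionSize caps how many fetched workflow tasks may run at once. A minimal sketch, reusing the hypothetical worker from above and the 40 mentioned earlier:

// In the worker setup above, replace the empty worker.Options{} with
// explicit limits, e.g.:
w := worker.New(c, "my-service-task-queue", worker.Options{
	// Number of goroutines long-polling the service for workflow tasks;
	// raising this helps when each poll round-trip is slow (e.g. an
	// IOPS-starved database behind the Temporal services).
	MaxConcurrentWorkflowTaskPollers: 40,
	// Cap on workflow tasks executing concurrently once fetched.
	MaxConcurrentWorkflowTaskExecutionSize: 40,
})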