We will use Temporal in an upcoming project, but we would like to know whether we are considering the right setup. Our use case is the following:
We are using Go.
15 microservices in total (1 worker per microservice; each worker hosts that microservice's workflows and activities; on average each workflow uses 5-7 activities).
We have a gateway that hosts the starter and invokes the workflows.
For the whole system, we expect to run about 100 workflows/sec for at most 10 minutes of continuous load.
We don't want to use Cassandra because we are really limited on hardware resources, so we are considering Temporal on Kubernetes with MySQL.
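For concreteness, here is roughly what one of our workers looks like — a minimal sketch against the Go SDK, where GetInfoWorkflow, GetInfoActivity, and the task-queue name are hypothetical stand-ins:

package main

import (
	"context"
	"log"
	"time"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
	"go.temporal.io/sdk/workflow"
)

// GetInfoActivity is a hypothetical stand-in: the real activity reads a
// field from this microservice's database and returns a small string.
func GetInfoActivity(ctx context.Context) (string, error) {
	return "some-small-string", nil
}

// GetInfoWorkflow is a hypothetical workflow that calls the single activity.
func GetInfoWorkflow(ctx workflow.Context) (string, error) {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: 10 * time.Second,
	})
	var info string
	err := workflow.ExecuteActivity(ctx, GetInfoActivity).Get(ctx, &info)
	return info, err
}

func main() {
	// Connect to the Temporal frontend (defaults to 127.0.0.1:7233).
	c, err := client.NewClient(client.Options{})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	// One worker per microservice, hosting its workflows and activities.
	w := worker.New(c, "my-service-task-queue", worker.Options{})
	w.RegisterWorkflow(GetInfoWorkflow)
	w.RegisterActivity(GetInfoActivity)

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("worker stopped:", err)
	}
}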
My questions:
Do we need to change any settings besides the defaults from the Helm chart with MySQL? If so, which ones? We are not reaching the 100 workflows/sec …
Make sure you are using the latest Helm chart: v1.2.2 (as of Nov 4th, 2020).
Can you please provide the Helm command used to install Temporal?
Can you please provide the hardware configurations? E.g. MySQL: CPU / mem / IOPS, and the number of Temporal frontend / matching / history services with their corresponding hardware configs? (The number of frontend / matching / history services is controlled by --set server.replicaCount=<number here>.)
During the load test, what are the CPU utilization / load and memory usage (DB & Temporal services)?
First of all, since we are targeting edge deployments, we want to run on the lowest possible commodity hardware. For now we are testing on a VM and, based on performance, we can increase resources until we reach the goal.
We are testing a VMware guest running Ubuntu Server 20.04. Now back to your questions.
1- Make sure you are using the latest Helm chart: v1.2.2 (as of Nov 4th, 2020).
Done.
2- Can you please provide the Helm command used to install Temporal?
# Single replica of each Temporal service; Prometheus, Grafana,
# Elasticsearch, and Kafka are disabled to keep the footprint small:
helm install -f values/values.mysql.yaml temporal \
  --set server.replicaCount=1 \
  --set prometheus.enabled=false \
  --set grafana.enabled=false \
  --set elasticsearch.enabled=false \
  --set kafka.enabled=false \
  . --timeout 900s
3- Can you please provide the hardware configurations? E.g. MySQL: CPU / mem / IOPS, and the number of Temporal frontend / matching / history services with their corresponding hardware configs? (The number of frontend / matching / history services is controlled by --set server.replicaCount=<number here>.)
We are running our own MySQL 5.7 inside the same Ubuntu 20.04 VM.
For the tests we are using k3d.
1 frontend, 1 matching, 1 history, etc.
The CPU is:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 45 bits physical, 48 bits virtual
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 61
Model name: Intel® Core™ i7-5557U CPU @ 3.10GHz
Stepping: 4
CPU MHz: 3100.000
BogoMIPS: 6200.00
Memory: 4 GB (only 1.57 GB consumed with all services up).
CPU usage without load is about 5-10%.
4- During the load test, what are the CPU utilization / load and memory usage (DB & Temporal services)?
We are using k6 for the load tests, ranging between 20 and 30 VUs for 10 minutes.
Our app is running outside the VM (on the same network).
Under load, memory looks OK with no major changes; CPU, however, peaks at 80% to 95% during the tests.
Obviously, with this basic setup we are not expecting to reach 100 workflows/sec, but we are getting only 3 to 5 workflows/sec, which we believe could be improved.
In this initial test I'm calling only one workflow, which in turn calls a single activity; that activity reads a field from that microservice's database, and the information returned is a small string.
My test harness is a small, basic HTTP server listening on port 8080 that forwards the HTTP request to the workflow, which in turn calls the activity and gets the info from the DB, nothing else.
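Roughly, the harness looks like this (a sketch; the /info route is made up, and the workflow and task-queue names are the hypothetical ones from the worker sketch above):

package main

import (
	"log"
	"net/http"

	"go.temporal.io/sdk/client"
)

func main() {
	c, err := client.NewClient(client.Options{})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	http.HandleFunc("/info", func(w http.ResponseWriter, r *http.Request) {
		// Start the workflow and block until it completes, returning
		// the small string that the activity read from the DB.
		run, err := c.ExecuteWorkflow(r.Context(), client.StartWorkflowOptions{
			TaskQueue: "my-service-task-queue",
		}, "GetInfoWorkflow")
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		var info string
		if err := run.Get(r.Context(), &info); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Write([]byte(info))
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}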
I took samples at 4 stages of the same load test, this time using up to 30 VUs; the files are attached here as GIFs.
BTW: the workflow rate this time was 6.202091 workflows/sec.
From the Go SDK source:
// Set to 2 pollers for now, can adjust later if needed. The typical RTT (round-trip time) is below 1ms within data
// center. And the poll API latency is about 5ms. With 2 poller, we could achieve around 300~400 RPS.
If we can achieve 300~400 RPS with just 2 pollers, I'm wondering why those two settings need to be bumped to a significantly larger number (40 in this case) for more throughput. And what is the difference between changing MaxConcurrentWorkflowTaskExecutionSize and changing MaxConcurrentWorkflowTaskPollers?
Well, my understanding is that those parameters should be adjusted depending on your environment, i.e. where you are running Temporal: in a datacenter or on your local machine? I also learned that Temporal is really sensitive to IOPS.
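For what it's worth, both knobs live on worker.Options in the Go SDK. As I understand it, MaxConcurrentWorkflowTaskPollers controls how many goroutines long-poll the service for new workflow tasks (so the fetch rate is roughly the number of pollers divided by the poll latency), while MaxConcurrentWorkflowTaskExecutionSize caps how many fetched workflow tasks may run at once. A minimal sketch, reusing the hypothetical worker from above and the 40 mentioned earlier:

// In the worker setup above, replace the empty worker.Options{} with
// explicit limits, e.g.:
w := worker.New(c, "my-service-task-queue", worker.Options{
	// Number of goroutines long-polling the service for workflow tasks;
	// raising this helps when each poll round-trip is slow (e.g. an
	// IOPS-starved database behind the Temporal services).
	MaxConcurrentWorkflowTaskPollers: 40,
	// Cap on workflow tasks executing concurrently once fetched.
	MaxConcurrentWorkflowTaskExecutionSize: 40,
})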