Settings / Recommendations for orchestrating microservices with Temporal and MySQL

We will use Temporal in an upcoming project; however, we would like to know whether we are considering the right setup. Our use case is the following:

  • We are using Go.

  • 15 microservices in total (1 worker per microservice; each worker hosts the workflows and activities of that microservice; on average each workflow uses 5-7 activities).

  • We have a gateway that hosts the starter and invokes the workflows.

  • For the whole system, we are expecting to run about 100 workflows/sec for about 10 minutes maximum of continuous load.

  • We don’t want to use Cassandra because we are really limited on hardware resources, so we are considering Temporal on Kubernetes with MySQL.
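For reference, the gateway's starter can be sketched with the Temporal Go SDK roughly as below. This is only an illustration: the frontend address, workflow name, workflow ID, task queue, and input are placeholder assumptions, not the project's real code, and error handling is minimal.

```go
package main

import (
	"context"
	"log"

	"go.temporal.io/sdk/client"
)

func main() {
	// Connect to the Temporal frontend (address is a placeholder).
	// Recent Go SDK versions use client.Dial; older ones use client.NewClient.
	c, err := client.Dial(client.Options{HostPort: "temporal-frontend:7233"})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	// Start a workflow on the task queue served by one of the 15 workers.
	we, err := c.ExecuteWorkflow(context.Background(), client.StartWorkflowOptions{
		ID:        "order-123",         // placeholder workflow ID
		TaskQueue: "orders-task-queue", // placeholder task queue
	}, "OrderWorkflow", "some-input") // placeholder workflow name and input
	if err != nil {
		log.Fatalln("unable to start workflow:", err)
	}

	// Block until the workflow completes and collect its result.
	var result string
	if err := we.Get(context.Background(), &result); err != nil {
		log.Fatalln("unable to get workflow result:", err)
	}
	log.Println("result:", result)
}
```

In this shape the gateway only needs the SDK client; the workflow and activity code lives entirely in the per-microservice workers.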

My questions:

  • Do we need to change any settings besides the defaults from the Helm chart with MySQL? If so, which ones? We are currently not reaching 100 workflows/sec …
  1. Make sure you are using the latest Helm chart: v1.2.2 (as of Nov 4th, 2020).

  2. Can you please provide the Helm command used to install Temporal?

  3. Can you please provide the hardware configurations? E.g. MySQL: CPU / mem / IOPS; the number of Temporal frontend / matching / history services and their corresponding hardware configs? (The number of frontend / matching / history services is controlled by --set server.replicaCount=<number here>.)

  4. During the load test, what are the CPU utilization / load and memory usage (DB & Temporal services)?

Configs can be found below:

  1. https://github.com/temporalio/helm-charts/blob/master/README.md#install-and-configure-temporal
  2. https://github.com/temporalio/helm-charts/blob/master/README.md#install-with-your-own-mysql

First of all, since we are targeting edge deployments, we want to run on the lowest possible commodity hardware. For now we are testing in a VM, and based on the performance we can increase resources until we reach the goal.

We are testing a VMware guest running Ubuntu Server 20.04. Now back to your questions.

1- Make sure you are using the latest Helm chart: v1.2.2 (as of Nov 4th, 2020).
Done.

2- Can you please provide the Helm command used to install Temporal?
helm install -f values/values.mysql.yaml temporal . \
  --set server.replicaCount=1 \
  --set prometheus.enabled=false \
  --set grafana.enabled=false \
  --set elasticsearch.enabled=false \
  --set kafka.enabled=false \
  --timeout 900s

3- Can you please provide the hardware configurations? E.g. MySQL: CPU / mem / IOPS; the number of Temporal frontend / matching / history services and their corresponding hardware configs? (The number of frontend / matching / history services is controlled by --set server.replicaCount=<number here>.)

  • Running our own MySQL 5.7 inside the same Ubuntu 20.04 VM.

  • For the tests we are using k3d.

  • 1 frontend, 1 matching, 1 history, etc.

  • The CPU is:
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    Address sizes: 45 bits physical, 48 bits virtual
    CPU(s): 2
    On-line CPU(s) list: 0,1
    Thread(s) per core: 1
    Core(s) per socket: 1
    Socket(s): 2
    NUMA node(s): 1
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 61
    Model name: Intel® Core™ i7-5557U CPU @ 3.10GHz
    Stepping: 4
    CPU MHz: 3100.000
    BogoMIPS: 6200.00

  • Memory: 4 GB (only 1.57 GB consumed with all services up)

  • CPU usage without load is about 5-10%

4- During the load test, what are the CPU utilization / load and memory usage (DB & Temporal services)?

  • We are using k6 for load tests, ranging from 20 to 30 VUs for 10 minutes.
  • Our app is running outside the VM (on the same network).
  • Under load, memory looks OK with no major changes; however, CPU peaks at 80% to 95% during the tests.
    Obviously, with this basic setup we are not expecting to reach 100 workflows/sec, but we are getting only 3 to 5 workflows/sec, which we believe could be improved.

It seems that you are running the tests against a laptop.
Can you try reducing the number of shards?
e.g. --set server.config.numHistoryShards=8

Yes, a laptop. I just changed the setting and repeated the test with
--set server.config.numHistoryShards=8

However, there is no tangible change; the rate is still about 3.23 workflows/sec.

Can you please try the command below? It does CPU profiling; I need to understand what is happening (or whether any config could be causing this).

One additional question: how many activities per workflow?

NOTE: you may need to change the 7936 port number.

go tool pprof -pdf http://localhost:7936/debug/pprof/profile\?seconds\=30

Thanks for your support.

In this initial test, I’m calling only 1 workflow, which in turn calls only 1 activity; that activity reads the info from a field in that microservice’s database. The information returned is a small string.

My test consists of a small, basic HTTP server listening on port 8080 and forwarding the HTTP request to the workflow, which in turn calls the activity and gets the info from the DB; nothing else.

I took samples at 4 stages of the same load test, this time using up to 30 VUs; the files are attached here as GIF files.
BTW: the rate this time was 6.202091 workflows/sec.

Let’s sync in the Temporal Slack; the profiling files above are not from the server.

Synced offline

When creating the worker, there are 2 configurations to keep in mind:

worker.Options{
	...
	MaxConcurrentWorkflowTaskPollers: 40,
	MaxConcurrentActivityTaskPollers: 40,
	...
}

These 2 configurations determine how many pollers the worker uses to poll and execute tasks from the server. The default is 2, which may not be enough.
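For completeness, here is a sketch of where those options are applied when constructing a worker with the Go SDK. The task queue name is a placeholder and the workflow/activity registrations are elided; this is not the project's actual worker code.

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	c, err := client.Dial(client.Options{}) // defaults to localhost:7233
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	// Raise the poller counts from the default of 2.
	w := worker.New(c, "orders-task-queue", worker.Options{
		MaxConcurrentWorkflowTaskPollers: 40,
		MaxConcurrentActivityTaskPollers: 40,
	})
	// w.RegisterWorkflow(...) and w.RegisterActivity(...) go here.

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("worker stopped:", err)
	}
}
```

Each of the 15 workers would be configured this way on its own task queue.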


Looking at https://github.com/temporalio/sdk-go/blob/80e324b116270ff91521c7afd9ec85a0b242384c/internal/internal_worker.go#L67-L69, it says:

	// Set to 2 pollers for now, can adjust later if needed. The typical RTT (round-trip time) is below 1ms within data
	// center. And the poll API latency is about 5ms. With 2 poller, we could achieve around 300~400 RPS.

If we can achieve 300~400 RPS with just 2 pollers, I’m wondering why these two configurations need to be bumped to a significantly larger number (40 in this case) for more throughput. And what is the difference between changing MaxConcurrentWorkflowTaskExecutionSize and changing MaxConcurrentWorkflowTaskPollers?


Hi,

Well, my understanding is that those parameters should be adjusted depending on your setup, i.e. where you are running Temporal: in a data center or on your local machine. I learned that Temporal is really sensitive to IOPS.