Performance of SignalWithStartWorkflow()

I’m trying to perform a stress test, and one part of it is scheduling a large number of workflows.
Right now, in a (3x Cassandra + 3x Temporal + workers) configuration I was able to reach ~80 rps per node for client.SignalWithStartWorkflow() with the Go SDK.
So the total number of workflows created is below 240 per second.

All workflows are created on a single task queue.
I run the image temporalio/auto-setup:1.3.2 with the NUM_HISTORY_SHARDS=512 env variable and this dynamic config:

matching.numTaskqueueReadPartitions:
- value: 10
  constraints: {}
matching.numTaskqueueWritePartitions:
- value: 10
  constraints: {}

The Cassandra cluster doesn’t seem to be overloaded. The workers are able to keep up with the load.
Can I somehow improve the rate per individual Temporal server?

> docker exec -ti temporal  tctl --ad 192.168.5.7:7233 taskqueue list-partition --tq EV_SENDER_TASK_QUEUE
   WORKFLOWTASKQUEUEPARTITION  |       HOST        
  EV_SENDER_TASK_QUEUE         | 192.168.5.9:7235  
  /_sys/EV_SENDER_TASK_QUEUE/1 | 192.168.5.9:7235  
  /_sys/EV_SENDER_TASK_QUEUE/2 | 192.168.5.8:7235  
  /_sys/EV_SENDER_TASK_QUEUE/3 | 192.168.5.7:7235  
  /_sys/EV_SENDER_TASK_QUEUE/4 | 192.168.5.8:7235  
  /_sys/EV_SENDER_TASK_QUEUE/5 | 192.168.5.8:7235  
  /_sys/EV_SENDER_TASK_QUEUE/6 | 192.168.5.9:7235  
  /_sys/EV_SENDER_TASK_QUEUE/7 | 192.168.5.9:7235  
  /_sys/EV_SENDER_TASK_QUEUE/8 | 192.168.5.7:7235  
  /_sys/EV_SENDER_TASK_QUEUE/9 | 192.168.5.8:7235  
   ACTIVITYTASKQUEUEPARTITION  |       HOST        
  EV_SENDER_TASK_QUEUE         | 192.168.5.9:7235  
  /_sys/EV_SENDER_TASK_QUEUE/1 | 192.168.5.9:7235  
  /_sys/EV_SENDER_TASK_QUEUE/2 | 192.168.5.8:7235  
  /_sys/EV_SENDER_TASK_QUEUE/3 | 192.168.5.7:7235  
  /_sys/EV_SENDER_TASK_QUEUE/4 | 192.168.5.8:7235  
  /_sys/EV_SENDER_TASK_QUEUE/5 | 192.168.5.8:7235  
  /_sys/EV_SENDER_TASK_QUEUE/6 | 192.168.5.9:7235  
  /_sys/EV_SENDER_TASK_QUEUE/7 | 192.168.5.9:7235  
  /_sys/EV_SENDER_TASK_QUEUE/8 | 192.168.5.7:7235  
  /_sys/EV_SENDER_TASK_QUEUE/9 | 192.168.5.8:7235  

and

> docker exec -ti temporal  tctl --ad 192.168.5.7:7233 adm tq desc --taskqueue EV_SENDER_TASK_QUEUE                                                                  
  READ LEVEL | ACK LEVEL | BACKLOG | LEASE START TASKID | LEASE END TASKID  
       34500 |     34500 |       0 |                  1 |           100000  

  WORKFLOW POLLER IDENTITY |   LAST ACCESS TIME    
  27472@p-10-web@   | 2020-12-02T15:34:03Z  
  25445@p-12-web@   | 2020-12-02T15:34:03Z  
  28313@p-11-web@   | 2020-12-02T15:34:03Z  
  25665@p-12-web@   | 2020-12-02T15:34:03Z  
  28203@p-11-web@   | 2020-12-02T15:34:03Z  
  27363@p-10-web@   | 2020-12-02T15:34:03Z  
  27582@p-10-web@   | 2020-12-02T15:34:03Z  
  25555@p-12-web@   | 2020-12-02T15:34:03Z  
  28093@p-11-web@   | 2020-12-02T15:34:03Z  

My goal is the ability to handle spikes of 1000 new workflows per second.

Thanks!


Hey Sergle,

Just to clarify: you want the client.SignalWithStartWorkflow() API call to succeed at up to 1000 requests per second across the entire cluster, but you are currently only achieving 240 requests per second. Is this correct?

Can you share / describe the code that is invoking SignalWithStartWorkflow in your benchmark scenario?

Thanks,
Manu

A few questions:

  1. how many shards? ref: https://github.com/temporalio/temporal/blob/master/config/development.yaml#L4
  2. how many workflows are being signaled & started concurrently?

Hi Manu,

It looks something like this. I started one load-generating app for each Temporal server (3 in my setup).

// measure actual requests per second
var cnt uint64
go func() {
    var prev_c uint64
    for {
        time.Sleep(1 * time.Minute)
        log.Printf("Send RPS: %.2f (%d -> %d)\n", float64(cnt-prev_c)/60.0, prev_c, cnt)
        prev_c = cnt
    }
}()

// import "go.uber.org/ratelimit"
rl := ratelimit.New(RPS)
log.Printf("Generating load with RPS %d\n", RPS)

options := client.StartWorkflowOptions{
    TaskQueue:           "EV_SENDER_TASK_QUEUE",
    WorkflowTaskTimeout: 10 * time.Minute,
}

for {
    if finished {
        break
    }

    event := app.EventDetails{
        // ... payload ...
    }

    // workflow ID range, e.g. 0 - 50000
    for i := min_wf; i < max_wf; i++ {
        rl.Take()

        if finished {
            break
        }

        options.ID = fmt.Sprintf("event-sender-%06d", i)
        _, err := c.SignalWithStartWorkflow(context.Background(), options.ID, "SIGNAL_1", event, options, WF_TYPE)
        if err != nil {
            log.Fatalf("error sending signal to workflow %s: %v", options.ID, err)
        }
        cnt++
    }
}

512 shards.
Most workflows are started, immediately processed, and completed; the number of workflows in the open state is low (up to 40).

Hey Sergle,

So each instance of the App (of which there are 3 instances) is sending SignalWithStart sequentially in a for-loop, correct?
Can you try parallelizing the sends across multiple goroutines within a single instance of the App?
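
Something along these lines could work. This is only a rough sketch that reuses c, options, event, rl, cnt, min_wf, max_wf and WF_TYPE from your snippet above; numWorkers and the ids channel are just illustrative names.

// Rough sketch: fan the SignalWithStartWorkflow calls out over several goroutines.
// Reuses c, options, event, rl, cnt, min_wf, max_wf and WF_TYPE from the snippet above;
// needs "sync" and "sync/atomic" in addition to the imports already used there.
ids := make(chan int)             // workflow indexes to signal-with-start
var wg sync.WaitGroup

for w := 0; w < numWorkers; w++ { // e.g. numWorkers = 30
    wg.Add(1)
    go func() {
        defer wg.Done()
        for i := range ids {
            opts := options // copy, so each goroutine sets its own workflow ID
            opts.ID = fmt.Sprintf("event-sender-%06d", i)
            _, err := c.SignalWithStartWorkflow(context.Background(), opts.ID, "SIGNAL_1", event, opts, WF_TYPE)
            if err != nil {
                log.Printf("error signaling workflow %s: %v", opts.ID, err)
            }
            atomic.AddUint64(&cnt, 1) // counter is now shared across goroutines
        }
    }()
}

for i := min_wf; i < max_wf; i++ {
    rl.Take() // keep the rate limiter if you still want an upper cap
    ids <- i
}
close(ids)
wg.Wait()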

One more thing: please use atomic operations for cnt, since in the code sample cnt is accessed by multiple goroutines.
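
For example, a sketch of just the counter part, assuming the increments happen in the sender goroutines:

// import "sync/atomic"
var cnt uint64

// reporter goroutine: read the counter atomically
go func() {
    var prev uint64
    for {
        time.Sleep(1 * time.Minute)
        cur := atomic.LoadUint64(&cnt)
        log.Printf("Send RPS: %.2f (%d -> %d)", float64(cur-prev)/60.0, prev, cur)
        prev = cur
    }
}()

// in each sender goroutine, instead of cnt++:
atomic.AddUint64(&cnt, 1)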

Thank you, with goroutines it’s faster. I observed 950 rps with 30 goroutines.
I uploaded it here: https://github.com/sergle/stress_test/blob/master/generator/main.go

When I have several Temporal servers, should I distribute the load between them, or is that not required?

When the task queue is partitioned across several Temporal instances:

> docker exec -ti temporal  tctl --ad 192.168.5.7:7233 taskqueue list-partition --tq EV_SENDER_TASK_QUEUE
   WORKFLOWTASKQUEUEPARTITION   |       HOST         
  EV_SENDER_TASK_QUEUE          | 192.168.5.9:7235   
  /_sys/EV_SENDER_TASK_QUEUE/1  | 192.168.5.9:7235   
  /_sys/EV_SENDER_TASK_QUEUE/2  | 192.168.5.8:7235   
  /_sys/EV_SENDER_TASK_QUEUE/3  | 192.168.5.7:7235   
...

Should I start worker processes for each Temporal instance (192.168.5.7, 192.168.5.8, 192.168.5.9), or will they distribute the workflows across any connected workers?

Thanks!

The server side uses the namespace & workflow ID for load balancing, so you are fine as long as the load targets different workflow IDs.

The load balancing of the task queue happens at the Temporal frontend, so no additional step is required.
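
In other words, each generator or worker process just needs one frontend address (or a load balancer in front of all of them). A sketch, with the host/port taken from this thread:

// import "go.temporal.io/sdk/client" and "go.temporal.io/sdk/worker"
// The host/port below is just one of the frontend addresses from this thread;
// a load balancer address in front of all frontends works the same way.
c, err := client.NewClient(client.Options{
    HostPort:  "192.168.5.7:7233",
    Namespace: "default",
})
if err != nil {
    log.Fatalf("unable to create Temporal client: %v", err)
}
defer c.Close()

// A worker polls the task queue through whichever frontend it is connected to;
// the matching service spreads the task queue partitions across hosts itself.
w := worker.New(c, "EV_SENDER_TASK_QUEUE", worker.Options{})
// w.RegisterWorkflow(...) / w.RegisterActivity(...) and w.Run(...) as usual.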

Now that the required rate of creating new workflows is reached, I observed that the execution rate dropped drastically. How can I find out where the bottleneck is?

Execution rate meaning the execution rate on the SDK side?

  1. Check the DB CPU / memory to see whether the DB itself is overloaded (also check the persistence metrics: https://github.com/temporalio/temporal/blob/v1.3.2/common/metrics/defs.go#L1075-L1160)

  2. Check whether the server is spending too much time processing tasks (https://github.com/temporalio/temporal/blob/v1.3.2/common/metrics/defs.go#L2108-L2109)

  3. Sample the workflow history, looking for workflow task timeouts

#3 may be the cause, e.g. too many signals to a single workflow overloading the worker (SDK); see the sketch below for one way to check a workflow's history for timeouts.
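
A rough sketch of how that check could look with the Go SDK's GetWorkflowHistory; countWorkflowTaskTimeouts and the example workflow ID are illustrative names, not SDK functions:

// Sketch: scan one workflow's history for WorkflowTaskTimedOut events.
// import enumspb "go.temporal.io/api/enums/v1" and "go.temporal.io/sdk/client"
func countWorkflowTaskTimeouts(c client.Client, workflowID string) (int, error) {
    timeouts := 0
    iter := c.GetWorkflowHistory(context.Background(), workflowID, "", false,
        enumspb.HISTORY_EVENT_FILTER_TYPE_ALL_EVENT)
    for iter.HasNext() {
        event, err := iter.Next()
        if err != nil {
            return timeouts, err
        }
        if event.GetEventType() == enumspb.EVENT_TYPE_WORKFLOW_TASK_TIMED_OUT {
            timeouts++
        }
    }
    return timeouts, nil
}

// e.g. n, err := countWorkflowTaskTimeouts(c, "event-sender-000042")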