Writing stress tests for Temporal

RobZienert · January 19, 2021, 6:51pm

I’m working on hardening our Temporal deployments and am looking to stress test Temporal to get more familiar with its behavior, find worthwhile metrics to alert on, and so on. While looking around for prior work, I found the blog post about the stress testing Temporal does internally, as well as this post about Temporal as a load testing tool.

I’d like to build something similar - something that the community could potentially use for themselves if there’s interest. I’ve already written a mirror of the “rabbit” scenario, which works great so far.

I could use some input on the “reactor” scenario, though, specifically the way I’m approaching things:

I have an application that will run the load tests, kicked off on demand by a gRPC request that looks something like the following:

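syntax = "proto3";

import "google/protobuf/duration.proto";

message StartReactorLoadTestRequest {
  string id = 1;                                // load test identifier
  map<string, int32> dimensions = 2;            // UseCaseWorkflow class name -> count
  google.protobuf.Duration maxDuration = 3;     // upper bound on test run time
}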

When this request is received, a ReactorWorkflow is started, which first creates N ReactorCellWorkflow child workflows (same purpose as the “Stats Aggregator Workflow” from the blog). The ReactorWorkflow then creates a series of different UseCaseWorkflows according to the dimensions request property, where the key is the name of the UseCaseWorkflow class and the value is the number of those workflows to create. The idea here is that over time, we’ll be able to build a load test profile that is diverse and similar to the varied workloads we see in production.

Unlike the blog, the load test will not complete based on the counter values, but only on test duration. Instead, I’d like the counter values to tell the ReactorWorkflow how many UseCaseWorkflows to start to cover the deficit, so that the desired total number of use case workflows remains roughly constant.
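
A minimal sketch of that deficit calculation, in plain Java with no Temporal SDK calls. The class name DeficitPlanner and its method are hypothetical, and the sketch assumes the ReactorWorkflow can somehow obtain a per-use-case count of currently running workflows (for example from the counters the cells keep); how that count is collected is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of the Temporal SDK.
final class DeficitPlanner {
    private DeficitPlanner() {}

    /**
     * Compares the desired number of workflows per use case (the "dimensions"
     * map from the request) with the number currently running and returns how
     * many of each use case to start. Use cases at or above target are omitted.
     */
    static Map<String, Integer> workflowsToStart(
            Map<String, Integer> desired, Map<String, Integer> running) {
        Map<String, Integer> toStart = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : desired.entrySet()) {
            int current = running.getOrDefault(entry.getKey(), 0);
            int deficit = entry.getValue() - current;
            if (deficit > 0) {
                toStart.put(entry.getKey(), deficit);
            }
        }
        return toStart;
    }
}
```

The ReactorWorkflow could run this periodically until maxDuration elapses, starting only the returned number of each use case so the total hovers around the requested value.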

I have two immediate implementation questions:

1. The UseCaseWorkflows need to select a ReactorCellWorkflow at random. Presumably, this would be done via an activity, where I’d call WorkflowServiceStubs.listOpenWorkflowExecutions, filtering on the ReactorCellWorkflow type when the use case workflow starts?
2. I want a nice “abort” button for a specific test. Since the UseCaseWorkflows are detached from the reactor workflows, submitting a single cancellation on the ReactorWorkflow won’t clean everything up. What would be the best way to clean up the use case workflows?

As follow-up questions: does this all make sense, and do you have any recommendations or things I should consider before proceeding further?

maxim · January 19, 2021, 9:24pm

I would use the workflow ID as a way to select a random workflow. For example, you can ID your workflows 1 to 100 and let the client choose one randomly.
As the IDs are known, the workflow can have an activity that iterates over the IDs and calls terminate on each of them.
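
A minimal sketch of that idea, in plain Java with no Temporal SDK calls. The class ReactorCellIds, the "<testId>-cell-<n>" ID format, and the terminate callback are all assumptions made for illustration; in a real activity the callback would presumably wrap the SDK's terminate call for a workflow ID, and inside workflow code the Random would need to come from a deterministic source such as Workflow.newRandom().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Consumer;

// Hypothetical helper: not part of the Temporal SDK.
final class ReactorCellIds {
    private ReactorCellIds() {}

    /** Predictable workflow ID for cell number index (1-based) of a test. */
    static String cellId(String testId, int index) {
        return testId + "-cell-" + index;
    }

    /** All cell workflow IDs for a test, from 1 to cellCount inclusive. */
    static List<String> allCellIds(String testId, int cellCount) {
        List<String> ids = new ArrayList<>(cellCount);
        for (int i = 1; i <= cellCount; i++) {
            ids.add(cellId(testId, i));
        }
        return ids;
    }

    /** Picks one cell ID uniformly at random; no visibility query needed. */
    static String randomCellId(String testId, int cellCount, Random random) {
        return cellId(testId, 1 + random.nextInt(cellCount));
    }

    /**
     * Calls terminate once per ID and keeps going when a call fails
     * (for example because that workflow already closed).
     * Returns the number of failed calls.
     */
    static int terminateAll(List<String> workflowIds, Consumer<String> terminate) {
        int failures = 0;
        for (String id : workflowIds) {
            try {
                terminate.accept(id);
            } catch (RuntimeException e) {
                failures++;
            }
        }
        return failures;
    }
}
```

The same numbering scheme could be applied to the use case workflows, so an abort activity can rebuild every ID from the test ID and the requested counts rather than listing open executions.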

RobZienert · January 20, 2021, 1:26am

Oh, the simplicity. Makes sense, thanks!
