I’m working on hardening our Temporal deployments and want to stress test Temporal to get more familiar with its behavior, find worthwhile metrics to alert on, and so on. While looking around for prior work, I found the blog post about the stress testing Temporal does internally, as well as this post about Temporal as a load testing tool.
I’d like to build something similar - something that the community could potentially use for themselves if there’s interest. I’ve already written a mirror of the “rabbit” scenario, which works great so far.
I could use some input on the “reactor” scenario, though, specifically the way I’m approaching things:
I have an application that will be running the load tests, kicked off on demand by a gRPC request that looks something like the following:
message StartReactorLoadTestRequest {
  string id = 1;
  map<string, int32> dimensions = 2;
  google.protobuf.Duration maxDuration = 3;
}
When this request is received, a ReactorWorkflow is started, which first creates N ReactorCellWorkflow child workflows (same purpose as the “Stats Aggregator Workflow” from the blog). The ReactorWorkflow then creates a series of different UseCaseWorkflow types according to the dimensions request property, where the key is the name of the UseCaseWorkflow class and the value is the number of those workflows to create. The idea here is that over time, we’ll be able to build a load test profile that is diverse and similar to the varied workloads we see in production.
Unlike the blog, the load test will not complete based on the counter values, but only on test duration. Instead, I’d like the counter values to tell the ReactorWorkflow how many new UseCaseWorkflow instances to start, according to the deficit, so that the desired total number of use case workflows remains roughly constant.
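To make the deficit idea concrete, here is a minimal sketch in plain Java of the calculation I have in mind. It assumes the ReactorWorkflow can obtain a per-type count of currently running use case workflows from the cells; the DeficitPlanner class, its method, and the use case names are hypothetical and not part of the Temporal SDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Temporal SDK.
final class DeficitPlanner {
    private DeficitPlanner() {}

    /**
     * For each use case in `desired` (the request's dimensions map), returns how
     * many workflows would need to be started to get back to the target count.
     * Use cases that are at or above their target map to 0.
     */
    static Map<String, Integer> computeDeficit(Map<String, Integer> desired,
                                               Map<String, Integer> running) {
        Map<String, Integer> toStart = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : desired.entrySet()) {
            // A use case with no reported count is treated as having 0 running.
            int current = running.getOrDefault(entry.getKey(), 0);
            toStart.put(entry.getKey(), Math.max(0, entry.getValue() - current));
        }
        return toStart;
    }

    public static void main(String[] args) {
        // Made-up use case names and counts.
        Map<String, Integer> desired = new LinkedHashMap<>();
        desired.put("OrderUseCaseWorkflow", 100);
        desired.put("BillingUseCaseWorkflow", 20);

        Map<String, Integer> running = new LinkedHashMap<>();
        running.put("OrderUseCaseWorkflow", 93);

        // Prints {OrderUseCaseWorkflow=7, BillingUseCaseWorkflow=20}
        System.out.println(computeDeficit(desired, running));
    }
}
```

Clamping at zero means a temporary surplus of one type is simply left to drain rather than being actively cancelled.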
I have two immediate implementation questions:
- The UseCaseWorkflow instances need to select a ReactorCellWorkflow at random. Presumably, this would be done via an activity, where I’d call WorkflowServiceStubs.listOpenWorkflowExecutions, filtering on the ReactorCellWorkflow type when the use case workflow starts?
- I want a nice “abort” button for a specific test. Since the UseCaseWorkflow instances are detached from the reactor workflows, submitting a single cancellation on the ReactorWorkflow won’t clean everything up. What would be the best way to clean up the use case workflows?
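For the first question, one alternative I could imagine instead of listing open executions is giving the cells predictable workflow IDs, so a use case workflow only needs the test ID and the cell count to pick one. The sketch below is plain Java and rests on assumptions: the ID scheme and the CellPicker class are hypothetical, and the ReactorWorkflow would have to pass the test ID and N to every use case workflow it starts.

```java
import java.util.Random;

// Hypothetical helper for illustration only; the ID scheme is an assumption.
final class CellPicker {
    private CellPicker() {}

    /** Builds the (assumed) deterministic workflow ID of the cell at `index`. */
    static String cellWorkflowId(String testId, int index) {
        return testId + "/cell-" + index;
    }

    /** Picks one of `cellCount` cells uniformly at random using `random`. */
    static String pickCell(String testId, int cellCount, Random random) {
        if (cellCount <= 0) {
            throw new IllegalArgumentException("cellCount must be positive");
        }
        return cellWorkflowId(testId, random.nextInt(cellCount));
    }

    public static void main(String[] args) {
        // Prints an ID of the form load-test-1/cell-<0..3>.
        System.out.println(pickCell("load-test-1", 4, new Random()));
    }
}
```

If this ran inside workflow code rather than an activity, the random source would need to be replay-safe; as far as I can tell the Java SDK's Workflow.newRandom() is meant for that, but I have not verified it in this setup.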
As follow-up questions: does this all make sense, and do you have any recommendations or things I should consider before proceeding further?