Temporal Performance with golang microservices

hossein · August 2, 2022, 10:13pm

Hi,
I have a project that is very similar to the project described here: we have only 1 workflow with 2 activities. The first activity sends an HTTP request to a downstream service (with a 100 ms response time), and the second publishes the HTTP response status to NATS.
Currently, on a local M1 machine, we get 16 workflows/sec with Cassandra, but in production we need around 5,000 workflows/sec.
We need to execute around 5 million HTTP requests in 15 minutes.
I have a few questions:

Is Temporal a good fit for a project with these performance requirements?
If Temporal is a good fit, what production resources would you recommend?
Are local activities a better option for us (to improve performance)?
In terms of performance, do you recommend using 2 separate activities for sending the HTTP request and publishing the status code to NATS, or would it be better to use a single activity for the whole flow?

tihomir · August 4, 2022, 4:28pm

we need to execute around 5 million HTTP requests in 15 minutes.

Temporal is capable of this type of load (~5.5K activities/second, ~22-25K state transitions/second). It would come at the cost of having to set up a pretty large Cassandra cluster, though, imo.
For simple “fire and forget” short HTTP calls, you might want to consider using local activities, which would definitely reduce this cost.

I think the question for you is: what is your use case? Do you care about the durability of these calls (which is one of the main reasons to use Temporal for your use case, imo)? What should happen if the first HTTP call fails: do you need retries, or can you just forget that it failed?
I'm trying to understand your use case so I can tell whether Temporal would be a good fit for it.

hossein · August 5, 2022, 10:08pm

Thanks for your response.
Each HTTP call is a single job. In case of failure (a network error) we need to retry it, and for each response we need to send the response status to NATS. So each workflow has two activities: sending the HTTP call and publishing the status to NATS. We prefer to be resilient against any failure, especially when publishing the status to NATS; for the HTTP request, we only need to retry in case of a network failure. Durability of the calls is important, but with reasonable resource usage. If it requires a heavy cluster, we would rather handle failures in another service (the publisher of the jobs).

I have another question: what happens if we aggregate 100 thousand payloads and send them as a single bulk request? For example, instead of calling an endpoint 5 million times, we could create 50 workflows, each with 100 thousand records. Is it okay for Temporal to have workflows with huge states (100 thousand records)?
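One way to keep workflow state small is to pass each workflow (or activity) only a descriptor, such as an offset range into the source table, and let the activity load the records itself, instead of embedding 100 thousand records in the workflow input. This sketch assumes the records are addressable by offset; the `Range` type and `splitRanges` function are hypothetical. It splits the 5 million records from the question into 50 ranges of 100 thousand.

```go
package main

import "fmt"

// Range is a hypothetical half-open interval [Start, End) of record offsets
// in the source table. Only this pair travels through the workflow, not the
// records themselves.
type Range struct {
	Start, End int
}

func (r Range) String() string { return fmt.Sprintf("[%d,%d)", r.Start, r.End) }

// splitRanges partitions total records into consecutive ranges of at most
// batch records each. The last range is shorter when batch does not divide total.
func splitRanges(total, batch int) []Range {
	if total <= 0 || batch <= 0 {
		return nil
	}
	ranges := make([]Range, 0, (total+batch-1)/batch)
	for start := 0; start < total; start += batch {
		end := start + batch
		if end > total {
			end = total
		}
		ranges = append(ranges, Range{Start: start, End: end})
	}
	return ranges
}
```

Each range could then be the input of one of several activities running in parallel.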

maxim · August 5, 2022, 11:01pm

What is the source of data for these HTTP requests? Have you considered making these requests from an activity that reads the source directly and makes both the HTTP and NATS requests in a loop? This activity could record its progress in the heartbeat. On failure, the data from the last heartbeat can be accessed when the activity is retried.

hossein · August 6, 2022, 12:17am

We have another service that reads the records from the DB and creates a new workflow for each record. This flow is too slow (creating 100 thousand workflows takes around 3 minutes); maybe your solution can help us with this issue.

But as I said, our main concern is the cluster resources we need to handle this heavy workflow load.

maxim · August 6, 2022, 2:55am

If you execute the actual workflow logic in an activity, cluster utilization is going to be minimal.

hossein · August 6, 2022, 5:54am

Yeah, but I still need an estimate of the resource requirements with just one activity per workflow at 5.5K activities/second.

maxim · August 6, 2022, 5:24pm

I believe you don’t need to execute an activity per request. A single activity (or a few parallel ones) can execute all of them in a loop.

hossein · August 7, 2022, 9:31am

Okay, cool, but what about concurrent requests? Because of the HTTP response time, doing all the HTTP requests sequentially in a loop makes it too slow. Does Temporal support goroutines or other concurrency paradigms, so I can make concurrent requests in the loop without waiting for the previous request to finish?

maxim · August 7, 2022, 5:38pm

An activity has no limitation on the type of code it can run, so you can use multiple goroutines to implement it. But I would recommend starting multiple such activities in parallel from a workflow.
