Continue as new when reaching 5 000 events limit

kevind · March 24, 2022, 11:20am

Hello everyone,

We use Temporal to orchestrate some long-running tasks. Some of these tasks can span over 40 days.
Most of the workflow time is spent waiting for someone else to do something.

Unfortunately, we have no idea when they will do it: it can be anywhere from a few minutes to several days. And of course, when they start doing it, they need us to resume the workflow quickly. To achieve this, we constantly poll them, checking whether they’ve done their part before we can continue on our side. For now, we poll every 2 minutes.

Our issue is that we exhaust the 5 000-event history allowed by Temporal in ~3 days. To keep our workflow running for up to ~30 days, we increased the polling interval to 10 minutes, which is not acceptable.

Since we don’t want to redesign the whole thing, we thought we could leverage the Continue-As-New feature to start a new workflow run when we reach a certain number of events.

The documentation says:

[…] we will warn you every 10,000 Events

Is it possible to trigger “ContinueAsNew” when we get one of these warnings? I wasn’t able to find in the documentation how to listen for these warnings. Is there a special “signal channel” we should listen on in the workflow?

EDIT:

I also found a set of issues discussing the matter (e.g. temporalio/sdk-features#16). Looking at the API protobuf definitions, I can see the new field. I can also find it in the api-go repository.

I’m wondering what the blockers are to adding it to sdk-go, and whether this is something we can help with.

Thanks for your help

For information: we use the Go SDK v1.13.1.

Chad_Retz · March 24, 2022, 1:43pm

By “them” do you mean workflows? How are you polling? If you use queries, this does not affect history, but maybe you do not mean polling workflows from the client? Maybe you can signal your workflow when this external thing is done instead of polling for completion? Or maybe you can poll in an activity instead of a workflow (I can give pointers here)? Polling from a workflow is not usually the best way to do things.

As for listening to those warnings: not exactly. Rather, you can keep a counter of the things you do in your workflow and continue-as-new when you’re ready. This is the suggested approach.
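A minimal, SDK-free sketch of the counter approach, assuming a hypothetical `poll` callback that reports whether the other team is done. Each simulated "run" polls at most `maxPollsPerRun` times; when that budget is spent it hands its state to the next run, which is the role `workflow.NewContinueAsNewError` plays in the Go SDK. The names `runState`, `runOnce`, `runChain` and `PollingWorkflow` are illustrative only, not Temporal APIs.

```go
package main

// runState is what one run hands to the next. In a real Temporal workflow
// it would be the argument passed to
// workflow.NewContinueAsNewError(ctx, PollingWorkflow, state), where
// PollingWorkflow is a hypothetical workflow function.
type runState struct {
	TotalPolls int // polls performed across all runs so far
}

// runOutcome reports why a simulated run ended.
type runOutcome int

const (
	outcomeDone          runOutcome = iota // the awaited status was observed
	outcomeContinueAsNew                   // the per-run poll budget was used up
)

// runOnce models a single workflow run: it polls at most maxPollsPerRun
// times, then asks to continue as new so the next run starts with a
// fresh, empty event history.
func runOnce(state runState, maxPollsPerRun int, poll func() bool) (runState, runOutcome) {
	for i := 0; i < maxPollsPerRun; i++ {
		state.TotalPolls++
		if poll() {
			return state, outcomeDone
		}
		// A real workflow would wait here, e.g. workflow.Sleep(ctx, 2*time.Minute).
	}
	// A real workflow would return workflow.NewContinueAsNewError(...) here.
	return state, outcomeContinueAsNew
}

// runChain plays the server's role: every continue-as-new starts a new run
// that resumes from the carried-over state. It loops until poll reports
// success, so maxPollsPerRun must be positive.
func runChain(maxPollsPerRun int, poll func() bool) (runs int, state runState) {
	for {
		runs++
		var outcome runOutcome
		state, outcome = runOnce(state, maxPollsPerRun, poll)
		if outcome == outcomeDone {
			return runs, state
		}
	}
}
```

With a budget of 10 polls per run and a status that only becomes ready on the 25th poll, the chain needs three runs, and no single run's history ever holds more than 10 polling iterations.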

As for adding it to the Go SDK: it is on the roadmap, as you saw there, but unfortunately it’s not as easy as just using that field. That field is part of the API description; it is not part of what builds “workflow info” from the workflow’s point of view.

tihomir · March 24, 2022, 2:14pm

Just to add, here are some polling best practices; I think your case would fall under “infrequent polling”.

Even though the limit is 50K events, it’s best practice not to let your workflow histories get very large, as that can introduce latency.
One idea would be to call continueAsNew, as you mentioned. You could also look at using child workflows to partition the history, as each child workflow invocation has its own history (and its own 50K limit).
ContinueAsNew would be fine, and if you implement polling via activity retries as described in the link, you could call it after X performed activities, for example.

kevind · March 24, 2022, 2:14pm

Nope, I just meant that we constantly poll another team’s HTTP endpoint (every 2 minutes), which fills our history quite fast.

Thanks for the suggestion. It seems to be a pragmatic, easy enough solution until we can get this information from Temporal itself.

Chad_Retz · March 24, 2022, 2:19pm

I assume that means inside the workflow you sleep or have a timer and every two minutes you kick off an activity that makes an HTTP invocation?

Assuming you can’t have that other system instead send a signal to your workflow, you would probably be better off using a long-running activity to do this (and you can poll as frequently as you’d like). Basically what you’d do is start a polling activity with a reasonably low heartbeat timeout so the system can detect if the activity has crashed. Then in your actual activity, poll as frequently as you’d like but remember to continually heartbeat well within the heartbeat timeout. Then you can return from the activity once the poll succeeds and the workflow only has to wait on that one activity.

Of course, if that system can notify the workflow via a signal, that is the best approach.

kevind · March 24, 2022, 2:21pm

Thanks for the pointer. Just reading the link, I’m not sure we can use the “infrequent polling” solution, because it relies on Temporal’s retry system by marking the activity as failed. At first, that’s what we wanted to do, but our monitoring system triggers alarms when an activity fails.

tihomir · March 24, 2022, 2:22pm

Got it. You can then implement it via the “frequent” solution, as a loop inside the activity.

kevind · March 24, 2022, 2:24pm

Yes, exactly! We call the endpoint, check the answer, sleep for two minutes, and repeat until we get the status we expect.

As for having them signal us: nope, unfortunately it’s not possible.

As for the long-running activity: wow, that’s an excellent idea. I will look at how we can implement it, but it seems to be a very good solution for the issue at hand.

W_L · July 6, 2022, 3:02am

Since the server already tracks HistoryLength, why not expose it in the SDK, instead of me having to call “Workflow.continueAsNew();” frequently?

maxim · July 6, 2022, 3:04am

This is actively being worked on: “Expose history size to workflows” by dnr, Pull Request #3055 in temporalio/temporal on GitHub.
