We use Temporal to orchestrate some long-running tasks. Some of these tasks can span over 40 days.
Most of the workflow time is spent waiting for someone else to do something.
Unfortunately, we have no idea when they will do it. It can happen within the next few minutes or up to several days later. And of course, once they start, they need us to resume the workflow quickly. To achieve this, we constantly poll them, checking whether they’ve done their part before we continue on our side. For now, we poll every 2 minutes.
Our issue is that we exhaust the 5,000-event history allowed by Temporal in ~3 days. To keep our workflow running for up to ~30 days, we increased the polling interval to 10 minutes, which is not acceptable.
If we don’t want to redesign the whole thing, we thought we could leverage the Continue-As-New feature to start a new workflow when we reach a certain number of events.
The documentation says:
[…] we will warn you every 10,000 Events
Is it possible to trigger “ContinueAsNew” when we get one of these warnings? I wasn’t able to find in the documentation how to listen for these warnings. Is there a special “signal channel” we should listen on in the workflow?
I also found a set of issues discussing the matter (like temporalio/sdk-features#16). Looking at the protobuf of the API, I can see the new field. I can also find it in the api-go repository.
I’m wondering what the blockers are to adding it to sdk-go, and whether this is something we could help with somehow.
By “them” do you mean workflows? How are you polling? If you use queries, this does not affect history, but maybe you do not mean polling workflows from the client? Maybe you can signal your workflow when this external thing is done instead of polling for completion? Or maybe you can poll in an activity instead of a workflow (I can give pointers here)? Polling from a workflow is not usually the best way to do things.
Not exactly. Rather, you can keep a counter of things you do in your workflow and continue-as-new when you’re ready. This is the suggested approach.
It is on the roadmap as you see there, but unfortunately it’s not as easy as just using that field. That field is for API description, it is not part of what builds “workflow info” from the workflow POV.
Just to add, here are some polling best practices; your case, I think, would fall under “infrequent polling”.
Even though the limit is 50K, it’s best practice not to let your workflow histories get super large, as that could introduce latency.
One idea would be to call continueAsNew as you mentioned; you could also look at using child workflows to partition the history, since each child workflow invocation has its own history (and its own 50K limit).
ContinueAsNew would be fine, and if you implement polling via activity retries as mentioned in the link, you could call it after a certain number of performed activities, for example.
I assume that means inside the workflow you sleep or have a timer and every two minutes you kick off an activity that makes an HTTP invocation?
Assuming you can’t have that other system instead send a signal to your workflow, you would probably be better off using a long-running activity to do this (and you can poll as frequently as you’d like). Basically what you’d do is start a polling activity with a reasonably low heartbeat timeout so the system can detect if the activity has crashed. Then in your actual activity, poll as frequently as you’d like but remember to continually heartbeat well within the heartbeat timeout. Then you can return from the activity once the poll succeeds and the workflow only has to wait on that one activity.
Of course, if that system can notify the workflow via a signal, that is the best approach.
Thanks for the pointer. Just reading the link, I’m not sure we can use the “infrequent polling” solution because it relies on Temporal’s retry system by marking the activity as failed. At first, that’s what we wanted to do, but our monitoring system triggers alarms when an activity fails.