We use Temporal to orchestrate some long-running tasks. Some of these tasks can span over 40 days.
Most of the workflow time is spent waiting for someone else to do something.
Unfortunately, we have no idea when they will do it. It can be anywhere from the next few minutes up to several days away. And of course, when they do start, they need us to resume the workflow quickly. To achieve this, we constantly poll them, checking whether they've done their part before we can continue on our side. For now, we poll every 2 minutes.
Our issue is that we exhaust the 5,000-event history allowed by Temporal in ~3 days. To keep our workflow running for up to ~30 days, we increased the polling interval to 10 minutes, which is not acceptable.
If we don't want to redesign the whole thing, we thought we could leverage the Continue-As-New feature to start a new workflow when we reach a certain number of events.
The documentation says:
[…] we will warn you every 10,000 Events
Is it possible to trigger "ContinueAsNew" when we get one of these warnings? I wasn't able to find in the documentation how to listen for these warnings. Is there a special "signal channel" we should listen for in the workflow?
EDIT:
I also found a set of issues discussing the matter (like: temporalio/sdk-features#16). Looking at the protobuf of the API, I can see the new field. I can also find it in the api-go repository.
I'm wondering what the blockers are to adding it to sdk-go, and whether this is something we can help with somehow?
By âthemâ do you mean workflows? How are you polling? If you use queries, this does not affect history, but maybe you do not mean polling workflows from the client? Maybe you can signal your workflow when this external thing is done instead of polling for completion? Or maybe you can poll in an activity instead of a workflow (I can give pointers here)? Polling from a workflow is not usually the best way to do things.
Not exactly. Rather, you can keep a counter of things you do in your workflow and continue-as-new when you're ready. This is the suggested approach.
It is on the roadmap as you can see there, but unfortunately it's not as easy as just using that field. That field is part of the API description; it is not part of what builds "workflow info" from the workflow's point of view.
Just to add, here are some polling best practices; your case, I think, would fall under "infrequent polling".
Even though the limit is 50K, it's best practice not to let your workflow histories get super large, as that could introduce latencies.
One idea would be to call ContinueAsNew as you mentioned; you could also look at using child workflows to partition the history, as each child workflow invocation has its own history (and its own 50K limit).
ContinueAsNew would be fine, and if you implement polling via activity retries as mentioned in the link, you could call it after X performed activities, for example.
I assume that means inside the workflow you sleep or have a timer, and every two minutes you kick off an activity that makes an HTTP invocation?
Assuming you can't have that other system send a signal to your workflow instead, you would probably be better off using a long-running activity to do this (and you can poll as frequently as you'd like). Basically, you'd start a polling activity with a reasonably low heartbeat timeout so the system can detect if the activity has crashed. Then, in the activity itself, poll as frequently as you'd like, but remember to heartbeat continually, well within the heartbeat timeout. You can then return from the activity once the poll succeeds, and the workflow only has to wait on that one activity.
Of course, if that system can notify the workflow via a signal, that is the best approach.
Thanks for the pointer. Just reading the link, I'm not sure we can use the "infrequent polling" solution, because it relies on Temporal's retry system by marking the activity as failed. At first, that's what we wanted to do, but our monitoring system will trigger alarms when an activity fails.