How do retries/crashes affect a long running polling activity, and a workflow?

Jamie_Sebastian · September 21, 2021, 8:42pm

Hi, I am new to Temporal and have a few questions about a scenario that involves long-running polling for a state and taking actions on state transitions. After some research, I believe the following will work, but I would like some confirmation:

The workflow will have a local variable cur_state that stores the current state. While cur_state is not the end state, it does the following: run a long-running polling activity (which uses heartbeats), passing cur_state as the parameter. This activity completes only when the polled state differs from the cur_state it was given. Back in the workflow, the activity returns polled_state, which means there was a state transition from cur_state to polled_state, so the relevant action can be taken. The workflow then updates cur_state and carries on with the while loop.
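A minimal sketch of the loop described above, written as plain Go rather than against the Temporal SDK so that it runs on its own. The names `pollUntilChanged`, `runUntilEnd`, and `scriptedReader` are hypothetical, and the `read` and `heartbeat` callbacks are stand-ins: in a real activity the heartbeat would presumably be a call such as `activity.RecordHeartbeat`, with a sleep between reads.

```go
package main

import "fmt"

// pollUntilChanged stands in for the long-running activity: it keeps reading
// the external state and returns only once it differs from cur.
// read and heartbeat are hypothetical callbacks supplied by the caller.
func pollUntilChanged(cur string, read func() string, heartbeat func()) string {
	for {
		heartbeat()
		if polled := read(); polled != cur {
			return polled
		}
	}
}

// runUntilEnd stands in for the workflow loop: it tracks curState, waits for
// each transition, and hands the (from, to) pair to onTransition.
func runUntilEnd(start, end string, read func() string, onTransition func(from, to string)) string {
	curState := start
	for curState != end {
		polled := pollUntilChanged(curState, read, func() {})
		onTransition(curState, polled)
		curState = polled
	}
	return curState
}

// scriptedReader returns a read function that replays a fixed, non-empty list
// of observations and then repeats the last one forever.
func scriptedReader(observations []string) func() string {
	i := 0
	return func() string {
		s := observations[i]
		if i < len(observations)-1 {
			i++
		}
		return s
	}
}

func main() {
	read := scriptedReader([]string{"pending", "pending", "running", "running", "done"})
	final := runUntilEnd("pending", "done", read, func(from, to string) {
		fmt.Printf("%s -> %s\n", from, to)
	})
	fmt.Println("final:", final)
}
```

The point of the split is that the polling function knows nothing about transitions; it only reports the first state that differs from the one it was given, and the loop decides what to do with it.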

Questions:

1. If the activity worker dies midway, will the value of the parameter cur_state be restored?
2. If we have very many workflows running at the same time, there will be many of these activities in flight. When an activity sleeps (in between polls), does it still take up worker capacity? That is, can activities ‘yield’ the worker while they sleep? In general, is ‘not enough workers’ a problem to be worried about?
3. When a workflow crashes, from what I researched, it seems that the whole workflow is restarted, but since each activity result is stored, it effectively skips through each activity and returns the stored result immediately to the flow. I want to confirm that the stored result is not just that of the latest call to the activity, but that a result is stored for every call previously made, so the correct results are returned in order (i.e. first call result, second call result, etc.).

maxim · September 21, 2021, 9:24pm

1. If the workflow worker dies midway, the workflow state, including all local variables, will be restored, so there is no need to handle workflow worker failures. If an activity worker fails, the activity is going to time out (probably due to a missing heartbeat) and will be retried.
2. Yes, a running activity takes resources on the worker. If you are using the Go SDK I wouldn’t worry about this, as goroutines are cheap. For Java, I would recommend using an async activity implementation so that the sleep does not hold a thread.
3. Workflows don’t crash unless your workflow code throws an exception. A workflow worker crash is handled seamlessly, as described in (1). So I do confirm that all the intermediate activity results will be preserved.
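To illustrate point (3), here is a toy model of why every intermediate activity result survives a workflow worker crash. It is plain Go and not Temporal's actual implementation; the `history` type and its methods are hypothetical. The assumption it encodes is that each completed activity call is recorded in order, and a replay serves recorded results by call position before running anything new.

```go
package main

import "fmt"

// history is a toy stand-in for a workflow's event history: one recorded
// result per completed activity call, in call order.
type history struct {
	results []string
	next    int // position of the next activity call in the current run
}

// execute returns the recorded result for this call position if one exists;
// otherwise it runs the activity and appends its result to the history.
func (h *history) execute(activity func() string) string {
	if h.next < len(h.results) {
		r := h.results[h.next]
		h.next++
		return r
	}
	r := activity()
	h.results = append(h.results, r)
	h.next++
	return r
}

// replay rewinds the call position, as if another worker picked the workflow
// up and re-ran its code from the start.
func (h *history) replay() { h.next = 0 }

func main() {
	calls := 0
	activity := func() string {
		calls++
		return fmt.Sprintf("result-%d", calls)
	}

	h := &history{}
	fmt.Println(h.execute(activity), h.execute(activity)) // both calls run the activity

	h.replay() // simulate a workflow worker crash and recovery
	fmt.Println(h.execute(activity), h.execute(activity)) // both served from history
	fmt.Println(h.execute(activity))                      // third call runs for real
	fmt.Println("activity executions:", calls)
}
```

After the simulated crash, the first and second calls return their original results without executing again, and only the third call does new work.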

Jamie_Sebastian · September 22, 2021, 4:28pm

Awesome, thanks!

As a follow-up, if I want to execute an activity once for every 24 hours that pass without a state transition, what approach would you recommend?

I was looking at the Go timer and updatable timer samples, but I’m not sure that they work for my use case: the timer sample only triggers once (after a duration d), and the updatable timer is a “sleep until” instead of a “how much longer”. Is there a way to reset the timer from its own callback (i.e., when the 24 hours elapse and the timer callback runs, can I reset the timer to wait another 24 hours)?

maxim · September 22, 2021, 5:19pm

You want something like the updatable timer, but instead of a “sleep until” time, pass a “how much longer” value to the call.
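A rough sketch of the “how much longer” idea, again as plain Go with a simulated clock rather than Temporal SDK code. The `event` type and `simulate` function are hypothetical, and time is counted in whole hours to keep the example deterministic. In an actual workflow the countdown would presumably be a `workflow.NewTimer` waited on together with the polling activity's future through a `workflow.NewSelector`.

```go
package main

import "fmt"

// event is a state transition observed at a given hour (simulated clock).
type event struct {
	hour  int
	state string
}

// simulate returns the hours at which the reminder would fire. The countdown
// is always expressed as "how much longer" (remaining) and is re-armed with a
// fresh interval both after it fires and after every state transition.
// events must be sorted by hour; a transition wins a tie with the timer.
func simulate(interval int, end string, events []event) []int {
	var reminders []int
	now := 0
	remaining := interval
	for _, ev := range events {
		// Let the timer fire as often as it fits before this transition.
		for now+remaining < ev.hour {
			now += remaining
			reminders = append(reminders, now)
			remaining = interval // re-arm from the "callback"
		}
		now = ev.hour
		remaining = interval // a transition resets the countdown
		if ev.state == end {
			break
		}
	}
	return reminders
}

func main() {
	events := []event{{hour: 10, state: "running"}, {hour: 70, state: "done"}}
	fmt.Println(simulate(24, "done", events)) // prints [34 58]
}
```

With a transition at hour 10 and the end state at hour 70, the reminder fires at hours 34 and 58: the countdown restarts at hour 10, fires 24 hours later, and is re-armed for another 24 hours each time.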
