How do retries and crashes affect a long-running polling activity and its workflow?

Hi, I am new to Temporal and have a few questions about a scenario that involves long-running polling of a state and running actions on state transitions. After some research, I believe the following design will work, but I would like some confirmation:

The workflow will keep the current state in a local variable, cur_state. While cur_state is not the end state, it runs a long-running polling activity (which uses heartbeats), passing cur_state as the parameter. The activity completes only when the polled state differs from the cur_state it was given, and it returns that polled_state. Back in the workflow, the returned value means there was a transition from cur_state to polled_state, so the relevant action can run; the workflow then sets cur_state to polled_state and continues the loop.
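As a rough illustration, the loop above can be modelled in plain Go. This is only a stdlib model of the control flow, not Temporal SDK code: pollUntilChanged stands in for the heartbeating polling activity, runStateLoop for the workflow, and fetch for whatever reads the external state; all three names are hypothetical. In a real Temporal Go workflow the activity would be invoked through workflow.ExecuteActivity and would call activity.RecordHeartbeat between polls.

```go
package main

import (
	"fmt"
	"time"
)

// pollUntilChanged models the long-running polling activity: it keeps
// calling fetch (a hypothetical read of the external state) until the
// returned state differs from cur, sleeping between polls.
func pollUntilChanged(cur string, fetch func() string, interval time.Duration) string {
	for {
		polled := fetch()
		if polled != cur {
			return polled
		}
		// A real activity would record a heartbeat here before sleeping.
		time.Sleep(interval)
	}
}

// runStateLoop models the workflow: it holds curState locally, waits for
// each transition, runs the action for it, and stops at the end state.
// It returns the sequence of states it passed through.
func runStateLoop(initial, end string, fetch func() string, interval time.Duration,
	onTransition func(from, to string)) []string {
	curState := initial
	visited := []string{curState}
	for curState != end {
		polledState := pollUntilChanged(curState, fetch, interval)
		onTransition(curState, polledState)
		curState = polledState
		visited = append(visited, curState)
	}
	return visited
}

func main() {
	// Scripted external state: "A" for two polls, then "B", then "DONE".
	script := []string{"A", "A", "B", "B", "DONE"}
	i := 0
	fetch := func() string {
		s := script[i]
		if i < len(script)-1 {
			i++
		}
		return s
	}
	visited := runStateLoop("A", "DONE", fetch, time.Millisecond, func(from, to string) {
		fmt.Printf("transition %s -> %s\n", from, to)
	})
	fmt.Println(visited)
}
```

The sleep between polls is what makes this a long-running activity; the workflow side only ever sees one result per transition.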

Questions:

  1. If the activity worker dies midway, will the value of the cur_state parameter be restored?

  2. If many workflows run at the same time, many of these activities will be running too. When an activity sleeps between polls, does it still occupy worker resources? That is, can activities ‘yield’ the worker while they sleep? In general, should I worry about not having enough workers?

  3. From what I have read, when a workflow crashes the whole workflow is restarted, but because each activity result is stored, the workflow effectively skips the activity and immediately gets the stored result back. I want to confirm that a result is stored for every previous call to the activity, not just the latest one, so that the correct result is returned for each call (first call result, second call result, and so on).

  1. If the workflow worker dies midway, the workflow state, including all local variables, is restored, so there is no need to handle workflow worker failures. If an activity worker fails, the activity times out (most likely because of a missed heartbeat) and is retried with the same arguments, so it receives the same cur_state.

  2. Yes, a running activity consumes resources on the worker. If you are using the Go SDK, I wouldn’t worry about this, as goroutines are cheap. For Java, I would recommend an asynchronous activity implementation so that a thread is not tied up during the sleep.

  3. Workflows don’t crash unless your workflow code throws an exception. A workflow worker crash is handled seamlessly, as described in (1). So yes, I confirm that all intermediate activity results are preserved, not just the latest one.


Awesome, thanks!

As a follow-up: if I want to execute an activity every 24 hours during which no state transition occurs, what approach would you recommend?

I was looking at the Go timer and updatable timer samples, but I’m not sure they fit my use case: the timer sample fires only once (after duration d), and the updatable timer is a “sleep until” rather than a “how much longer”. Is there a way to reset the timer from the first callback? That is, when the timer callback runs after 24 hours, can I restart the timer for another 24 hours?

You want something like the updatable timer, but pass the “how much longer” duration to the call instead of the “sleep until” time.
