What is the best practice of performing multiple periodic pollings in parallel

wshv · October 18, 2023, 2:09am

Context:
The workflow I’m working on involves the creation of multiple copies of the same resource. Once a resource is created, polling is needed to wait for the readiness of the resource before performing the next action. The polling can take a long time, like an hour or so.

Currently I’m composing the workflow in such a way

# Create resource, say total 5 copies
# resource creation does not take lots of time
# so it is ok to create them one by one 
resources = []
for i in range(5):
   rsc_handle = await workflow.execute_activity(create_resource, payload,
                                                        start_to_close_timeout=timedelta(seconds=10))
   resources.append(rsc_handle)

# start polling for each resource
# ideally, the next action should be performed on a resource
# as soon as it becomes ready
for rsc_handle in resources:
      # polling for a resource
      await workflow.execute_activity(polling, rsc_handle,
                                            start_to_close_timeout=timedelta(seconds=5), 
                                            retry_policy=RetryPolicy(
                                                backoff_coefficient=1.5,
                                                maximum_attempts=10,
                                                initial_interval=timedelta(minutes=10),
                                         ), )
       # once the target is ready, proceed with the next set of actions...
       await workflow.execute_activity(next_action,
                                                        start_to_close_timeout=timedelta(seconds=10))

Problems
I’m relying on the temporal engine to retry the periodic polling for me. That is, in the activity which performs polling, if the resource is found as not ready, an exception is simply thrown so that temporal will later retry this polling activity automatically

In this case, I observe that the temporal engine won’t start to poll for the second resource until the first resource is ready, so on and so forth.

The time of polling for a resource varies greatly. It can take as short as 30 minutes, and it can also take as long as about 2 hours. In the unfortunate scenario, the very first resource to poll for could be the last one to become ready. I want to proceed to the next step on a resource as soon as it becomes ready. Resource that is the slowest to become ready should not block other resources from proceeding to the next action.

I could rewrite the logic of polling activity from

if resource is found to be not ready:
    # leave it to temporal for scheduling the next polling 
    raise NotReady()

to

for _ in range(50):
   if not ready:
      asyncio.sleep(...)

but this way I have to implement the retry logics by myself, which I don’t want (I want to leave it to temporal. Also, I have to change the code which launches the polling from

for rsc_handle in resources:
     # polling for a resource
     await workflow.execute_activity(polling, rsc_handle, ...)

to

for rsc_handle in resources:
     # polling for a resource
     asyncio.create_task(workflow.execute_activity(polling, rsc_handle, ...))

Please advice what is the best practices for handling my situation. Thank you

Chad_Retz · October 18, 2023, 12:08pm

You are intentionally only calling one activity at a time and waiting. You should use normal Python asyncio primitives if you need do things concurrently. For example, instead of await workflow.execute_activity, you can just call it without await and put the awaitable on a list, then await the result of asyncio.wait with that list and return_when=FIRST_COMPLETED to get the first completed.

wshv · October 19, 2023, 6:26pm

@Chad_Retz The way that I currently implement the periodic polling is to simply throw an exception from the activity function and let the temporal server automatically retry it for me. This works well if there is only one polling activity at a time. But what if multiple polling activities are running in parallel?

Put it another way, let’s say I have 5 polling activities that are running in parallel, and each of them relies on throw exception to inform temporal retry is needed. I just don’t want temporal to busy-retrying the first activity that throws, while starving other activities that later throws

Chad_Retz · October 19, 2023, 6:41pm

You can cancel the other activities at any time. So if one returns and you want to cancel the others that are in a poll-via-error-based-retry loop, feel free to call cancel() on the activity task. Just make sure that your activities respect cancellation by properly heartbeating (if they are in the middle of retry backoff, cancellation will automatically work). With the default activity cancellation type (TRY_CANCEL), the cancellation will be sent to the activities in the background but the activity asyncio tasks on workflow side will be marked cancelled immediately.

wshv · October 20, 2023, 5:45pm

Thank you, I now get your point.

Topic		Replies	Views
Continue-as-new overhead, suitability of using Temporal Community Support	1	366	March 5, 2024
Infinite loop inside a workflow Community Support python-sdk , child-workflow , workflow-implementat	5	2329	July 7, 2023
Repeat an activity with time limit Community Support java-sdk	5	1249	January 27, 2022
Managing long running, polling, transactions Community Support general-impl	3	916	April 28, 2024
What is the best practice for a polling activity? Community Support go-sdk , activity , polling	20	17204	November 25, 2024

What is the best practice of performing multiple periodic pollings in parallel

Related topics