We use Temporal as an orchestration platform for running workflows that consist of many small activities (i.e. we do not have long-running or CPU-intensive activities, if that matters).
In one stage of the workflow, we receive a bunch of URLs and run a scraping activity against each of them.
To speed up execution, I used asyncio.gather over an async function that looks roughly like this, only ever doing ~20 activities total:
async def process_foobar(input):
    ...
    res = await workflow.execute_activity(
        func, input,
        start_to_close_timeout=start_to_close, retry_policy=retry,
    )
    return res
I then fan out in my workflow using asyncio.gather.
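A minimal sketch of that fan-out, where urls stands in for the list we receive in that stage (asyncio.gather is supported in Temporal's deterministic workflow event loop):

    # fan out one activity per URL and wait for all of them
    results = await asyncio.gather(*(process_foobar(u) for u in urls))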
However, I have had unreliable results and have been unable to root-cause or debug them. The activities either never seem to get started, or they time out after far longer than the timeout I have configured for them.
I have set a large limit on concurrent activities on the worker and played around with those settings, but I am wondering whether there is simply a prescriptive method for doing this.
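For reference, the worker settings I have been tuning look roughly like this (the task queue name, the registered workflow/activity, and the concurrency value are placeholders, not my exact config):

    from temporalio.worker import Worker

    worker = Worker(
        client,
        task_queue="scrape-task-queue",   # placeholder name
        workflows=[ScrapeWorkflow],       # placeholder workflow
        activities=[scrape_url],          # placeholder activity
        max_concurrent_activities=100,    # the knob I have been adjusting
    )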
Is there something around activities being scheduled but somehow not being picked up by a worker, and then hitting a timeout even though they were never actually allocated/executing during that time?
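My (possibly wrong) understanding is that the relevant knobs here would be the schedule-to-start vs start-to-close timeouts on the activity call, i.e. something like the following (the timedelta values are illustrative, not what I run):

    res = await workflow.execute_activity(
        func, input,
        # fail fast if no worker picks the task up from the queue
        schedule_to_start_timeout=timedelta(seconds=30),
        # bound only the time spent actually executing on a worker
        start_to_close_timeout=timedelta(minutes=2),
        retry_policy=retry,
    )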
I am okay with resilient failures here; I would even prefer hedging, since scraping can be unreliable. Any insight would be helpful.
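By hedging I mean something like the sketch below: start two attempts for the same URL, take whichever finishes first, and cancel the other. This assumes asyncio.wait and task cancellation behave inside workflow code the way asyncio.gather does; scrape_url and the timeout are placeholders.

    # hypothetical hedged scrape: two concurrent attempts, first result wins
    t1 = asyncio.create_task(workflow.execute_activity(
        scrape_url, url, start_to_close_timeout=timedelta(minutes=1)))
    t2 = asyncio.create_task(workflow.execute_activity(
        scrape_url, url, start_to_close_timeout=timedelta(minutes=1)))
    done, pending = await asyncio.wait({t1, t2}, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()  # drop the slower attempt
    result = done.pop().result()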