Issues fanning out many small activities in parallel

We use Temporal as an orchestration platform for running workflows
that consist of many small activities (i.e. we have no long-running or CPU-intensive activities, if that matters).

In one stage of the workflow, we receive a bunch of URLs that we then run a scraping activity against.
To speed up execution, I used asyncio.gather over an async function roughly like the one below, only ever running ~20 in total.

async def process_foobar(input):
    ...
    res = await workflow.execute_activity(
        func, input, start_to_close_timeout=start_to_close, retry_policy=retry
    )
    return res

I then fan out in my workflow using asyncio.gather, roughly as in the sketch below.
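Simplified sketch of the fan-out; ScrapeWorkflow, scrape_activity, and the timeout/retry values are placeholders, not my real names or settings:

import asyncio
from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy

@workflow.defn
class ScrapeWorkflow:
    @workflow.run
    async def run(self, urls: list[str]) -> list[str]:
        async def process_foobar(url: str) -> str:
            # One small activity per URL; scrape_activity stands in for the real one.
            return await workflow.execute_activity(
                scrape_activity,
                url,
                start_to_close_timeout=timedelta(minutes=1),
                retry_policy=RetryPolicy(maximum_attempts=3),
            )

        # Fan out over all URLs (only ever ~20 in total) and wait for all of them.
        return list(await asyncio.gather(*(process_foobar(u) for u in urls)))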

However, I have had unreliable results and have been unable to root-cause or debug the problem.
The activities either never seem to start, or they time out after considerably longer than the
timeout I configured for them.

I have set a high limit for concurrent activities on the worker and played around with those settings (rough sketch of the worker config below), but I am wondering whether there is simply a prescriptive way to do this.
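For reference, the worker setup is roughly like this; the task queue name, the registered workflow/activity, and the concurrency value are illustrative, not my exact config:

import asyncio

from temporalio.client import Client
from temporalio.worker import Worker

async def run_worker() -> None:
    client = await Client.connect("localhost:7233")
    worker = Worker(
        client,
        task_queue="scrape-task-queue",     # placeholder task queue name
        workflows=[ScrapeWorkflow],
        activities=[scrape_activity],
        max_concurrent_activities=100,      # one of the settings I've been tuning
    )
    await worker.run()

if __name__ == "__main__":
    asyncio.run(run_worker())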

Is there a scenario where activities get scheduled but are somehow never picked up by a worker, and then time out even though they were never actually allocated or executing?

I am okay with resilient failures here; I would even prefer hedging, since scraping can be unreliable (rough sketch of what I mean below). Any insight would be helpful.
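To be concrete, by hedging I mean something like this sketch inside the workflow, where the same scrape is started twice and whichever attempt finishes first wins (scrape_activity, url, and the timeout are placeholders):

first = workflow.start_activity(
    scrape_activity, url, start_to_close_timeout=timedelta(seconds=30)
)
second = workflow.start_activity(
    scrape_activity, url, start_to_close_timeout=timedelta(seconds=30)
)
# Wait for whichever attempt completes first, then cancel the other one.
done, pending = await asyncio.wait({first, second}, return_when=asyncio.FIRST_COMPLETED)
for handle in pending:
    handle.cancel()
result = done.pop().result()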

I suggest posting the code that isn’t working; it’s hard to guess what the problem might be.

I’ve encountered similar issues before. In my case, it was because I accidentally used blocking APIs in async functions, causing the Python event loop to get stuck.
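For example, something like the first activity below stalls the event loop for the whole worker, while the second keeps it free. This is a hypothetical sketch; requests and the activity names are just stand-ins for whatever blocking call you might have:

import asyncio

import requests
from temporalio import activity

@activity.defn
async def scrape_blocking(url: str) -> str:
    # BAD: requests.get blocks the event loop, so every other async
    # activity on this worker is stuck until it returns.
    return requests.get(url).text

@activity.defn
async def scrape_nonblocking(url: str) -> str:
    # OK: run the blocking call in a thread so the loop keeps serving
    # the other activities.
    return await asyncio.to_thread(lambda: requests.get(url).text)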

If you create 20 activities with asyncio.gather, all of them should start almost immediately, even if you’re only running a single worker. If some activities are not getting scheduled, there’s a good chance your event loop is stuck. Try asyncio debug mode to see if there’s any issue.
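You can enable it where you start the worker, or via the PYTHONASYNCIODEBUG=1 environment variable (run_worker here is a stand-in for whatever entry point starts your worker):

import asyncio

# Debug mode warns about coroutines that were never awaited and logs
# callbacks that block the event loop for too long.
asyncio.run(run_worker(), debug=True)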