Processing parquet file with signals - Design

Hey,

What is the maximum size of a signal payload? Also, what is the maximum rate of incoming signals?

I have a workflow, ParquetFile2Workflow, that calls an activity to read a parquet file. The activity sends a signal back to the parent workflow ParquetFile2Workflow for each row; the workflow listens for the signals and creates a workflow per row to process it.

When running it on a really small parquet file of ~100 rows, the mechanism works fine, but when running it on a medium-size parquet file of around 1K rows, the workflow fails with a deadlock. I'm considering several steps:

  1. Send signals containing batched rows to lower the number of signals (so I need to understand the signal size limit).

  2. Put a sleep in the activity that sends the signals to lower the signal rate.

What do you think about the design and about the proposed solutions?

The signal payload limit is 2 MB. The rate of signals to a single execution (your parent workflow) should not be too high: a couple per second at most, if possible.
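If you do go with batching, here is a minimal Python sketch of grouping rows so that each serialized batch stays under a byte budget. It is plain Python, not SDK code. The names `batch_rows` and `MAX_BATCH_BYTES` are made up for illustration, the 1 MB budget is an arbitrary headroom choice under the 2 MB limit, and JSON is only an assumed encoding; the size of the payload actually sent depends on the data converter in use.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Assumption: an arbitrary budget that leaves headroom under the 2 MB limit.
MAX_BATCH_BYTES = 1_000_000


def batch_rows(
    rows: Iterable[Dict[str, Any]],
    max_bytes: int = MAX_BATCH_BYTES,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of rows whose JSON encoding is estimated to fit in max_bytes.

    The estimate is slightly conservative. A single row that is larger than
    max_bytes on its own is still yielded, alone in its batch.
    """
    batch: List[Dict[str, Any]] = []
    size = 2  # the enclosing "[" and "]"
    for row in rows:
        # Row size plus 2 bytes for the ", " separator json.dumps puts between items.
        row_bytes = len(json.dumps(row).encode("utf-8")) + 2
        if batch and size + row_bytes > max_bytes:
            yield batch
            batch, size = [], 2
        batch.append(row)
        size += row_bytes
    if batch:
        yield batch
```

Each yielded batch would then go out as one signal (or one activity result) instead of one per row.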

Why don't you start the executions from the activity? Have the activity heartbeat with the row id as the heartbeat payload, so it can resume from the last recorded heartbeat if the activity times out. Another approach is to have your activity return a batch of rows that you process, then invoke the activity again to get the next batch. I don't think the signal-based approach you are looking at will scale, as you noticed.
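To illustrate the batch-pull idea, here is a rough Python sketch of the control flow only. It does not use any SDK: `read_batch` stands in for the activity, `process_all` stands in for the workflow loop, and both names are hypothetical. The `start` offset plays the same role as the row id recorded in a heartbeat, so a retry can pick up where the previous attempt stopped.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def read_batch(
    rows: Sequence[dict], start: int, batch_size: int
) -> Tuple[List[dict], Optional[int]]:
    """Stand-in for the activity: return one batch and the next offset.

    The next offset is None once the last row has been returned.
    """
    batch = list(rows[start:start + batch_size])
    next_start = start + len(batch)
    return batch, (next_start if next_start < len(rows) else None)


def process_all(
    rows: Sequence[dict],
    batch_size: int,
    handle_row: Callable[[dict], None],
    start: int = 0,
) -> int:
    """Stand-in for the workflow loop: pull batches until none are left.

    Pass a non-zero start to resume from a previously recorded offset.
    Returns the number of rows handled.
    """
    handled = 0
    offset: Optional[int] = start
    while offset is not None:
        batch, offset = read_batch(rows, offset, batch_size)
        for row in batch:
            handle_row(row)  # e.g. start one execution per row
            handled += 1
    return handled
```

With this shape the workflow asks for the next batch only after it has finished the current one, so it is never flooded with more work than it requested.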