Workflow orchestration for data pipeline activities

Hi,

I’m trying to implement a data pipeline workflow with the Temporal Java SDK.

Here are the sequential steps I have in mind:

  1. First, read the rows of a big table (let’s say ten million rows in total) one by one via a JDBC streaming query.
  2. Pass each row to the downstream handlers, for example Elasticsearch for indexing.

The rough idea is:

  1. Create a DB-reading activity that reads the rows from the big table.
  2. Create the required Data-Pipeline activities, one per row handler, and pass each row as the activity input argument.
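
To make the rough idea above concrete, here is a minimal plain-Java sketch of the intended data flow. It uses no Temporal APIs, and every name in it (`Row`, `DbReadingActivity`, `RowHandlerActivity`, `PipelineSketch`) is hypothetical. Note that the in-process `Iterator` only illustrates the row-by-row flow; it is not something an activity could hand back to a workflow as-is.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical row type; the real table schema is not given here.
final class Row {
    final long id;
    final String payload;

    Row(long id, String payload) {
        this.id = id;
        this.payload = payload;
    }
}

// Step 1: yields rows one by one (stands in for the JDBC streaming query).
interface DbReadingActivity {
    Iterator<Row> streamRows();
}

// Step 2: one implementation per row handler, e.g. an Elasticsearch indexer.
interface RowHandlerActivity {
    void handle(Row row);
}

final class PipelineSketch {
    // Passes every row to every handler in order and returns the row count.
    static long run(DbReadingActivity reader, List<RowHandlerActivity> handlers) {
        long count = 0;
        Iterator<Row> rows = reader.streamRows();
        while (rows.hasNext()) {
            Row row = rows.next();
            for (RowHandlerActivity handler : handlers) {
                handler.handle(row);
            }
            count++;
        }
        return count;
    }
}
```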

But when I tried to implement the activity calls inside the main workflow function, I ran into this issue:
because the dataset is huge, it is impossible to iterate over every row in the DB, collect the rows into a list, and return that list from the DB-reading activity to the workflow. So I’m looking for a way for the DB-reading activity to send data to the workflow continuously, without blocking either the DB-reading process or the workflow, so that I can receive each streamed row and pass it to the next Data-Pipeline activities.

I searched this community for the topic and found these:

  1. Reactive not supported: Reactive support within activity.

  2. Send data to the workflow via signal: Passing Activity Stub to (other) Activity Method Blows Call Stack - #3 by alec

From link 2, it seems I could use signals to send these millions of rows to the workflow one by one. However, that answer assumes the total number of such signals is bounded.

So, my questions here are:

  1. Does the DB-reading activity case count as bounded in terms of signals? (I guess not.)
  2. If I can’t use signals to send data from the activity to the workflow, how can I achieve my use case? Any suggestions?

Many thanks.

Hi, I found an alternative to using signals from here:

Have two workflows:

  1. The first one runs the long-polling activity against the DB.
  2. From inside that first (polling) workflow, start another, row-based workflow.

The downside of this approach is that it’s not easy to track the association between the two workflows, but in our case that doesn’t seem so important.
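
If the association ever does matter, one possible mitigation is to encode it in the workflow ID itself, since Temporal lets the caller choose a workflow ID when starting a workflow. This is my own assumption, not something taken from the linked thread. The sketch below is plain Java with hypothetical names (`RowWorkflowIds`, `forRow`, `parentOf`), and it assumes each row has a stable numeric key.

```java
final class RowWorkflowIds {
    // Marker between the polling workflow's ID and the row key (hypothetical format).
    private static final String SEPARATOR = "/row-";

    // Builds a deterministic ID for a row workflow from its parent ID and row key.
    static String forRow(String pollingWorkflowId, long rowId) {
        return pollingWorkflowId + SEPARATOR + rowId;
    }

    // Recovers the polling workflow's ID from an ID built by forRow.
    static String parentOf(String rowWorkflowId) {
        int cut = rowWorkflowId.lastIndexOf(SEPARATOR);
        if (cut < 0) {
            throw new IllegalArgumentException("not a row workflow ID: " + rowWorkflowId);
        }
        return rowWorkflowId.substring(0, cut);
    }
}
```

With this, a row workflow started for row 42 by a polling workflow with the ID `db-poll-1` would get the ID `db-poll-1/row-42`, and the parent can be read back from that string.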