Batch jobs in Cadence

With respect to https://cadenceworkflow.io/docs/use-cases/batch-job/, do we have an implementation that showcases a batch job with Cadence?

My use case is as follows: for a given input, pull data rows through an API call, build new data points, and write them to a persistent store.

Unfortunately, we don’t have a sample that demonstrates a batch job yet.

The design heavily depends on the use case. The following information is needed to give a better answer:

  • Is the API call result paginated or streamed?
  • What is the maximum number of rows in the result?
  • What is the maximum size of the result data?
  • How much processing does each row require?
  • Does processing a row require external API calls?
  • What is the longest time processing a single row can take?
  • Is it OK to block processing if some row cannot be processed due to downstream dependency failures?
  • How big is the output?
  • Is the API call result paginated or streamed?
    Paginated.
  • What is the maximum number of rows in the result?
    20.
  • What is the maximum size of the result data?
    Around 2 MB.
  • How much processing does each row require?
    Data manipulation or a basic rule engine.
  • Does processing a row require external API calls?
    Yes.
  • What is the longest time processing a single row can take?
    Around 5 seconds.
  • Is it OK to block processing if some row cannot be processed due to downstream dependency failures?
    Yes.
  • How big is the output?
    The output would be smaller than the data requested.

Given these requirements, I would implement all the processing in a single activity (see the sketch after the list). This activity would:

  1. Paginate through results
  2. Heartbeat the progress (page token) back to Cadence
  3. In case of failure, restart execution from the last heartbeated page token
  4. Process results locally, possibly using parallel threads
  5. In case of failure, keep retrying locally
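
Here is a minimal sketch of such an activity using the Cadence Go client. The `BatchInput`, `Row`, `fetchPage`, and `processRow` names are hypothetical stand-ins for your API and rule engine; only the `activity` heartbeat calls are real Cadence APIs:

```go
package batch

import (
	"context"
	"time"

	"go.uber.org/cadence/activity"
)

// BatchInput and Row are hypothetical types; the real shapes depend on your API.
type BatchInput struct{ Query string }
type Row struct{ ID, Data string }

// fetchPage is a hypothetical stand-in for your paginated API call. It returns
// one page of rows plus the next page token ("" when there are no more pages).
func fetchPage(ctx context.Context, query, token string) ([]Row, string, error) {
	return nil, "", nil // call your API here
}

// processRow is a hypothetical stand-in for the per-row work: data
// manipulation / rule engine plus the external API call (~5s worst case).
func processRow(ctx context.Context, row Row) error {
	return nil
}

// ProcessBatchActivity paginates through the results, processing each page
// locally and heartbeating the page token so a retried attempt can resume
// from the last completed page instead of starting over.
func ProcessBatchActivity(ctx context.Context, input BatchInput) error {
	var pageToken string

	// If a previous attempt heartbeated progress, resume from its page token.
	if activity.HasHeartbeatDetails(ctx) {
		if err := activity.GetHeartbeatDetails(ctx, &pageToken); err != nil {
			return err
		}
	}

	for {
		rows, nextToken, err := fetchPage(ctx, input.Query, pageToken)
		if err != nil {
			return err // the workflow's retry policy restarts the activity
		}

		// With only ~20 rows this is sequential; it could also fan out
		// across goroutines for larger batches.
		for _, row := range rows {
			// Retry locally a few times before failing the whole attempt.
			var rowErr error
			for attempt := 0; attempt < 3; attempt++ {
				if rowErr = processRow(ctx, row); rowErr == nil {
					break
				}
				time.Sleep(time.Second << attempt) // simple backoff
			}
			if rowErr != nil {
				return rowErr
			}
		}

		if nextToken == "" {
			return nil // all pages processed
		}
		pageToken = nextToken

		// Record progress; a retried attempt picks this up via heartbeat details.
		activity.RecordHeartbeat(ctx, pageToken)
	}
}
```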

The workflow would invoke this activity with an appropriate retry policy to ensure that it is retried if its worker dies.
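
For completeness, a sketch of the workflow side with such a retry policy; the timeout and retry values are assumptions to adjust for your data:

```go
package batch

import (
	"time"

	"go.uber.org/cadence"
	"go.uber.org/cadence/workflow"
)

// BatchWorkflow runs the batch activity with a heartbeat timeout and retry
// policy, so Cadence restarts the activity if its worker dies mid-run.
func BatchWorkflow(ctx workflow.Context, input BatchInput) error {
	ao := workflow.ActivityOptions{
		ScheduleToStartTimeout: time.Minute,
		// ~20 rows at ~5s each plus pagination fits comfortably here.
		StartToCloseTimeout: 10 * time.Minute,
		// A missed heartbeat within this window is treated as a dead worker,
		// and the activity is retried (resuming from the heartbeated token).
		HeartbeatTimeout: 30 * time.Second,
		RetryPolicy: &cadence.RetryPolicy{
			InitialInterval:    time.Second,
			BackoffCoefficient: 2.0,
			MaximumInterval:    time.Minute,
			ExpirationInterval: time.Hour,
		},
	}
	ctx = workflow.WithActivityOptions(ctx, ao)
	return workflow.ExecuteActivity(ctx, ProcessBatchActivity, input).Get(ctx, nil)
}
```

The HeartbeatTimeout is what makes a dead worker detectable quickly; without it, Cadence would only notice the failure after the full StartToCloseTimeout elapses.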