Design of system with 100s of concurrent ETL processes

Ioannis_T · May 2, 2023, 9:17am

We would like to use Temporal to orchestrate a large number of ETL transformations of blockchain data. We are a small team, and we want to be able to process 100+ chains, and 10-50 ETLs per chain. Our previous setup was based on a dyanmodb master table → event stream → lambda → dynamodb target tables, however, managing such a large number of services has become a pain. Other issues include losing events after 24 hours, and not being able to replay all events out of the box for new ETLs.

It looks like temporal can solve all of that, but we are not confident on the design we should follow. Here is what we have right now:

One eternal temporal workflow per chain implemented in Go
The workflow calls and awaits on an activity that syncs block-batches to master-table. The activity reports back the last block successfully synced or an error. On success, we call the activity again with the next batch. After a few days, the workflow will catch up with the chain height which grows at 1 block per 12 seconds.
We need to process all the blocks added to the master-table through ETL activities. An ETL activity will read data from the master-table, enrich it, transform it, and save it to the target tables (idempotent). We don’t see the point returning the data back to the workflow to decouple transformation from writing. 99% of the time it will be the transform that fails, and if a write fails, we are ok to redo the transform.
Every ETL job has its own block height (up to what block# this ETL job has processed blocks without error). ETLs are independent and new ETLs may be added or removed in the future through signals.
Every few seconds, we loop through every ETL spec
- if there is a pending ETL activity for that spec, we ignore/continue. This means that a workflow.goroutine is working on this ETL right now (heartbeating, not timed-out); else
- else, we check if the current synced block height is > the ETL block height; if it is, we start an activity in a goroutine for the ETL to catch up

The design also includes the following signals/queries

Every N blocks, save the ETLS block heights to checkpoints table, so we can inform users if a certain chart is up-to-date or not
Every M blocks, flush signals and continue-as-new to clean history
On signal, update an ETL’s higher block to force reprocessing blocks in the next run (all writes are idempotent)
On signal, add new ETL spec, or remove ETL spec
On workflow init, load ETL block heights from checkpoints table; to allow for workflow termination and restart

loren · May 8, 2023, 3:56pm

Workflow design looks good to me, and it’s a good use case! Sounds like as many as 100+ * 50 activities every few seconds—here’s some info on Worker tuning, to make sure you don’t fall behind: Developer's guide - Worker performance | Temporal Documentation

Ioannis_T · May 8, 2023, 5:52pm

Thank you so much! Off to the races then

Topic		Replies	Views
Continue As New has no chance to run in a continuously streaming signal workflow Community Support go-sdk , continue-as-new	15	3542	August 1, 2022
Validation of app architecture using temporal Community Support go-sdk	10	1458	October 18, 2021
Best way to streaming data between activities in Temporal Community Support activity , typescript-sdk	6	2448	August 2, 2024
Temporal for real-time event-based DAG workflows Community Support general-impl	1	480	October 31, 2025
Scale Temporal workflow Community Support	11	250	January 5, 2026

Design of system with 100s of concurrent ETL processes

Related topics