Signaling a million open workflows

Assume we have a million open workflows stored in Cassandra, and each workflow instance holds a domain entity that I want to copy to another data store. How can I do this with the Temporal service without affecting production traffic processing?

Cassandra is a multi-region cluster; assume there is a passive region for data backup. Can we leverage that region to extract all the workflow domain state saved within?

We could upgrade the workflow with a new signal method that pushes the data to an external data store. Should we stand up a new set of workers just for this one-time data replication process, to handle the signals that push each workflow's internal domain state to the external data store?
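The shape of that idea can be sketched without any Temporal dependencies. This is a stdlib-only illustration, not real Temporal SDK code: `OrderWorkflow`, `ExternalStore`, and the signal name are all hypothetical; in a real workflow the signal handler would schedule an activity, and the activity (not workflow code) would do the actual write to the external store.

```python
class ExternalStore:
    """Stands in for the target data store (hypothetical)."""
    def __init__(self):
        self.records = {}

    def put(self, key, value):
        self.records[key] = value


class OrderWorkflow:
    """Stands in for an open workflow holding a domain entity."""
    def __init__(self, workflow_id, domain_state):
        self.workflow_id = workflow_id
        self.domain_state = domain_state

    # In a real Temporal workflow this would be a signal method that
    # schedules an activity to perform the export; side-effecting I/O
    # belongs in activities, not in workflow code.
    def handle_export_signal(self, export_activity):
        export_activity(self.workflow_id, self.domain_state)


store = ExternalStore()
wf = OrderWorkflow("order-42", {"status": "open", "total": 99})
wf.handle_export_signal(store.put)
print(store.records["order-42"])
```

Whether the export activities run on your existing workers or a dedicated one-time task queue is mainly a question of isolating the replication load from production traffic.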

How can we fetch a million workflows and trigger signals across them in a distributed fashion?

Since the domain state is stored inside the workflow, I'm not sure whether I could fetch only my domain object directly from Temporal's Cassandra database via a Spark job. I don't want to go down this route; I want Temporal's distributed nature itself to make the copy of the data without adding to the tech stack.

How can I do this with the Temporal service without affecting production traffic processing?

Consider using advanced visibility with Elasticsearch along with custom search attributes. That way you avoid querying your primary database and affecting its performance.


How is the Temporal UI able to fetch all the workflows without Elasticsearch? Can we leverage a similar approach to fetch them directly from Cassandra? I understand that a full-table select doesn't work in Cassandra for large datasets, but there are other ways, such as partition-by-partition lookups, to read everything. How does Temporal do this?

If we get all the IDs, can we leverage Temporal itself by creating a new workflow that sends the signals to the open workflows in a distributed fashion?

How is the Temporal UI able to fetch all the workflows without Elasticsearch?

The Web UI uses the ListWorkflowExecutions API to list all workflows. Internally, the server side of this API decides how to retrieve the information, via standard or advanced visibility (Elasticsearch), depending on your server configuration.
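ListWorkflowExecutions is a paginated API: each response carries a next-page token that you pass back until it comes back empty. Here is a stdlib-only sketch of that loop, with a stub standing in for the real RPC; the function names and token encoding are illustrative, not the actual wire format.

```python
def make_stub_server(ids, page_size):
    """Returns a fake list_workflow_executions(page_token) -> (page, next_token).

    Stands in for the server-side ListWorkflowExecutions handler."""
    def list_workflow_executions(page_token):
        start = int(page_token or 0)
        page = ids[start:start + page_size]
        next_start = start + page_size
        next_token = str(next_start) if next_start < len(ids) else ""
        return page, next_token
    return list_workflow_executions


def collect_all_workflow_ids(list_call):
    """Drains every page; an empty token signals the last page."""
    all_ids, token = [], ""
    while True:
        page, token = list_call(token)
        all_ids.extend(page)
        if not token:
            return all_ids


ids = [f"wf-{i}" for i in range(10)]
assert collect_all_workflow_ids(make_stub_server(ids, 3)) == ids
```

For a million workflows you would keep the page size modest and process each page as it arrives rather than materializing the full ID list in memory.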

If we get all the IDs, can we leverage Temporal itself by creating a new workflow that sends the signals to the open workflows in a distributed fashion?

You could use the tctl batch command with a visibility query to signal all workflows returned by that query. Batch commands are executed by internal workflows (the temporal worker service running in the temporal-system namespace), so they can affect the performance of your own workflow executions.

You could also implement batch signalling via the SDK client APIs if you do not want to use tctl and would rather work off a pre-defined list of IDs.
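A client-side batch signaller is essentially a fan-out over a worker pool with a rate cap so you don't overwhelm the frontend. The sketch below is stdlib-only: `signal_fn` stands in for the SDK client's signal call, and the signal name, pool size, and rate are all hypothetical values you would tune.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time


def batch_signal(workflow_ids, signal_fn, workers=8, max_per_second=100):
    """Signals each workflow ID via signal_fn, capped at max_per_second."""
    interval = 1.0 / max_per_second
    lock = threading.Lock()
    last_sent = [0.0]

    def send(wf_id):
        with lock:  # crude global rate limit shared by all workers
            wait = last_sent[0] + interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            last_sent[0] = time.monotonic()
        # In real code this would be the SDK client's signal call,
        # with retries around transient failures.
        signal_fn(wf_id, "exportDomainState")

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(send, workflow_ids))


signalled = []
batch_signal(
    [f"wf-{i}" for i in range(20)],
    lambda wf_id, name: signalled.append(wf_id),
    workers=4,
    max_per_second=1000,
)
print(len(signalled))
```

Feeding this from the paginated list call, page by page, keeps memory flat even at a million IDs; the rate cap is what protects production traffic.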

Thanks Tihomir, the batch job looks like the right fit for my use case. I want to understand more about how the batch job works internally. Is it treated as a system workflow that issues workflow client signals at the provided rate? Is the job distributed for faster processing of the IDs?