Optimizing Workflow Archival

Use Case

  1. My application creates, updates, and deletes entities (say, cars). A user can create a car once, update the car multiple times, and delete the car once.
  2. We create a new workflow instance for every create-car request.
  3. When an update or delete is requested for a car, we create a new workflow instance but reuse the same workflow ID that was used for the create.
  4. Right before the update/delete workflow instance is triggered, we invoke a query method on the previous workflow instance with the same workflow ID, extract some data (using the Temporal service library), and feed it as input to the workflow to be triggered (sketched after this list).
  5. We don’t know the duration between creating a car and updating/deleting it. After a car is created, some customers may want their car updated in a couple of days, some after a couple of weeks, and some never.
  6. With these things in mind, we want to archive our workflows.
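
For reference, a minimal sketch of the query call in step 4, assuming a hypothetical CarWorkflow interface with a getCarData query method (client is an io.temporal.client.WorkflowClient; all names are illustrative, not from our actual code):

    import io.temporal.workflow.QueryMethod;
    import io.temporal.workflow.WorkflowInterface;
    import io.temporal.workflow.WorkflowMethod;

    // Hypothetical workflow interface; CarData is an assumed application type.
    @WorkflowInterface
    public interface CarWorkflow {
        @WorkflowMethod
        void run();

        @QueryMethod
        CarData getCarData();
    }

    // Client side: bind a stub to the car's workflow ID and query it.
    CarWorkflow carWorkflow = client.newWorkflowStub(CarWorkflow.class, carWorkflowId);
    CarData data = carWorkflow.getCarData();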

Questions

  1. How should the archival frequency be decided for the above use case? What are the side effects of a very short archival frequency (say, every day)?
  2. Is there a difference in the way query methods are invoked for an archived workflow (code-wise)? Or can whatever code is currently used to query a non-archived workflow be used as-is?
  3. What could be the order of latency introduced:
    • For invoking a query method on an archived workflow?
    • For creating a new instance of a workflow using the workflow ID of an archived workflow. Will there be any?
      Or does it depend on the underlying infrastructure that holds the data?
  4. Is there something like unarchiving?

Wondering why you’re creating a new workflow instance each time instead of keeping it alive and sending signals to it when updates happen? While a long-running workflow is waiting on an event, it should not be utilizing your worker resources. Using archived workflows just seems a little bit backwards here, but I could be wrong.

Our Design
We have a main workflow and a couple of child workflows, and these child workflows invoke the activities.

  1. A car can last for, say, 10 to 15 years.

  2. A car can be updated several times during its lifetime. Every update can add 20 to 30 events to the main workflow history.

Questions

  • Can I archive only the child workflows and not the main workflow? I don’t have anything useful in the child workflows, and the query method I invoke is on the main workflow.

  • Is it OK to run a workflow for as long as 10 to 15 years? Does adding new events to the history year after year cause a bottleneck?

  • "The purpose of Archival is to keep Event Histories as long as needed while not overwhelming the persistence store." Got this from the Temporal documentation. How does a very long-running workflow square with this?

You have two options:

  1. Use a Temporal workflow as the source of truth for car information. In this case, the car workflow has to stay open for the lifetime of the car. To keep the history size bounded and simplify versioning, you can call continue-as-new periodically.
  2. Use an external database to store car information between updates. Use SignalWithStart to notify the car workflow about updates, starting it if it is not yet running. If the workflow is not running, it can use an activity to load the current car state from the external DB and then process the received signal. If it is already running, SignalWithStart will just send a signal to the workflow (see the sketch after this list).
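
A hedged sketch of option 2 with the Java SDK, reusing the hypothetical CarWorkflow interface from above plus an assumed updateCar signal method:

    import io.temporal.client.BatchRequest;
    import io.temporal.client.WorkflowOptions;

    CarWorkflow carWorkflow =
            client.newWorkflowStub(
                    CarWorkflow.class,
                    WorkflowOptions.newBuilder()
                            .setTaskQueue("cars") // assumed task queue name
                            .setWorkflowId(carId) // one workflow per car
                            .build());

    // Atomically starts the workflow if it is not running and delivers the
    // signal; if it is already running, only the signal is sent.
    BatchRequest request = client.newSignalWithStartRequest();
    request.add(carWorkflow::run);
    request.add(carWorkflow::updateCar, update);
    client.signalWithStart(request);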

Ok, thanks Maxim and Tihomir.

Say I want to take the archival route and want to understand the upside/downside of that. Could you please answer the original four questions?

  • Can I archive only the child workflows and not the main workflow? I don’t have anything useful in the child workflows, and the query method I invoke is on the main workflow.

Archival is applied to all workflows after they are closed. It doesn’t discriminate between parents/children/types etc.

  • Is it OK to run a workflow for as long as 10 to 15 years? Does adding new events to the history year after year cause a bottleneck?

There is a history size limit; I believe the default is 50k events. So if your workflow doesn’t exceed this limit, it can run as long as needed. Any workflow can call continue-as-new to reset its history back to zero size.
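
For illustration, a sketch of what calling continue-as-new could look like inside a workflow implementation; the threshold, types, and structure are made up for the example:

    import io.temporal.workflow.Workflow;

    public void run(CarState state) {
        while (true) {
            // ... wait for update signals and apply them to state ...
            if (Workflow.getInfo().getHistoryLength() > 10_000) { // illustrative threshold
                // Ends this run and starts a fresh one with an empty history,
                // carrying the accumulated state forward as the new argument.
                Workflow.continueAsNew(state);
            }
        }
    }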

  • “The purpose of Archival is to keep Event Histories as long as needed while not overwhelming the persistence store.” Got this from the Temporal documentation. How does a very long-running workflow square with this?

Archival applies only to closed workflows. So long-running workflows are not related to archival in any way.

  1. How should the archival frequency be decided for the above use case? What are the side effects of a very short archival frequency (say, every day)?

There is no such concept as frequency in Temporal archival. Archival happens only for closed workflows, after the retention period configured for the namespace.

  2. Is there a difference in the way query methods are invoked for an archived workflow (code-wise)? Or can whatever code is currently used to query a non-archived workflow be used as-is?

Currently querying archived workflows is not supported.

  3. What could be the order of latency introduced:
  • For invoking a query method on an archived workflow?

Not supported

  • For creating a new instance of a workflow using the workflow ID of an archived workflow. Will there be any?

Workflow ID uniqueness only applies to closed workflows before their retention period expires. Archived workflows do not participate in the uniqueness check.
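
For completeness: whether a new run may reuse the ID of a closed (not yet expired) workflow is governed by the workflow ID reuse policy on the workflow options. A sketch, with names carried over from the earlier examples:

    import io.temporal.api.enums.v1.WorkflowIdReusePolicy;
    import io.temporal.client.WorkflowOptions;

    WorkflowOptions options =
            WorkflowOptions.newBuilder()
                    .setWorkflowId(carId) // same ID as the earlier, now-closed run
                    .setTaskQueue("cars")
                    // Allow a new run even though a closed run with this ID still exists.
                    .setWorkflowIdReusePolicy(
                            WorkflowIdReusePolicy.WORKFLOW_ID_REUSE_POLICY_ALLOW_DUPLICATE)
                    .build();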

  4. Is there something like unarchiving?

Not currently. We plan to support workflow reset for archived workflows in the future.

Say I set the retention period to 1 day; is that too aggressive? How should we arrive at the ideal retention period? If our end goal is performance, is a lower retention period always better? How low is too low and ends up as an anti-pattern?

This is just for my understanding,

  1. Say the reset feature is implemented in the future, how do we go from an archived workflow to an unarchived workflow at a high level? Does the encoded event history of the workflow instance need to be decoded and inserted back into the database?

Say I set the retention period to 1 day; is that too aggressive? How should we arrive at the ideal retention period? If our end goal is performance, is a lower retention period always better? How low is too low and ends up as an anti-pattern?

I think it depends on your application (number of workflows, etc.). With a long retention period you could run into DB disk space issues, as it would keep all completed workflows for a long time, which could then affect performance as well. The minimum value you can set for the retention period is 1 day.

  1. Say the reset feature is implemented in the future, how do we go from an archived workflow to an unarchived workflow at a high level? Does the encoded event history of the workflow instance need to be decoded and inserted back into the database?

When archival is enabled, the workflow execution history, as well as the visibility records of completed workflow executions, are stored in the archive. This, for example, allows searching archived workflows without Elasticsearch enabled. Since the execution history is stored, I would assume reset would work by creating a new workflow execution, replaying the stored history up to the point where you define the reset, and continuing from there.

  1. Will search attributes continue to stay in the archived data?

  2. Will input and output of activities/workflows be available in the archived data?

  3. How does the below command work internally? Is there a Temporal API to do the same in Java?

You can retrieve archived Event Histories by copying the `workflowId` and `runId` of the completed Workflow from the log output and running:

./temporal --ns samples-namespace wf show --wid <workflowId> --rid <runId>

I exported the workflow execution history of a workflow and found that the data fields were base64-encoded.

  1. Can I safely assume that whatever I got by exporting the workflow history is the same as what I’ll get in the archival data? Is archival data == exported workflow execution history?
  2. Note: this page (What is a Temporal Cluster? | Temporal Documentation) says archiving visibility records is not supported?

You mean as in a feature that will continue to be available in future versions?
The archival feature is very experimental at the moment, and we are working on updating it and making sure it is robust and production-ready. Through that work there might be changes that affect your current use of it, so relying on archival for mission-critical app work might not be the best idea right now. Best to wait a bit until all the updates, fixes, and features have been added.

  1. Inputs and results of workflows and activities are part of the workflow history, so they should be, yes.

  2. Will check on that and get back to you.

  3. Yes, it should be the same history as if you exported it from tctl, for example (see the Java sketch after this list):
    tctl wf show -w <workflow_id> -r <workflow_run_id> --output_filename abc.json

  4. Alongside the updates we are working on for archival, we will keep the docs updated as well. It could be that the docs are outdated for now; I will look into it. Search attributes are indeed archived as well.
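
A sketch of the programmatic equivalent of that tctl export, using the Java SDK’s gRPC stubs (assuming service is a WorkflowServiceStubs instance; note this reads from the primary store, not the archive):

    import io.temporal.api.common.v1.WorkflowExecution;
    import io.temporal.api.history.v1.HistoryEvent;
    import io.temporal.api.workflowservice.v1.GetWorkflowExecutionHistoryRequest;
    import io.temporal.api.workflowservice.v1.GetWorkflowExecutionHistoryResponse;

    GetWorkflowExecutionHistoryRequest historyRequest =
            GetWorkflowExecutionHistoryRequest.newBuilder()
                    .setNamespace("default") // assumed namespace
                    .setExecution(
                            WorkflowExecution.newBuilder()
                                    .setWorkflowId(workflowId)
                                    .setRunId(runId)
                                    .build())
                    .build();
    GetWorkflowExecutionHistoryResponse historyResponse =
            service.blockingStub().getWorkflowExecutionHistory(historyRequest);
    for (HistoryEvent event : historyResponse.getHistory().getEventsList()) {
        // Inspect event.getEventType(), payloads, etc.
    }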


Ok Thanks.

  1. Also, will archival create a new file for each workflow that gets archived? Or will the archival data of all workflows be stored in one file?

  2. How does this work internally? Is there a batch job that archives all closed workflows past the retention period at once, or does archival happen at the individual workflow level?

  3. As part of the archival process, do we delete search attributes from Elasticsearch?

  4. Is it possible to point archival data to two different S3 buckets based on namespaces? E.g., archival data of namespace 1 goes to bucket 1 and that of namespace 2 goes to bucket 2? Or should I write a custom provider to achieve the same?

This is a little bit incorrect, as it’s not "./temporal" but the tctl command.
So for example:

tctl --ns default wf show --wid MyWorkflowType --rid myworkflowrunid

It’s the same tctl command to show workflows. It does not, however, look at archived workflows.
For that you can use, for example:


tctl --ns default wf listarchived -q 'WorkflowType="MyWorkflowType"'

where “-q” is a visibility query.


To add to the last response: for the Java SDK you can use ListArchivedWorkflowExecutionsRequest to get that info as well.

Adding @Vitaly as he is working on the archival updates and can provide more info as well.

(Updated Questions List)

  1. Will archival create a new file for each workflow that gets archived? Or does one file store the data of all archived workflows?

  2. Is there a Java API to get an archived workflow’s execution history? Tihomir shared ListArchivedWorkflowExecutionsRequest with me. Can you share the method call to which I need to pass this request?

  3. How does the Java API or the tctl command to fetch archival data work internally? How does it know how to fetch the execution history for a given workflow ID and run ID? Does Temporal store a mapping between workflow ID, run ID, and the archived data?

  4. Say I’m writing a custom provider: is that provider only responsible for storing archived data, or also for assisting Temporal in retrieving it? If I’m writing a custom provider, how will Temporal fulfill the API calls to fetch the execution history of an archived workflow unless I tell Temporal how to retrieve the archived data in my provider?

  5. Is archival a batch job that archives workflows once a day, or does archival happen at the individual workflow level? I want to understand how workflow archival is triggered.

  6. During archival, will the search attributes be deleted from Elasticsearch?

  7. Can I point archival to two different S3 buckets based on namespace? E.g., archival data of namespace 1 goes to bucket 1 and that of namespace 2 goes to bucket 2? Or do I need to write a custom provider to achieve the same?

  8. I’m running Temporal on Docker; does archival work with this setup?
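
Here is a sketch of listing archived executions with the Java SDK. It assumes service is a WorkflowServiceStubs instance, client is a WorkflowClient, and query is a visibility query string like the tctl example above: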

    import io.temporal.api.workflow.v1.WorkflowExecutionInfo;
    import io.temporal.api.workflowservice.v1.ListArchivedWorkflowExecutionsRequest;
    import io.temporal.api.workflowservice.v1.ListArchivedWorkflowExecutionsResponse;

    // Build the request against the client's namespace, e.g.
    // query = "WorkflowType=\"MyWorkflowType\"".
    ListArchivedWorkflowExecutionsRequest listArchivedWorkflowExecutionsRequest =
            ListArchivedWorkflowExecutionsRequest.newBuilder()
                    .setNamespace(client.getOptions().getNamespace())
                    .setQuery(query)
                    .build();
    ListArchivedWorkflowExecutionsResponse listArchivedWorkflowExecutionsResponse =
            service.blockingStub()
                    .listArchivedWorkflowExecutions(listArchivedWorkflowExecutionsRequest);
    for (WorkflowExecutionInfo workflowExecutionInfo :
            listArchivedWorkflowExecutionsResponse.getExecutionsList()) {
        // ... inspect each archived execution's info ...
    }

Vikas, note that archival is an experimental feature in Temporal and will undergo significant changes in the future. As Maxim said above, it’s simply a way to move closed workflows from your main database into cold storage. The primary use case for archival right now is compliance: you dump data into something like S3 and keep it there for a long time for all completed workflows. There is no functionality that would allow a smooth transition from the archive back into hot storage (e.g., using query or reset).
I’m trying to understand your main concern with things sitting in the hot store. Is it reliability, is it size, or is it something else? It’s also not clear how you arrived at the conclusion that you specifically need archival from what you’ve described above.

Sorry, I didn’t mean we want to archive workflows because of those factors. I meant we want to be able to do those things post-archival, as those are our core use cases.

Our concern is performance. We want to archive workflows to keep Temporal clean and fast. But the roadblock we have hit is that there’s data in those workflows stored as objects (accessed via query methods) and as search attributes, and post-archival we don’t know how to access them.

The source of all these problems is that our application design doesn’t have a database layer. We are using the objects stored in the workflow and the query methods as our database layer.
So we have to follow the workflow wherever it goes and come up with a solution to access our data. In this case, we are trying to find a solution for accessing data from archived workflows.

Our strategic solution would be to use a proper database, but in the meantime we want to find a way to tap into the information stored in archived workflows.

If I can get hold of the workflow execution history, I know how to extract the required information from it.
Through the code snippet posted by Tihomir, I will be able to do that.

The question I have is: say I implement a custom provider to store the data, will I still be able to use the same Java API to access the workflow execution history?

Another question: say we use Elasticsearch to store our data as search attributes, will they still persist in Elasticsearch post-archival, or will Temporal delete them from ES?

There are only two reasons why your workflows might be getting slower over time:

  1. Your storage is insufficient. Most options like Cassandra scale pretty well for most reasonable use cases; you should work with your DBAs to make sure your needs are satisfied.
  2. Your workflow histories are too big. In this case you should use the continue-as-new feature, which copies the workflow state into a new run that starts with a fresh history.

I believe this depends on your archival configuration in dynamicconfig. See https://github.com/temporalio/temporal/blob/master/config/development.yaml#L74 and https://github.com/temporalio/temporal/blob/master/config/development.yaml#L92, which you can override in your dynamicconfig to set what you want.