Questions on archival feature as of 0.30

We are starting to investigate the use of the archival feature. In our setup, we use Cassandra for the history database and ElasticSearch for visibility; Cassandra visibility will be turned off, and we will use an S3-compliant object store for archiving.

As of 0.30, the docs are a bit unclear on how archival and visibility interact. We have a few questions:

  1. When using ES, does archiving of workflows also delete related data from ElasticSearch?

  2. When using ES, how do we search for archived workflows? Does “tctl workflow listarchived” work in this scenario? Or is it perhaps the case that ES data is not deleted and so “tctl workflow list” continues to work even after workflows have been archived?

  3. The docs imply that visibility archiving is not really supported yet (as of 0.30). I take that to mean we should disable it in our config, with something like below. Is that correct?

    archival:
      visibility:
        state: "disabled"
        enableRead: false

  4. Does all archiving activity happen in the worker service? (I.e. which service(s) need to have the S3 secret access key available?)

  5. When archival is turned on, do archiving and enforcement of retention period happen at the same time - i.e. workflow data is archived and then immediately deleted?

  6. Suppose we decide not to use the archival feature and instead have a long retention period. Aside from the database space implications, what are the performance implications? Will the increased steady state size of the history database due to longer retention have an impact on performance of running workflows or workflow search?

Thanks!

When using ES, does archiving of workflows also delete related data from ElasticSearch?

Once a workflow completes, its state is represented within Temporal by three pieces of data:

  1. MutableState: The relevant part of the workflow execution state, including critical information required to load the entire history.
  2. History: The entire workflow execution history of events.
  3. Visibility Record: The ElasticSearch document that allows us to serve visibility queries.

All three pieces of information are deleted after the retention interval set in the namespace config.
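To make the timing concrete, here is a minimal sketch of when a closed workflow becomes eligible for deletion, assuming the retention interval is counted from the close time. The function name and the sample dates are made up for illustration and are not part of Temporal.

```python
from datetime import datetime, timedelta, timezone


def deletion_due(close_time: datetime, retention: timedelta) -> datetime:
    """Earliest time at which MutableState, History, and the visibility record may be deleted."""
    return close_time + retention


# Example: a workflow closed on 1 Sep 2020 in a namespace with 7 days of retention.
closed = datetime(2020, 9, 1, 12, 0, tzinfo=timezone.utc)
print(deletion_due(closed, timedelta(days=7)).isoformat())  # 2020-09-08T12:00:00+00:00
```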

When using ES, how do we search for archived workflows? Does “tctl workflow listarchived” work in this scenario? Or is it perhaps the case that ES data is not deleted and so “tctl workflow list” continues to work even after workflows have been archived?

ES should not be used for searching through archived workflows. If archival is turned on, we archive not only the workflow execution history but also the visibility record, along with a few indexed fields such as workflowID, workflowType, startTime, and closeTime. This makes it possible to search the archive without ES.

The docs imply that visibility archiving is not really supported yet (as of 0.30). I take that to mean we should disable it in our config, with something like below. Is that correct?

Archival is still an experimental feature, as we have not done much testing on it yet. We plan to work on hardening archival after the V1 release.

Does all archiving activity happen in the worker service? (I.e. which service(s) need to have the S3 secret access key available?)

The worker service runs system workflows that are responsible for actually archiving workflow executions. However, retention timers are processed in the history service, and if a workflow is small enough it is archived inline instead of signaling the archival system workflow.
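The routing described above could be pictured roughly like this. It is only a sketch, not Temporal's actual code: the size threshold, the `ClosedWorkflow` type, and the returned labels are all hypothetical, and the thread does not say what the real cutoff is or how size is measured.

```python
from dataclasses import dataclass

# Hypothetical cutoff, chosen only for this example.
INLINE_ARCHIVAL_MAX_BYTES = 1_000_000


@dataclass
class ClosedWorkflow:
    workflow_id: str
    history_size_bytes: int


def route_archival(wf: ClosedWorkflow, max_inline_bytes: int = INLINE_ARCHIVAL_MAX_BYTES) -> str:
    """Decide where archival runs once the retention timer fires in the history service."""
    if wf.history_size_bytes <= max_inline_bytes:
        # Small history: archived inline by the history service itself.
        return "history-inline"
    # Larger history: handed off to the archival system workflow in the worker service.
    return "worker-system-workflow"


print(route_archival(ClosedWorkflow("order-1", 4_096)))
print(route_archival(ClosedWorkflow("order-2", 50_000_000)))
```

If both paths write to the object store, then presumably both the history and worker services would need the S3 credentials available.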

When archival is turned on, do archiving and enforcement of retention period happen at the same time - i.e. workflow data is archived and then immediately deleted?

That is correct. We guarantee that a workflow execution is not deleted without being archived first.
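The ordering guarantee can be illustrated with a small sketch. The in-memory dictionaries stand in for the primary database and the archive store, and `archive_then_delete` is a hypothetical helper, not a Temporal function.

```python
from typing import Callable, Dict


def archive_then_delete(
    workflow_id: str,
    primary: Dict[str, bytes],
    archive: Callable[[str, bytes], None],
) -> None:
    """Delete a closed workflow from the primary store only after archival succeeds."""
    history = primary[workflow_id]
    # If archive() raises, the exception propagates and the delete below never runs.
    archive(workflow_id, history)
    del primary[workflow_id]


primary_store = {"order-1": b"events..."}
archive_store: Dict[str, bytes] = {}

archive_then_delete("order-1", primary_store, archive_store.__setitem__)
print("order-1" in primary_store, "order-1" in archive_store)  # False True
```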

Suppose we decide not to use the archival feature and instead have a long retention period. Aside from the database space implications, what are the performance implications? Will the increased steady state size of the history database due to longer retention have an impact on performance of running workflows or workflow search?

I would not recommend a very long retention period. There are performance implications if the Cassandra cluster becomes too large; for instance, compaction could get very expensive.

Thanks, Samar!

ES should not be used for searching through archived workflows. If archival is turned on, we archive not only the workflow execution history but also the visibility record, along with a few indexed fields such as workflowID, workflowType, startTime, and closeTime. This makes it possible to search the archive without ES.

Just to clarify, does this mean that when workflows are archived, records are created in the Cassandra visibility schema - even though we are using ES instead of Cassandra for non-archived visibility? Do the same scalability concerns around using Cassandra for visibility apply here?

Visibility indexes are also archived to the same store used for archival of history events. These are closed workflow executions, so the visibility records are immutable at this point and can easily be uploaded to any blob store. In the case of S3, they are uploaded as zero-byte blobs with the indexes as metadata on them.
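As a rough picture of what "zero-byte blobs with indexes as metadata" means, here is a sketch that uses an in-memory dictionary in place of S3. The `Blob` type, the key layout, and the sample values are assumptions for illustration; the actual object naming used by the S3 archiver is not described in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Blob:
    body: bytes = b""
    metadata: Dict[str, str] = field(default_factory=dict)


def archive_visibility(store: Dict[str, Blob], record: Dict[str, str]) -> str:
    """Store a visibility record as an empty blob whose metadata carries the indexed fields."""
    # Hypothetical key layout, chosen only for this example.
    key = f"visibility/{record['workflowID']}/{record['closeTime']}"
    store[key] = Blob(body=b"", metadata=dict(record))
    return key


fake_s3: Dict[str, Blob] = {}
key = archive_visibility(
    fake_s3,
    {
        "workflowID": "order-1",
        "workflowType": "ProcessOrder",
        "startTime": "2020-09-01T12:00:00Z",
        "closeTime": "2020-09-01T12:05:00Z",
    },
)
print(len(fake_s3[key].body), fake_s3[key].metadata["workflowType"])  # 0 ProcessOrder
```

Because the record is immutable once the workflow is closed, a write-once object like this never needs to be updated afterwards.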