Backup - Restore with advanced visibility

We are building a backup/restore scenario for a cluster that must run on-premises due to regulatory requirements, and we are trying to get a clear statement about the following. (Of course we can run test scenarios and discover the answers ourselves, but it is better to have the designed behaviour described.)

  1. Can the Elastic indexes be rebuilt from scratch if we restore only the Temporal database, or do we also need to back up and restore Elastic? Our assumption is that we need both (based on an older answer), but maybe there is a feature to repopulate Elastic from the main database?

  2. In case we restore the main database and Elastic with a small point-in-time difference, what will happen in case we query for:
    a. A workflow that is found in Elastic but is already closed as per the main database?
    b. A workflow that is not yet in Elastic but is already running as per the main database?
    This question is relevant only if the backup is not manually triggered and the restore happens as part of a business continuity event. In that case we are not able to gently switch off the cluster before performing the backup; if we could, both sides would of course contain the latest, matching state.

  3. Is there a way to check/align the Elastic content against the main database after a restore (a continuation of the previous question)? That is, is there any way to get the latest visibility attributes from the main database and fix/update Elastic?

Thanks for the answers. I will also publish any findings once we have the backup/restore procedure designed and tested, in case there is potential for reuse within the community.

  1. If you don’t need the previous workflows in visibility, you don’t need to back up and restore it. If you do need them, you’ll need to back up and restore Elasticsearch. Basically, Temporal Server will work without the old Elasticsearch entries, and it won’t repopulate any data there from the main database.

  2. I suppose you mean a workflow in running status in Elasticsearch but closed in the main database. The outcome depends on whether Elasticsearch had already been updated at the time of the backup. Consider the following illustration (it should apply to both of your questions):

    t0: workflow 'foo' is running
    t1: workflow 'foo' is closed and recorded in main database
    t2: workflow 'foo' record in Elasticsearch is updated to closed
    

    If you took the backup of both the main database and Elasticsearch after t1 and before t2, then when you restore, the record in Elasticsearch should receive an update and have its status set to closed. However, if you took the backup of the main database after t2 and the backup of Elasticsearch before t2, then the record won’t be updated after you restore.

  3. No, it can’t be synced manually. The illustration above describes everything that we support. If the workflow hasn’t been closed yet, any future update to the workflow will update the record in Elasticsearch, and it won’t matter that it was out of sync before. However, if the workflow has already been closed, as in the example above, it will remain out of sync.
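The timing dependence described above can be sketched in a few lines of Python. This is a toy model, not Temporal internals, and every name in it is invented; the one assumption (hinted at by the answer) is that the main-database snapshot also carries the not-yet-delivered visibility updates, which get reprocessed after a restore.

```python
# Toy model of the t0/t1/t2 timeline -- invented names, not Temporal
# internals. Assumption: the main-database snapshot also carries the
# not-yet-delivered visibility updates, replayed on restore.

def restore(main_db_backup, es_backup):
    """Restore both stores, then replay visibility updates that were
    still pending inside the main-database snapshot."""
    es = dict(es_backup)
    for workflow_id, status in main_db_backup["pending_visibility_updates"]:
        es[workflow_id] = status
    return main_db_backup["workflows"], es

es_backup_before_t2 = {"foo": "running"}

# Backup taken between t1 and t2: 'foo' is closed in the main database
# and the Elasticsearch update is still queued there.
between_t1_and_t2 = {
    "workflows": {"foo": "closed"},
    "pending_visibility_updates": [("foo", "closed")],
}
_, es = restore(between_t1_and_t2, es_backup_before_t2)
print(es["foo"])  # 'closed' -- the queued update repairs the record

# Main-database backup taken after t2: the update already ran, so the
# queue in the snapshot is empty; with an Elasticsearch backup from
# before t2 there is nothing left to replay.
after_t2 = {"workflows": {"foo": "closed"}, "pending_visibility_updates": []}
_, es = restore(after_t2, es_backup_before_t2)
print(es["foo"])  # 'running' -- permanently out of sync once closed
```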

Thanks a lot for the quick answer. If I understand correctly, any event added to the history will trigger an update of the workflow data in Elastic. Will it update all data, including any search attributes that might have been upserted in an earlier event and that made it to Elastic before the disaster but after the last backup?

Specifically, given an arbitrary signal method available on the workflow, calling it after a restore (after a dirty disaster) would ensure that all metadata in Elastic is again in sync for that workflow. Correct?

What I am handling here is that both the main database and Elastic might have a non-zero Recovery Point Objective (RPO), and even if we accept some data loss, we at least need the system to be consistent after recovery. This can be key when searching for workflows using advanced visibility is part of the business logic (e.g. signalling running workflows matching some query).
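To make the consistency requirement concrete, here is a minimal sketch of the kind of post-restore audit we have in mind. It uses plain sets, not the Temporal or Elasticsearch APIs, and all names are ours: given the set of workflow IDs each store considers running, it flags both divergence cases from question 2.

```python
# Hypothetical post-restore audit (plain data, no Temporal API):
# flag the two divergence cases from question 2.

def audit(running_in_main_db, running_in_es):
    # 2a: visibility still lists it, but the main DB already closed it:
    #     a visibility query returns a workflow that cannot be signalled.
    ghost = running_in_es - running_in_main_db
    # 2b: the main DB runs it, but visibility does not know it yet:
    #     a visibility query silently misses a live workflow.
    missing = running_in_main_db - running_in_es
    return ghost, missing

ghost, missing = audit({"wf-1", "wf-2"}, {"wf-2", "wf-3"})
print(sorted(ghost))    # stale entries to ignore or clean up
print(sorted(missing))  # live workflows a visibility query would miss
```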

Not all events added to the history will trigger an update in visibility. We only send an update to visibility if there are changes to data stored in visibility (e.g. execution status, custom search attributes, memo). I’m not sure, but I don’t think a signal triggers an update to visibility.

Every update to visibility is stateless, i.e. it doesn’t matter what was in Elasticsearch before: the update rewrites the entire record and leaves all fields up to date.
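That stateless behaviour is what makes resync automatic for running workflows. A toy contrast, using plain dicts and invented field names rather than anything Elasticsearch-specific:

```python
# Toy contrast: a stateless whole-record write (what visibility does,
# per the statement above) versus a stateful partial merge -- plain
# dicts, nothing Elasticsearch-specific.

stale = {"status": "running", "CustomTag": "old-value"}  # pre-disaster leftover
fresh = {"status": "closed", "CustomTag": "new-value"}   # current main-DB view

def stateless_write(_old, full_record):
    """Ignore whatever was there and write the complete record."""
    return dict(full_record)

def stateful_merge(old, partial):
    """For contrast only: merging a partial change keeps stale fields."""
    merged = dict(old)
    merged.update(partial)
    return merged

print(stateless_write(stale, fresh))                # every field up to date
print(stateful_merge(stale, {"status": "closed"}))  # 'old-value' survives
```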

Understood, thanks for your replies. We’re going to develop a procedure based on that.