XDC replication issue after the cluster rebuild

Hi,

While executing a DR exercise, we encountered an XDC issue that we could not recover from except by rebuilding the cluster pair.

Our current setup is 2 clusters (primary:failoverVersion:1, secondary:failoverVersion:2) with XDC configured and operational.

The DR scenario is as follows:

  1. Delete secondary cluster
  2. Create secondary cluster
  3. Configure the XDC to replicate to primary
  4. Failover the namespace from primary to secondary
  5. Send a dummy notification to the running workflows
  6. Make sure all is moved to the secondary cluster

All is fine at that stage: the namespaces make their way to the secondary, the workflows are replicated, and we proceed to:

  1. Delete the primary cluster
  2. Create the primary cluster
  3. Configure the XDC to replicate to secondary

At that stage we normally submit a test that creates a global namespace on the primary. The namespace gets created, but it is not replicated to the secondary. The same happens if we create a namespace on the secondary: it does not replicate to the primary.

The operations below also have no effect:

  1. Failover the namespace from primary to secondary
  2. Send a dummy notification to the running workflows
  3. Make sure all is moved to the secondary cluster

At that stage we re-created both clusters, and XDC worked as expected, with namespaces failing over and workflow events replicated. When we issued the failover command, the namespace version incremented as we would expect.
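For readers unfamiliar with how the failover version is expected to move, here is a minimal sketch of the modular scheme as we understand it. The initial failover versions (1 for primary, 2 for secondary) come from our setup above. The increment of 10 and the helper names are assumptions made for illustration only. This is not the server's actual code.

```python
# Illustrative model only: names and the increment value are assumptions.
INCREMENT = 10  # assumed failoverVersionIncrement; not stated in this post
INITIAL = {"primary": 1, "secondary": 2}  # initial failover versions from our setup


def next_failover_version(current: int, target_initial: int, increment: int) -> int:
    """Return the smallest version >= current that maps to the target cluster."""
    if not 0 <= target_initial < increment:
        raise ValueError("initial failover version must be in [0, increment)")
    # Round current down to a multiple of the increment, then add the target's offset.
    candidate = current // increment * increment + target_initial
    if candidate < current:
        candidate += increment
    return candidate


def active_cluster(version: int, increment: int, initial: dict) -> str:
    """Map a failover version back to the cluster that owns it."""
    remainder = version % increment
    for name, init in initial.items():
        if init == remainder:
            return name
    raise LookupError(f"no cluster with initial failover version {remainder}")


# A namespace created on primary, then failed over back and forth.
version = INITIAL["primary"]
history = [version]
for target in ("secondary", "primary", "secondary"):
    version = next_failover_version(version, INITIAL[target], INCREMENT)
    history.append(version)

print(history)  # [1, 2, 11, 12]
print(active_cluster(version, INCREMENT, INITIAL))  # secondary
```

Under these assumptions, each failover moves the version forward to the next value whose remainder matches the target cluster, which is the pattern we saw when issuing the failover command.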

Could someone please help us to understand what might have caused the XDC to fail (no server error logs indicating the failure)?

We suspect a glitch in which the replication version is discarded by the servers upon arrival of the replicated event, but this is only a guess. We can also re-run the exercise if additional info is needed.

Temporal Version: 1.16.2

We will attempt another test with 1.17.1, as a few replication-related fixes were applied between 1.16 and 1.17.1.

We executed a successful DR drill with 1.17.1, and XDC worked as expected.