Finally, I was able to create an example XDC setup within a single Docker Compose file, where two Temporal clusters (temporal-active and temporal-standby) run and share the same network (temporal-network) so that replication can happen between the two clusters.
In this example, one can reach the Temporal Web UI on port 8088 for the active cluster (temporal-active-web) and on port 8099 for the standby cluster (temporal-standby-web).
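For reference, the Compose wiring described above looks roughly like this. This is only a sketch: the image tag, host ports, and the elided database/config sections are illustrative, not my exact file.

```yaml
version: "3.5"
services:
  temporal-active:
    image: temporalio/auto-setup:1.11.3   # example tag, not necessarily the one I used
    networks:
      - temporal-network
    # ... MySQL env vars and cluster-metadata dynamic config omitted
  temporal-standby:
    image: temporalio/auto-setup:1.11.3
    networks:
      - temporal-network
    # ... same elisions; points at temporal-standby-mysql
networks:
  temporal-network:
    driver: bridge
    name: temporal-network
```

Both servers sit on temporal-network, so each can resolve the other by service name (temporal-active / temporal-standby) for replication.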
With this setup I was able to configure XDC and register a global namespace.
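For anyone reproducing this, the global namespace was registered roughly as below (a sketch using tctl v1 flags; the cluster names match the setup above):

```shell
# Register a global namespace replicated to both clusters,
# with "active" as the initially active cluster.
tctl --ns xdc namespace register \
  --global_namespace true \
  --active_cluster active \
  --clusters active standby
```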
I logged into the standby web console and was able to view the namespaces and workflows.
So far so good…
Now, to simulate a cluster failure (regional outage), I explicitly brought down the temporal-active server and temporal-active-mysql.
Then I logged on to the temporal-standby admin console and performed a failover with:
tctl --ns xdc n up --ac standby
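That is the aliased short form; spelled out, the same command is:

```shell
# n = namespace, up = update, --ac = --active_cluster
tctl --ns xdc namespace update --active_cluster standby
```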
The failover command succeeded in tctl, and in the web UI at http://localhost:8009/namespaces/xdc/settings I saw that the clusters got swapped (standby was made active).
I verified that http://localhost:8088 is not accessible (as I had already brought down temporal-active).
However, even after the switch, my standby workers do not seem to kick in: no activity/workflow tasks are being dispatched to the standby worker connected to the temporal-standby server.
I also see this error in the temporal-standby server's log after the failover:
temporal-standby | {"level":"error","ts":"2021-08-23T16:57:55.228Z","msg":"Failed to get replication tasks","service":"history","error":"last connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.0.4:7233: connect: no route to host\"; last resolver error: dns: A record lookup error: lookup temporal-active on 127.0.0.11:53: read udp 127.0.0.1:34603->127.0.0.11:53: i/o timeout","logging-call-at":"replicationTaskFetcher.go:395","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/service/history.(*replicationTaskFetcherWorker).getMessages\n\t/temporal/service/history/replicationTaskFetcher.go:395\ngo.temporal.io/server/service/history.(*replicationTaskFetcherWorker).fetchTasks\n\t/temporal/service/history/replicationTaskFetcher.go:338"}
temporal-standby | {"level":"error","ts":"2021-08-23T16:57:55.271Z","msg":"Failed to get replication tasks","service":"worker","component":"replicator","component":"replication-task-processor","xdc-source-cluster":"active","error":"last connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.0.4:7233: connect: no route to host\"; last resolver error: dns: A record lookup error: lookup temporal-active on 127.0.0.11:53: read udp 127.0.0.1:37973->127.0.0.11:53: i/o timeout","logging-call-at":"namespace_replication_message_processor.go:157","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/service/worker/replicator.(*namespaceReplicationMessageProcessor).getAndHandleNamespaceReplicationTasks\n\t/temporal/service/worker/replicator/namespace_replication_message_processor.go:157\ngo.temporal.io/server/service/worker/replicator.(*namespaceReplicationMessageProcessor).processorLoop\n\t/temporal/service/worker/replicator/namespace_replication_message_processor.go:121"}