I see my history and frontend service logs full of these messages:
Frontend:
{"level":"warn","ts":"2022-03-10T13:55:36.891Z","msg":"Failed to get replication tasks from client","service":"frontend","error":"context deadline exceeded","logging-call-at":"client.go:916"}
History: {"level":"error","ts":"2022-03-10T13:58:47.971Z","msg":"Failed to retrieve replication messages.","shard-id":78,"address":"10.x.yyy.zzz:7234","component":"history-engine","error":"context deadline exceeded","logging-call-at":"historyEngine.go:3000","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/service/history.(*historyEngineImpl).GetReplicationMessages\n\t/temporal/service/history/historyEngine.go:3000\ngo.temporal.io/server/service/history.(*Handler).GetReplicationMessages.func1\n\t/temporal/service/history/handler.go:1212"}
This is happening on Temporal server 15.0.0.
Please note, I recently attempted upgrading to 15.0.2 and it failed (so my schema could be at 15.0.2 while frontend/history etc. could still be at 15.0.0). Can that be the issue?
Schema changes are applied in minor version releases, not patch releases, so updating from 15.0.0 to 15.0.2 should not have included any schema changes.
"I recently attempted upgrading to 15.0.2 and it failed"

What was the failure? I checked with the server team, and they mentioned that workflow lock contention can cause context deadline exceeded errors when fetching replication tasks.
{"level":"fatal","ts":"2022-03-07T17:30:34.201Z","msg":"Invalid rpcAddress for remote cluster","error":"address dns:///tmystandbycluster:7233: too many colons in address"}
and the cluster fails to start up.
If I change the DNS name to a plain IP, it starts.
At times, clearing the cluster_info and cluster_info_metadata tables helps.
I have not been able to find a definitive configuration guide for setting up and running XDC clusters.
The documentation says that dns:/// can be used and that it is the only way to force grpc-go to use its DNS resolver, but given the current code and its use of net.SplitHostPort, I don't think this is possible.
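For what it's worth, the parse failure is reproducible with just the Go standard library. Here is a minimal sketch, using the address from the fatal log as a stand-in for a cluster rpcAddress: net.SplitHostPort expects a bare host:port, so the extra colons contributed by the dns:/// scheme trigger the same "too many colons in address" error, while the plain form parses fine. That matches the observation that a plain IP starts up.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// "dns:///host:port" is valid gRPC target syntax, but net.SplitHostPort
	// expects a bare "host:port" and treats the scheme's colons as part of
	// an (unbracketed) host, so it rejects the address.
	for _, addr := range []string{
		"dns:///tmystandbycluster:7233", // gRPC DNS-resolver target syntax
		"tmystandbycluster:7233",        // plain host:port
	} {
		host, port, err := net.SplitHostPort(addr)
		if err != nil {
			// Prints: address dns:///tmystandbycluster:7233: too many colons in address
			fmt.Printf("%s -> error: %v\n", addr, err)
			continue
		}
		fmt.Printf("%s -> host=%q port=%q\n", addr, host, port)
	}
}
```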