I did a PoC to set up a multi-cluster environment and was able to configure it successfully. The details of what I tried are documented here → I tried this as a brand-new setup.
Now I am trying to update an existing setup (my dev environment), which has a few namespaces and lots of workflows. I am encountering issues, and I think I ended up corrupting the Temporal DB.
Can someone let me know the exact steps to change the multi-cluster replication settings?
To start with, my multi-cluster config was the default one, like this:
clusterMetadata:
  enableGlobalNamespace: false
  failoverVersionIncrement: 10
  masterClusterName: "active"
  currentClusterName: "active"
  clusterInformation:
    active:
      enabled: true
      initialFailoverVersion: 0
      rpcAddress: "127.0.0.1:7233"
  #replicationConsumer:
    #type: kafka
I attempted to change this to:
clusterMetadata:
  clusterInformation:
    primary:
      enabled: true
      initialFailoverVersion: 1
      rpcAddress: dns:///temporal-dr-primary.mydomain.com:7233
      rpcName: frontend
    secondary:
      enabled: true
      initialFailoverVersion: 2
      rpcAddress: dns:///temporal-dr-secondary.mydomain.com:7233
      rpcName: frontend
  currentClusterName: primary
  enableGlobalNamespace: true
  failoverVersionIncrement: 100
  masterClusterName: primary
  replicationConsumer:
    type: rpc
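For context, here is a minimal sketch of how I understand initialFailoverVersion and failoverVersionIncrement to interact: a namespace's failover version modulo the increment identifies the owning cluster. This is my reading, not the server's actual code; the function names are hypothetical, and the values are the ones from the config above.

```python
# Sketch only: mirrors my understanding of the failover-version arithmetic,
# not the Temporal server implementation.
FAILOVER_VERSION_INCREMENT = 100
INITIAL_FAILOVER_VERSION = {"primary": 1, "secondary": 2}


def cluster_for_version(version: int) -> str:
    """Return the cluster whose initialFailoverVersion equals version % increment."""
    remainder = version % FAILOVER_VERSION_INCREMENT
    for name, initial in INITIAL_FAILOVER_VERSION.items():
        if initial == remainder:
            return name
    raise ValueError(f"no cluster owns failover version {version}")


def next_failover_version(target_cluster: str, current_version: int) -> int:
    """Smallest version >= current_version that maps to target_cluster."""
    initial = INITIAL_FAILOVER_VERSION[target_cluster]
    base = current_version // FAILOVER_VERSION_INCREMENT * FAILOVER_VERSION_INCREMENT
    candidate = base + initial
    if candidate < current_version:
        candidate += FAILOVER_VERSION_INCREMENT
    return candidate


if __name__ == "__main__":
    version = next_failover_version("secondary", 1)
    print(version, cluster_for_version(version))
    version = next_failover_version("primary", version)
    print(version, cluster_for_version(version))
```

If this reading is right, changing initialFailoverVersion or failoverVersionIncrement on a cluster that already has namespaces would change which cluster an existing version maps to, which may be why the server refuses to overwrite the persisted values.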
With these changes, I attempted to upgrade and brought up new pods.
The new pods fail to start up, and I see various errors like:
{"level":"info","ts":"2022-01-07T12:17:50.080Z","msg":"Updated dynamic config","logging-call-at":"file_based_client.go:142"}
Unable to start server. Error: could not build arguments for function "go.temporal.io/server/temporal".ServerLifetimeHooks (/temporal/temporal/fx.go:777): failed to build temporal.Server: could not build arguments for function "go.temporal.io/server/temporal".glob..func1 (/temporal/temporal/server_impl.go:80): failed to build *temporal.ServerImpl: could not build arguments for function "go.temporal.io/server/temporal".NewServerFxImpl (/temporal/temporal/server_impl.go:84): could not build value group *temporal.ServicesMetadata[group="services"]: could not build arguments for function "go.temporal.io/server/temporal".HistoryServiceProvider (/temporal/temporal/fx.go:251): failed to build config.Persistence: received non-nil error from function "go.temporal.io/server/temporal".ApplyClusterMetadataConfigProvider (/temporal/temporal/fx.go:532): error while backfiling cluster metadata: %!w(<nil>)
Since the default cluster name was already active, I attempted to set the cluster names back to active and standby, and used the following configuration:
clusterMetadata:
  clusterInformation:
    active:
      enabled: true
      initialFailoverVersion: 1
      rpcAddress: dns:///temporal-dr-primary.mydomain.com:7233
      rpcName: frontend
    standby:
      enabled: true
      initialFailoverVersion: 2
      rpcAddress: dns:///temporal-dr-secondary.mydomain.com:7233
      rpcName: frontend
  currentClusterName: primary
  enableGlobalNamespace: true
  failoverVersionIncrement: 100
  masterClusterName: primary
  replicationConsumer:
    type: rpc
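A quick sanity check I could have run on this configuration before deploying is sketched below. The rules it applies are my assumptions about what a consistent clusterMetadata block looks like (cluster names must be keys under clusterInformation, initial failover versions must be unique and smaller than the increment); check_cluster_metadata is a hypothetical helper, not part of Temporal.

```python
# Hypothetical helper: checks a clusterMetadata block (as a plain dict)
# against a few consistency rules that I am assuming, not quoting from docs.
def check_cluster_metadata(meta: dict) -> list:
    problems = []
    clusters = meta.get("clusterInformation", {})

    # currentClusterName and masterClusterName should name a configured cluster.
    for key in ("currentClusterName", "masterClusterName"):
        name = meta.get(key)
        if name not in clusters:
            problems.append(f"{key} {name!r} is not a key under clusterInformation")

    # Initial failover versions should be unique and below the increment.
    increment = meta.get("failoverVersionIncrement", 0)
    seen = {}
    for name, info in clusters.items():
        version = info.get("initialFailoverVersion")
        if version in seen:
            problems.append(
                f"{name} and {seen[version]} share initialFailoverVersion {version}"
            )
        seen[version] = name
        if version is None or version >= increment:
            problems.append(
                f"{name}: initialFailoverVersion {version} is not below {increment}"
            )
    return problems


# The second configuration I tried, transcribed from the YAML above.
SECOND_ATTEMPT = {
    "clusterInformation": {
        "active": {"enabled": True, "initialFailoverVersion": 1},
        "standby": {"enabled": True, "initialFailoverVersion": 2},
    },
    "currentClusterName": "primary",
    "masterClusterName": "primary",
    "enableGlobalNamespace": True,
    "failoverVersionIncrement": 100,
}

if __name__ == "__main__":
    for problem in check_cluster_metadata(SECOND_ATTEMPT):
        print(problem)
```

Run against this configuration, the check flags that currentClusterName and masterClusterName still say primary while the clusters are now named active and standby.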
This time around, I got the error below:
{"level":"warn","ts":"2022-01-07T12:25:52.405Z","msg":"Failed to save cluster metadata.","component":"metadata-initializer","error":"SaveClusterMetadata encountered version mismatch, expected 0 but got 1.","cluster-name":"standby","logging-call-at":"fx.go:584"}
Also, after setting EnableGlobalNamespace to true, I always see these errors in the logs:
{"level":"error","ts":"2022-01-07T12:25:52.408Z","msg":"Supplied configuration key/value mismatches persisted cluster metadata. Continuing with the persisted value as this value cannot be changed once initialized.","component":"metadata-initializer","key":"clusterMetadata.EnableGlobalNamespace","ignored-value":true,"value":false,"logging-call-at":"fx.go:608","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/temporal.ApplyClusterMetadataConfigProvider\n\t/temporal/temporal/fx.go:608\nreflect.Value.call\n\t/usr/local/go/src/reflect/value.go:543\nreflect.Value.Call\n\t/usr/local/go/src/reflect/value.go:339\ngo.uber.org/dig.defaultInvoker\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:439\ngo.uber.org/dig.(*node).Call\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:912\ngo.uber.org/dig.paramSingle.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:240\ngo.uber.org/dig.paramObjectField.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:396\ngo.uber.org/dig.paramObject.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:323\ngo.uber.org/dig.paramList.BuildList\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:196\ngo.uber.org/dig.(*node).Call\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:903\ngo.uber.org/dig.paramGroupedSlice.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:458\ngo.uber.org/dig.paramObjectField.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:396\ngo.uber.org/dig.paramObject.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:323\ngo.uber.org/dig.paramList.BuildList\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:196\ngo.uber.org/dig.(*node).Call\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:903\ngo.uber.org/dig.paramSingle.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:240\ngo.uber.org/dig.paramList.BuildList\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:196\ngo.uber.org/dig.(*node).Call\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:903\ngo.uber.org/d
ig.paramSingle.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:240\ngo.uber.org/dig.paramList.BuildList\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:196\ngo.uber.org/dig.(*Container).Invoke\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:587\ngo.uber.org/fx.(*App).executeInvoke\n\t/go/pkg/mod/go.uber.org/fx@v1.14.2/app.go:873\ngo.uber.org/fx.(*App).executeInvokes\n\t/go/pkg/mod/go.uber.org/fx@v1.14.2/app.go:846\ngo.uber.org/fx.New\n\t/go/pkg/mod/go.uber.org/fx@v1.14.2/app.go:594\ngo.temporal.io/server/temporal.NewServerFx\n\t/temporal/temporal/fx.go:97\ngo.temporal.io/server/temporal.NewServer\n\t/temporal/temporal/server.go:58\nmain.buildCLI.func2\n\t/temporal/cmd/server/main.go:162\ngithub.com/urfave/cli/v2.(*Command).Run\n\t/go/pkg/mod/github.com/urfave/cli/v2@v2.3.0/command.go:163\ngithub.com/urfave/cli/v2.(*App).RunContext\n\t/go/pkg/mod/github.com/urfave/cli/v2@v2.3.0/app.go:313\ngithub.com/urfave/cli/v2.(*App).Run\n\t/go/pkg/mod/github.com/urfave/cli/v2@v2.3.0/app.go:224\nmain.main\n\t/temporal/cmd/server/main.go:52\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255"}
Apart from this, I also get errors stating that the cluster RPC address cannot be changed once initialized:
{"level":"error","ts":"2022-01-07T12:25:52.409Z","msg":"Supplied configuration key/value mismatches persisted cluster metadata. Continuing with the persisted value as this value cannot be changed once initialized.","component":"metadata-initializer","key":"clusterInformation.RPCAddress","ignored-value":{"Enabled":true,"InitialFailoverVersion":1,"RPCAddress":"dns:///temporal-dr-primary.mydomain.com:7233"},"value":{"Enabled":true,"InitialFailoverVersion":1,"RPCAddress":"127.0.0.1:7933"},"logging-call-at":"fx.go:702","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/temporal.loadClusterInformationFromStore\n\t/temporal/temporal/fx.go:702\ngo.temporal.io/server/temporal.ApplyClusterMetadataConfigProvider\n\t/temporal/temporal/fx.go:658\nreflect.Value.call\n\t/usr/local/go/src/reflect/value.go:543\nreflect.Value.Call\n\t/usr/local/go/src/reflect/value.go:339\ngo.uber.org/dig.defaultInvoker\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:439\ngo.uber.org/dig.(*node).Call\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:912\ngo.uber.org/dig.paramSingle.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:240\ngo.uber.org/dig.paramObjectField.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:396\ngo.uber.org/dig.paramObject.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:323\ngo.uber.org/dig.paramList.BuildList\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:196\ngo.uber.org/dig.(*node).Call\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:903\ngo.uber.org/dig.paramGroupedSlice.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:458\ngo.uber.org/dig.paramObjectField.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:396\ngo.uber.org/dig.paramObject.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:323\ngo.uber.org/dig.paramList.BuildList\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:196\ngo.uber.org/dig.(*node).Call\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:903\ngo.uber.org/di
g.paramSingle.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:240\ngo.uber.org/dig.paramList.BuildList\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:196\ngo.uber.org/dig.(*node).Call\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:903\ngo.uber.org/dig.paramSingle.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:240\ngo.uber.org/dig.paramList.BuildList\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:196\ngo.uber.org/dig.(*Container).Invoke\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:587\ngo.uber.org/fx.(*App).executeInvoke\n\t/go/pkg/mod/go.uber.org/fx@v1.14.2/app.go:873\ngo.uber.org/fx.(*App).executeInvokes\n\t/go/pkg/mod/go.uber.org/fx@v1.14.2/app.go:846\ngo.uber.org/fx.New\n\t/go/pkg/mod/go.uber.org/fx@v1.14.2/app.go:594\ngo.temporal.io/server/temporal.NewServerFx\n\t/temporal/temporal/fx.go:97\ngo.temporal.io/server/temporal.NewServer\n\t/temporal/temporal/server.go:58\nmain.buildCLI.func2\n\t/temporal/cmd/server/main.go:162\ngithub.com/urfave/cli/v2.(*Command).Run\n\t/go/pkg/mod/github.com/urfave/cli/v2@v2.3.0/command.go:163\ngithub.com/urfave/cli/v2.(*App).RunContext\n\t/go/pkg/mod/github.com/urfave/cli/v2@v2.3.0/app.go:313\ngithub.com/urfave/cli/v2.(*App).Run\n\t/go/pkg/mod/github.com/urfave/cli/v2@v2.3.0/app.go:224\nmain.main\n\t/temporal/cmd/server/main.go:52\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255"}
{"level":"error","ts":"2022-01-07T12:25:52.409Z","msg":"Supplied configuration key/value mismatches persisted cluster metadata. Continuing with the persisted value as this value cannot be changed once initialized.","component":"metadata-initializer","key":"clusterInformation","ignored-value":{"active":{"Enabled":true,"InitialFailoverVersion":1,"RPCAddress":"127.0.0.1:7933"},"standby":{"Enabled":true,"InitialFailoverVersion":2,"RPCAddress":"dns:///temporal-dr-secondary.mydomain.com:7233"}},"value":{"Enabled":true,"InitialFailoverVersion":2,"RPCAddress":"dns:///temporal-dr-secondary.mydomain.com:7233"},"logging-call-at":"fx.go:716","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/temporal.loadClusterInformationFromStore\n\t/temporal/temporal/fx.go:716\ngo.temporal.io/server/temporal.ApplyClusterMetadataConfigProvider\n\t/temporal/temporal/fx.go:658\nreflect.Value.call\n\t/usr/local/go/src/reflect/value.go:543\nreflect.Value.Call\n\t/usr/local/go/src/reflect/value.go:339\ngo.uber.org/dig.defaultInvoker\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:439\ngo.uber.org/dig.(*node).Call\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:912\ngo.uber.org/dig.paramSingle.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:240\ngo.uber.org/dig.paramObjectField.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:396\ngo.uber.org/dig.paramObject.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:323\ngo.uber.org/dig.paramList.BuildList\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:196\ngo.uber.org/dig.(*node).Call\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:903\ngo.uber.org/dig.paramGroupedSlice.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:458\ngo.uber.org/dig.paramObjectField.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:396\ngo.uber.org/dig.paramObject.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:323\ngo.uber.org/dig.paramList.BuildList\n\t/go/pkg/mod/go.uber.org/di
g@v1.13.0/param.go:196\ngo.uber.org/dig.(*node).Call\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:903\ngo.uber.org/dig.paramSingle.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:240\ngo.uber.org/dig.paramList.BuildList\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:196\ngo.uber.org/dig.(*node).Call\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:903\ngo.uber.org/dig.paramSingle.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:240\ngo.uber.org/dig.paramList.BuildList\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/param.go:196\ngo.uber.org/dig.(*Container).Invoke\n\t/go/pkg/mod/go.uber.org/dig@v1.13.0/dig.go:587\ngo.uber.org/fx.(*App).executeInvoke\n\t/go/pkg/mod/go.uber.org/fx@v1.14.2/app.go:873\ngo.uber.org/fx.(*App).executeInvokes\n\t/go/pkg/mod/go.uber.org/fx@v1.14.2/app.go:846\ngo.uber.org/fx.New\n\t/go/pkg/mod/go.uber.org/fx@v1.14.2/app.go:594\ngo.temporal.io/server/temporal.NewServerFx\n\t/temporal/temporal/fx.go:97\ngo.temporal.io/server/temporal.NewServer\n\t/temporal/temporal/server.go:58\nmain.buildCLI.func2\n\t/temporal/cmd/server/main.go:162\ngithub.com/urfave/cli/v2.(*Command).Run\n\t/go/pkg/mod/github.com/urfave/cli/v2@v2.3.0/command.go:163\ngithub.com/urfave/cli/v2.(*App).RunContext\n\t/go/pkg/mod/github.com/urfave/cli/v2@v2.3.0/app.go:313\ngithub.com/urfave/cli/v2.(*App).Run\n\t/go/pkg/mod/github.com/urfave/cli/v2@v2.3.0/app.go:224\nmain.main\n\t/temporal/cmd/server/main.go:52\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:255"}
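Since the stack traces make these lines hard to read, here is a small sketch that pulls out just the mismatching key, the value I supplied, and the persisted value the server keeps using. It assumes each log entry is a single JSON object per line, as above; summarize_mismatch is a hypothetical helper, and the sample line is the EnableGlobalNamespace entry from above with the stacktrace field left out for brevity.

```python
import json

# Sample entry copied from the logs above, minus the "stacktrace" field.
LOG_LINE = (
    '{"level":"error","ts":"2022-01-07T12:25:52.408Z",'
    '"msg":"Supplied configuration key/value mismatches persisted cluster metadata. '
    'Continuing with the persisted value as this value cannot be changed once initialized.",'
    '"component":"metadata-initializer",'
    '"key":"clusterMetadata.EnableGlobalNamespace",'
    '"ignored-value":true,"value":false,"logging-call-at":"fx.go:608"}'
)


def summarize_mismatch(line: str):
    """Return key / supplied / persisted for a config-mismatch entry, else None."""
    entry = json.loads(line)
    if entry.get("component") != "metadata-initializer":
        return None
    if "ignored-value" not in entry:
        return None
    return {
        "key": entry["key"],
        "supplied": entry["ignored-value"],  # value from my config, ignored
        "persisted": entry["value"],         # value already stored in the DB
    }


if __name__ == "__main__":
    print(summarize_mismatch(LOG_LINE))
```

Read this way, the entries say the database still holds enableGlobalNamespace=false and the original rpcAddress, and the server keeps those rather than the values from my new config.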
In the Temporal database tables cluster_metadata and cluster_metadata_info, I see both my old and new cluster details.
When I run tctl adm cl d from my old pods, which are still running, I see only my old cluster details (not the secondary/standby details); however, in the database I see entries for both.
I guess I ended up corrupting the DB.
Can someone suggest a way out, please?