Error while fetching cluster metadata

We are trying to deploy Temporal using the Helm chart, with MySQL as its persistence store. We’ve successfully deployed the chart both in development and on an AWS cluster, but on an Azure cluster we are seeing the following error from all services, sending every pod into CrashLoopBackOff.

{"level":"warn","ts":"2022-08-22T23:36:08.421Z","msg":"Failed to save cluster metadata.","component":"metadata-initializer","error":"proto: ClusterMetadata: wiretype end group for non-group","cluster-name":"active","logging-call-at":"fx.go:628"}

Unable to start server. Error: could not build arguments for function "go.temporal.io/server/temporal".ServerLifetimeHooks (/home/builder/temporal/temporal/fx.go:738): failed to build temporal.Server: could not build arguments for function "go.temporal.io/server/temporal".glob..func1 (/home/builder/temporal/temporal/server_impl.go:65): failed to build *temporal.ServerImpl: could not build arguments for function "go.temporal.io/server/temporal".NewServerFxImpl (/home/builder/temporal/temporal/server_impl.go:69): could not build value group *temporal.ServicesMetadata[group="services"]: could not build arguments for function "go.temporal.io/server/temporal".HistoryServiceProvider (/home/builder/temporal/temporal/fx.go:334): failed to build config.Persistence: received non-nil error from function "go.temporal.io/server/temporal".ApplyClusterMetadataConfigProvider (/home/builder/temporal/temporal/fx.go:563): error while fetching cluster metadata: proto: ClusterMetadata: wiretype end group for non-group

Given that the same Helm chart works in one Kubernetes environment but not another, I’m guessing there must be some configuration issue, though it’s hard to tell from the error which configuration that might be.

Are you deploying the same server version, and if so, which one is it?
With the 1.14 server release, cluster metadata was moved and is now loaded from dynamic config rather than static config; I’m wondering if that could be the case here.

Yes, the same server is deployed in all environments. We are using the latest Helm chart, 1.17.4.

Can you share some details on how the dynamic config for cluster metadata works? We are not using multi-cluster replication. I see that there’s a cluster_metadata table, but it’s empty in all of our environments.

I spent a bit more time debugging this issue and managed to reproduce it in a local environment by doing the following:

  1. Exported the temporal and temporal_visibility DBs from the problematic cluster and imported them into a local DB instance.
  2. Deployed the same Temporal Helm chart to a local kube cluster and pointed it at the copied databases (a byte-for-byte check of the copied blob is sketched below).
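
In case it helps anyone reproduce this, a quick way to confirm the import carried the stored cluster metadata blob over byte-for-byte is a small Go sketch along these lines. The DSNs are placeholders, and the cluster_metadata_info table and column names are taken from the v1.17 MySQL schema, so treat those as assumptions:

    package main

    import (
        "bytes"
        "database/sql"
        "fmt"
        "log"

        _ "github.com/go-sql-driver/mysql"
    )

    // readBlob fetches the serialized cluster metadata blob for one cluster name.
    func readBlob(dsn, clusterName string) []byte {
        db, err := sql.Open("mysql", dsn)
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        var data []byte
        err = db.QueryRow(
            "SELECT data FROM cluster_metadata_info WHERE cluster_name = ?",
            clusterName,
        ).Scan(&data)
        if err != nil {
            log.Fatal(err)
        }
        return data
    }

    func main() {
        // Placeholder DSNs: the problematic source DB and the local copy.
        src := readBlob("user:pass@tcp(azure-mysql-host:3306)/temporal", "active")
        cpy := readBlob("user:pass@tcp(127.0.0.1:3306)/temporal", "active")
        fmt.Printf("source: %d bytes, copy: %d bytes, identical: %v\n",
            len(src), len(cpy), bytes.Equal(src, cpy))
    }

If the bytes match, the corruption traveled with the data rather than being introduced by the export/import step.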

My understanding from the error is that the cluster metadata cannot be unmarshaled properly and is possibly corrupted. I was able to resolve the issue in my local environment by deleting the single row in temporal.cluster_metadata_info. The row was recreated by the application and the server started normally. I tried doing the same on the Azure cluster, but the issue persisted: the row was recreated, yet the data still appears to be corrupted.
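
To make “corrupted” concrete, here is a minimal sketch of the decode check: it reads the row and attempts the same proto3 unmarshal the server performs. It assumes the v1.17 MySQL schema and the server’s gogo-generated ClusterMetadata type from go.temporal.io/server/api/persistence/v1 (which carries its own Unmarshal method); the DSN and cluster name are placeholders:

    package main

    import (
        "database/sql"
        "fmt"
        "log"

        _ "github.com/go-sql-driver/mysql"
        persistencespb "go.temporal.io/server/api/persistence/v1"
    )

    func main() {
        // Placeholder DSN pointing at the copied temporal database.
        db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/temporal")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        var data []byte
        var encoding string
        err = db.QueryRow(
            "SELECT data, data_encoding FROM cluster_metadata_info WHERE cluster_name = ?",
            "active",
        ).Scan(&data, &encoding)
        if err != nil {
            log.Fatal(err)
        }

        // A hex dump of the first bytes makes gross corruption (e.g. charset
        // or encoding mangling during storage) easy to spot by eye.
        n := len(data)
        if n > 16 {
            n = 16
        }
        fmt.Printf("encoding=%s len=%d prefix=% x\n", encoding, len(data), data[:n])

        // Decode the blob the way the server does; on a corrupt row this fails
        // with the same "wiretype end group for non-group" error as the logs.
        var md persistencespb.ClusterMetadata
        if err := md.Unmarshal(data); err != nil {
            log.Fatalf("blob does not decode as ClusterMetadata: %v", err)
        }
        fmt.Printf("decoded cluster metadata: %+v\n", &md)
    }

Running the decode outside the server makes it easier to tell whether the bad bytes are in the row itself or are introduced somewhere in the read path (driver, connection charset, and so on).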

Any ideas on what might be causing the cluster metadata to be corrupted?