Temporal.io restart problem on Kubernetes node restart

Hi Maxim,

Just to clarify, the Cassandra DB on K8s that is used for the Temporal keyspaces was not deployed using the Temporal Helm charts. It’s a separate deployment that Temporal connects to.

Can you provide details on why the Helm-chart-based Cassandra deployment can have data loss? Although we did not use the Temporal Helm charts to deploy Cassandra, it would be handy to know what could cause the losses so I can check our Cassandra configuration for similar issues.

I’m not an expert in K8s. But AFAIK running production-quality persistence on it is very non-trivial. And I believe a Helm chart is not the right technology to manage databases on K8s.

Thanks Maxim.

Hello @maxim, I am running into the same problem. Mine is NOT a production env, btw.

I have deployed to a GKE Autopilot cluster using the Helm chart (with small modifications to set resource limits). The Cassandra headless ClusterIP service & pods are running, but the frontend pod is not coming up due to the “waiting for default keyspace to become ready” message. How do I resolve this?

Thanks, Vikram

Hi @Vikram_Bailur
Could you confirm that your DB schema is fully created before the Temporal server comes up?
The Helm charts run batch jobs to set up the schema, and these should complete first.
The error you mentioned can happen if Cassandra is up but the keyspace hasn’t finished being set up.
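
A quick way to check, assuming the default release name temporaltest and keyspace temporal from the chart values (adjust the names for your install):

    kubectl get jobs                                    # the schema setup/update jobs should show 1/1 completions
    kubectl logs job/temporaltest-schema-setup          # job name depends on your release name and chart version
    kubectl exec -it temporaltest-cassandra-0 -- cqlsh -e "DESCRIBE KEYSPACES"   # 'temporal' should be listed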

Hi @tihomir - thanks for your quick response - I think you might be right. I was trying to figure out what is wrong with Cassandra. Since I’m using GKE Autopilot, resources take some time to get provisioned, and I think the batch job might have tried to run before the Cassandra pods were fully up & running. This is the message on the job pod:
*** Can't find temporaltest-cassandra.default.svc.cluster.local: No answer

The Cassandra pod also has the following message:
containers with unready status: [temporaltest-cassandra]

UPDATE: it finally started up and I think the job has run as well.
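
In case anyone else hits this, a few basic things to check while the pods come up (the pod and service names below assume the default release name temporaltest; adjust for yours):

    kubectl get pods                                # wait for temporaltest-cassandra-0 to report 1/1 Ready
    kubectl describe pod temporaltest-cassandra-0   # readiness probe events explain the "unready" status
    kubectl get svc temporaltest-cassandra          # the headless service the schema job tries to resolve
    kubectl get jobs                                # the schema job should eventually show 1/1 completions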

I did see in the docs a way to set up the Cloud SQL proxy in the Helm chart (to use GCP Cloud SQL as the DB instead of Cassandra), but I don’t see specific instructions on how to install the Helm chart with that (sorry, I am new to Helm). Do I just use a bare-minimum Helm chart and then run the helm install for the Cloud SQL proxy?

That should be fine - everything will keep failing and retrying for a while until it comes up. If the batch job didn’t work and isn’t still trying to run, you can create another one - or just log into the admin-tools pod and run the commands the batch job would have run (see the sketch below).
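
For reference, a rough sketch of what those commands look like, assuming the default keyspace temporal, a replication factor of 1, and the schema path baked into the admin-tools image (check your values.yaml for the actual endpoint, keyspace names, and replication factor; the visibility keyspace needs the same steps with its own schema directory):

    # run from a shell inside the admin-tools pod
    CASSANDRA_ENDPOINT=temporaltest-cassandra.default.svc.cluster.local

    # create the keyspace, install the base schema, then apply the versioned updates
    temporal-cassandra-tool --ep $CASSANDRA_ENDPOINT create -k temporal --rf 1
    temporal-cassandra-tool --ep $CASSANDRA_ENDPOINT -k temporal setup-schema -v 0.0
    temporal-cassandra-tool --ep $CASSANDRA_ENDPOINT -k temporal update-schema \
        -d /etc/temporal/schema/cassandra/temporal/versioned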

If you have enough issues getting Cassandra to work and you want to try Cloud SQL with the proxy, you can deploy it to the cluster by setting a value for server.sidecarContainers - https://github.com/temporalio/helm-charts/blob/master/values.yaml#L21

This is just a list of containers that looks like any other Kubernetes containers: section. Anything you put in there will run in the same pod as the Temporal server containers.
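
As a rough sketch (untested; the image tag, port, and instance connection name are placeholders you would replace), a values override using the Cloud SQL Auth Proxy could look like:

    server:
      sidecarContainers:
        - name: cloud-sql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0
          args:
            - "--port=5432"
            - "my-project:my-region:my-instance"   # your Cloud SQL instance connection name

Then install with something like helm install temporaltest . -f values.cloudsql.yaml and point the server’s SQL persistence config at 127.0.0.1:5432, since the proxy listens on localhost inside the pod.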
