Hello,
I’m using Temporal 1.20.3 with cql-proxy as a sidecar and AstraDB, on an EKS cluster (1.23) spread across multiple AZs.
Sometimes, not always, when a new pod of any Temporal component starts, it goes into a crash loop with the message “unable to initialize cassandra session”. After several restarts it eventually comes up normally; sometimes the restart count is in the hundreds.
This is the log output:
"TEMPORAL_CLI_ADDRESS is not set, setting it to 100.64.12.237:7233\n"
"2023/07/07 08:49:06 Loading config; env=docker,zone=,configDir=config\n"
"2023/07/07 08:49:06 Loading config files=[config/docker.yaml]\n"
{“log”:“{"level":"info","ts":"2023-07-07T08:49:06.477Z","msg":"Build info.","git-time":"2023-05-15T23:50:55.000Z","git-revision":"45d22540323e59e4cd3fd62139b73409f1264fb3","git-modified":true,"go-arch":"amd64","go-os":"linux","go-version":"go1.20.4","cgo-enabled":false,"server-version":"1.20.3","debug-mode":false,"logging-call-at":"main.go:143"}\n”,“stream”:“stdout”,“time”:“2023-07-07T08:49:06.477756462Z”}
{“log”:“{"level":"warn","ts":"2023-07-07T08:49:06.478Z","msg":"Not using any authorizer and flag--allow-no-auth
not detected. Future versions will require using the flag--allow-no-auth
if you do not want to set an authorizer.","logging-call-at":"main.go:173"}\n”,“stream”:“stdout”,“time”:“2023-07-07T08:49:06.478709634Z”}
{“log”:“{"level":"fatal","ts":"2023-07-07T08:49:06.734Z","msg":"unable to initialize cassandra session","component":"metadata-initializer","error":"no connections were made when creating the session","logging-call-at":"factory.go:66","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Fatal\n\t/home/builder/temporal/common/log/zap_logger.go:174\ngo.temporal.io/server/common/persistence/cassandra.NewFactory\n\t/home/builder/temporal/common/persistence/cassandra/factory.go:66\ngo.temporal.io/server/common/persistence/client.DataStoreFactoryProvider\n\t/home/builder/temporal/common/persistence/client/store.go:82\ngo.temporal.io/server/temporal.ApplyClusterMetadataConfigProvider\n\t/home/builder/temporal/temporal/fx.go:621\nreflect.Value.call\n\t/usr/local/go/src/reflect/value.go:586\nreflect.Value.Call\n\t/usr/local/go/src/reflect/value.go:370\ngo.uber.org/dig.defaultInvoker\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/container.go:220\ngo.uber.org/dig.(*constructorNode).Call\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/constructor.go:154\ngo.uber.org/dig.paramSingle.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/param.go:288\ngo.uber.org/dig.paramObjectField.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/param.go:485\ngo.uber.org/dig.paramObject.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/param.go:412\ngo.uber.org/dig.paramList.BuildList\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/param.go:151\ngo.uber.org/dig.(*constructorNode).Call\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/constructor.go:145\ngo.uber.org/dig.paramGroupedSlice.callGroupProviders\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/param.go:612\ngo.uber.org/dig.paramGroupedSlice.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/param.go:642\ngo.uber.org/dig.paramObjectField.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/param.go:485\ngo.uber.org/dig.paramObject.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/param.go:412\ngo.uber.org/dig.paramList.BuildList\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/param.go:151\ngo.uber.org/dig.(*constructorNode).Call\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/constructor.go:145\ngo.uber.org/dig.paramSingle.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/param.go:288\ngo.uber.org/dig.paramList.BuildList\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/param.go:151\ngo.uber.org/dig.(*constructorNode).Call\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/constructor.go:145\ngo.uber.org/dig.paramSingle.Build\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/param.go:288\ngo.uber.org/dig.paramList.BuildList\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/param.go:151\ngo.uber.org/dig.(*Scope).Invoke\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/invoke.go:85\ngo.uber.org/dig.(*Container).Invoke\n\t/go/pkg/mod/go.uber.org/dig@v1.15.0/invoke.go:46\ngo.uber.org/fx.runInvoke\n\t/go/pkg/mod/go.uber.org/fx@v1.18.2/invoke.go:108\ngo.uber.org/fx.(*module).executeInvoke\n\t/go/pkg/mod/go.uber.org/fx@v1.18.2/module.go:186\ngo.uber.org/fx.(*module).executeInvokes\n\t/go/pkg/mod/go.uber.org/fx@v1.18.2/module.go:172\ngo.uber.org/fx.New\n\t/go/pkg/mod/go.uber.org/fx@v1.18.2/app.go:530\ngo.temporal.io/server/temporal.NewServerFx\n\t/home/builder/temporal/temporal/fx.go:135\ngo.temporal.io/server/temporal.NewServer\n\t/home/builder/temporal/temporal/server.go:69\nmain.buildCLI.func2\n\t/home/builder/temporal/cmd/server/main.go:184\ngithub.com/urfave/cli/v2.(*Command).Run\n\t/go/pkg/mod/github.com/urfave/cli/v2@v2.4.0/command.go:163\ngithub.com/urfave/cli/v2.(*App).RunContext\n\t/go/pkg/mod/github.com/urfave/cli/v2@v2.4.0/app.go:313\ngithub.com/urfave/cli/v2.(*App).Run\n\t/go/pkg/mod/github.com/urfave/cli/v2
@v2.4.0/app.go:224\nmain.main\n\t/home/builder/temporal/cmd/server/main.go:54\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}\n”,“stream”:“stdout”,“time”:“2023-07-07T08:49:06.73526558Z”}
I also made sure to start Temporal only once cql-proxy reported ready, using the following command:
"until curl --silent --fail http://localhost:8000/readiness 2>&1 > /dev/null; do echo waiting for cql-proxy to start; sleep 1; done; ./entrypoint.sh"
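In case it helps, this is a slightly more defensive variant of that wait loop I’ve been sketching: it also checks that the proxy’s CQL port accepts TCP connections before handing off to the entrypoint. It assumes cql-proxy is listening on its default 9042 and that bash is available in the image, so adjust as needed:

# Sketch only: wait for the cql-proxy readiness endpoint AND an open CQL port,
# then exec the normal entrypoint. Assumes the default CQL port 9042 and
# requires bash for the /dev/tcp redirection.
until curl --silent --fail http://localhost:8000/readiness > /dev/null 2>&1 \
      && (exec 3<>/dev/tcp/127.0.0.1/9042) 2>/dev/null; do
  echo "waiting for cql-proxy to accept connections"
  sleep 1
done
exec ./entrypoint.sh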
I also tried port-forwarding to the cql-proxy of an affected pod, but I can connect normally with cqlsh and run queries.
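For reference, that manual check looked roughly like this (the pod name below is just a placeholder):

# Forward the sidecar's CQL port from an affected pod (placeholder pod name)
kubectl port-forward pod/temporal-history-7c9f9d5b8-abcde 9042:9042
# In another shell, query through the forwarded port
cqlsh 127.0.0.1 9042 -e "SELECT keyspace_name FROM system_schema.keyspaces;"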
I didn’t see this kind of behavior before upgrading to 1.20; I was previously on 1.17.
At the moment, when I notice the problem I keep killing the pod until it starts properly.
Do you have any idea what the problem could be? Could it be related to the Kubernetes node? The pods that show the problem seem to land on the same node.
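For what it’s worth, this is roughly how I’m checking which node the crash-looping pods are scheduled on (the namespace name is just an example):

# Restart counts and the NODE column side by side
kubectl get pods -n temporal -o wide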
Thanks.