Hi all,
We followed the most basic hello world app and deployed it on a Kubernetes cluster using the standard Helm charts.
The app uses PostgreSQL for both the default persistence store and the visibility store. Whenever we launch workflows, they fail with a WorkflowTaskTimedOut event of type StartToClose. We executed 65 workflows and all of them failed after a while; for some reason, none of the workflows completes successfully.
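For reference, what we deployed is roughly the sketch below. This is a simplified version assuming the Go SDK hello world sample; the task queue name and frontend address are placeholders, and the "dev" namespace is the one that shows up in the server logs further down.

```go
package main

import (
	"context"
	"log"
	"time"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
	"go.temporal.io/sdk/workflow"
)

// HelloWorkflow is the trivial workflow we run; it calls a single activity.
func HelloWorkflow(ctx workflow.Context, name string) (string, error) {
	ao := workflow.ActivityOptions{StartToCloseTimeout: 10 * time.Second}
	ctx = workflow.WithActivityOptions(ctx, ao)

	var result string
	err := workflow.ExecuteActivity(ctx, HelloActivity, name).Get(ctx, &result)
	return result, err
}

func HelloActivity(ctx context.Context, name string) (string, error) {
	return "Hello, " + name + "!", nil
}

func main() {
	// HostPort points at the frontend service exposed by the Helm chart
	// (placeholder address; in the cluster we use the in-cluster service name).
	c, err := client.Dial(client.Options{
		HostPort:  "temporal-frontend:7233",
		Namespace: "dev",
	})
	if err != nil {
		log.Fatalln("unable to create Temporal client", err)
	}
	defer c.Close()

	// One worker per pod, polling a single task queue.
	w := worker.New(c, "hello-task-queue", worker.Options{})
	w.RegisterWorkflow(HelloWorkflow)
	w.RegisterActivity(HelloActivity)
	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("unable to start worker", err)
	}
}
```

The starter side just calls ExecuteWorkflow against the same task queue and blocks on the result, nothing more.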
We do see errors in the history pod, for example:
“log”: “{"level":"error","ts":"2023-06-22T20:30:25.968Z","msg":"service failures","operation":"RecordWorkflowTaskStarted","wf-namespace":"dev","error":"GetWorkflowExecution: failed to get timer info. Error: Failed to get timer info. Error: context deadline exceeded","logging-call-at":"telemetry.go:295","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:150\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).handleError\n\t/home/builder/temporal/common/rpc/interceptor/telemetry.go:295\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/telemetry.go:166\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1164\ngo.temporal.io/server/common/metrics.NewServerMetricsTrailerPropagatorInterceptor.func1\n\t/home/builder/temporal/common/metrics/grpc.go:113\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1164\ngo.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1\n\t/home/builder/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1164\ngo.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1\n\t/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc@v0.36.4/interceptor.go:341\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1164\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/home/builder/temporal/common/rpc/grpc.go:137\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1155\ngo.temporal.io/server/api/historyservice/v1._HistoryService_RecordWorkflowTaskStarted_Handler\n\t/home/builder/temporal/api/historyservice/v1/service.pb.go:1139\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1345\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1722\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:966"}”,
Another strange thing I noticed is that we have a very large number of DB connections: about 304K connections were opened in the past 12 hours, which I don't expect is normal. We have 2 worker pods running at the same time because we initially thought the problem was an overloaded worker, but that doesn't seem to be the case.
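In case it helps to reproduce, a quick way to snapshot how many connections Postgres sees at a given moment is something like the sketch below (the connection string is a placeholder and it assumes the lib/pq driver; the 304K figure above comes from our database metrics over 12 hours, not from this snippet):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // Postgres driver (assumed; any database/sql driver works)
)

func main() {
	// Placeholder DSN; substitute the real host, user and database.
	db, err := sql.Open("postgres", "host=localhost port=5432 user=temporal dbname=temporal sslmode=disable")
	if err != nil {
		log.Fatalln("unable to open connection:", err)
	}
	defer db.Close()

	// pg_stat_activity has one row per open backend, so count(*) is the
	// number of connections Postgres currently sees.
	var open int
	if err := db.QueryRow("SELECT count(*) FROM pg_stat_activity").Scan(&open); err != nil {
		log.Fatalln("query failed:", err)
	}
	fmt.Println("currently open connections:", open)
}
```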
Bram