Hi all,
We followed the most basic hello world app and deployed it on a Kubernetes cluster using the standard Helm charts.
The app uses PostgreSQL for both the default persistence store and the visibility store. Whenever we launch workflows, they fail with a WorkflowTaskTimedOut event of type StartToClose. We executed 65 workflows and all of them failed after a while; for some reason, none of the workflows completes successfully.
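For reference, what we deployed is roughly the sketch below. This is a simplified version assuming the Go SDK hello world sample; the task queue name and frontend address are placeholders, and the "dev" namespace is the one that shows up in the server logs further down.

```go
package main

import (
	"context"
	"log"
	"time"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
	"go.temporal.io/sdk/workflow"
)

// HelloWorkflow is the trivial workflow we run; it calls a single activity.
func HelloWorkflow(ctx workflow.Context, name string) (string, error) {
	ao := workflow.ActivityOptions{StartToCloseTimeout: 10 * time.Second}
	ctx = workflow.WithActivityOptions(ctx, ao)

	var result string
	err := workflow.ExecuteActivity(ctx, HelloActivity, name).Get(ctx, &result)
	return result, err
}

func HelloActivity(ctx context.Context, name string) (string, error) {
	return "Hello, " + name + "!", nil
}

func main() {
	// HostPort points at the frontend service exposed by the Helm chart
	// (placeholder address; in the cluster we use the in-cluster service name).
	c, err := client.Dial(client.Options{
		HostPort:  "temporal-frontend:7233",
		Namespace: "dev",
	})
	if err != nil {
		log.Fatalln("unable to create Temporal client", err)
	}
	defer c.Close()

	// One worker per pod, polling a single task queue.
	w := worker.New(c, "hello-task-queue", worker.Options{})
	w.RegisterWorkflow(HelloWorkflow)
	w.RegisterActivity(HelloActivity)
	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("unable to start worker", err)
	}
}
```

The starter side just calls ExecuteWorkflow against the same task queue and blocks on the result, nothing more.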
We do see errors in the history pod, for example:
“log”: “{"level":"error","ts":"2023-06-22T20:30:25.968Z","msg":"service failures","operation":"RecordWorkflowTaskStarted","wf-namespace":"dev","error":"GetWorkflowExecution: failed to get timer info. Error: Failed to get timer info. Error: context deadline exceeded","logging-call-at":"telemetry.go:295","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:150\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).handleError\n\t/home/builder/temporal/common/rpc/interceptor/telemetry.go:295\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/telemetry.go:166\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1164\ngo.temporal.io/server/common/metrics.NewServerMetricsTrailerPropagatorInterceptor.func1\n\t/home/builder/temporal/common/metrics/grpc.go:113\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1164\ngo.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1\n\t/home/builder/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1164\ngo.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1\n\t/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc@v0.36.4/interceptor.go:341\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1164\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/home/builder/temporal/common/rpc/grpc.go:137\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1155\ngo.temporal.io/server/api/historyservice/v1._HistoryService_RecordWorkflowTaskStarted_Handler\n\t/home/builder/temporal/api/historyservice/v1/service.pb.go:1139\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1345\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1722\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:966"}”,
Another strange thing I noticed is that we have a very large number of DB connections: about 304K connections were opened in the past 12 hours, which I don't expect is normal. We have 2 worker pods running at the same time because we initially thought the problem was an overloaded worker, but that doesn't seem to be the case.
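In case it helps to reproduce, a quick way to snapshot how many connections Postgres sees at a given moment is something like the sketch below (the connection string is a placeholder and it assumes the lib/pq driver; the 304K figure above comes from our database metrics over 12 hours, not from this snippet):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // Postgres driver (assumed; any database/sql driver works)
)

func main() {
	// Placeholder DSN; substitute the real host, user and database.
	db, err := sql.Open("postgres", "host=localhost port=5432 user=temporal dbname=temporal sslmode=disable")
	if err != nil {
		log.Fatalln("unable to open connection:", err)
	}
	defer db.Close()

	// pg_stat_activity has one row per open backend, so count(*) is the
	// number of connections Postgres currently sees.
	var open int
	if err := db.QueryRow("SELECT count(*) FROM pg_stat_activity").Scan(&open); err != nil {
		log.Fatalln("query failed:", err)
	}
	fmt.Println("currently open connections:", open)
}
```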
Bram