Recurring Context Deadline Exceeded error when executing a workflow

Hello!

I’ve been regularly encountering a context deadline exceeded error when calling the ExecuteWorkflow method. I pass it a context wrapped with context.WithTimeout and a 1-minute timeout, yet in every case the error occurs roughly 10 seconds after the call is made, well before the 1-minute deadline.

Additionally, the temporal-history service logs the context deadline exceeded error along with the following stack trace:

go.temporal.io/server/common/log.(*zapLogger).Error
  /home/builder/temporal/common/log/zap_logger.go:144
go.temporal.io/server/service/history/workflow.createWorkflowExecution
  /home/builder/temporal/service/history/workflow/transaction_impl.go:346
go.temporal.io/server/service/history/workflow.(*ContextImpl).CreateWorkflowExecution
  /home/builder/temporal/service/history/workflow/context.go:349
go.temporal.io/server/service/history/api/startworkflow.Invoke
  /home/builder/temporal/service/history/api/startworkflow/api.go:94
go.temporal.io/server/service/history.(*historyEngineImpl).StartWorkflowExecution
  /home/builder/temporal/service/history/historyEngine.go:424
go.temporal.io/server/service/history.(*Handler).StartWorkflowExecution
  /home/builder/temporal/service/history/handler.go:529
go.temporal.io/server/api/historyservice/v1._HistoryService_StartWorkflowExecution_Handler.func1
  /home/builder/temporal/api/historyservice/v1/service.pb.go:1046
go.temporal.io/server/common/rpc/interceptor.(*RetryableInterceptor).Intercept.func1
  /home/builder/temporal/common/rpc/interceptor/retry.go:63
go.temporal.io/server/common/backoff.ThrottleRetryContext
  /home/builder/temporal/common/backoff/retry.go:194
go.temporal.io/server/common/rpc/interceptor.(*RetryableInterceptor).Intercept
  /home/builder/temporal/common/rpc/interceptor/retry.go:67
google.golang.org/grpc.chainUnaryInterceptors.func1.1
  /go/pkg/mod/google.golang.org/grpc@v1.50.1/server.go:1162
go.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept
  /home/builder/temporal/common/rpc/interceptor/rate_limit.go:86
google.golang.org/grpc.chainUnaryInterceptors.func1.1
  /go/pkg/mod/google.golang.org/grpc@v1.50.1/server.go:1165
go.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept
  /home/builder/temporal/common/rpc/interceptor/telemetry.go:142
google.golang.org/grpc.chainUnaryInterceptors.func1.1
  /go/pkg/mod/google.golang.org/grpc@v1.50.1/server.go:1165
go.temporal.io/server/common/metrics.NewServerMetricsTrailerPropagatorInterceptor.func1
  /home/builder/temporal/common/metrics/grpc.go:113
google.golang.org/grpc.chainUnaryInterceptors.func1.1
  /go/pkg/mod/google.golang.org/grpc@v1.50.1/server.go:1165
go.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1
  /home/builder/temporal/common/metrics/grpc.go:66
google.golang.org/grpc.chainUnaryInterceptors.func1.1
  /go/pkg/mod/google.golang.org/grpc@v1.50.1/server.go:1165
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1
  /go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc@v0.36.1/interceptor.go:352
google.golang.org/grpc.chainUnaryInterceptors.func1.1
  /go/pkg/mod/google.golang.org/grpc@v1.50.1/server.go:1165
go.temporal.io/server/common/rpc.ServiceErrorInterceptor
  /home/builder/temporal/common/rpc/grpc.go:137
google.golang.org/grpc.chainUnaryInterceptors.func1.1
  /go/pkg/mod/google.golang.org/grpc@v1.50.1/server.go:1165
google.golang.org/grpc.chainUnaryInterceptors.func1
  /go/pkg/mod/google.golang.org/grpc@v1.50.1/server.go:1167
go.temporal.io/server/api/historyservice/v1._HistoryService_StartWorkflowExecution_Handler
  /home/builder/temporal/api/historyservice/v1/service.pb.go:1048
google.golang.org/grpc.(*Server).processUnaryRPC
  /go/pkg/mod/google.golang.org/grpc@v1.50.1/server.go:1340
google.golang.org/grpc.(*Server).handleStream
  /go/pkg/mod/google.golang.org/grpc@v1.50.1/server.go:1713
google.golang.org/grpc.(*Server).serveStreams.func1.2
  /go/pkg/mod/google.golang.org/grpc@v1.50.1/server.go:965

The temporal-worker’s stack trace is as follows:

go.temporal.io/server/common/log.(*zapLogger).Error
	/home/builder/temporal/common/log/zap_logger.go:144
go.temporal.io/server/service/worker/scanner/history.(*Scavenger).handleTask
	/home/builder/temporal/service/worker/scanner/history/scavenger.go:284
go.temporal.io/server/service/worker/scanner/history.(*Scavenger).taskWorker
	/home/builder/temporal/service/worker/scanner/history/scavenger.go:214

No errors appeared in the temporal-frontend logs around the time the error occurred.

A workflow execution is started roughly like this:

opts := client.StartWorkflowOptions{
	ID:                  uuid.NewString(),
	WorkflowTaskTimeout: time.Second * 30,
	TaskQueue:           "tx_retry_task_queue",
}

// The caller's context gets a 1-minute timeout, yet the call below fails
// with "context deadline exceeded" roughly 10 seconds in.
ctxWithTimeout, cancel := context.WithTimeout(ctx, time.Minute)
defer cancel()

// rs.temporal is a go.temporal.io/sdk/client.Client; RetryTransactionWorkflowV2Name
// is the workflow type name and tx is the workflow argument.
_, err := rs.temporal.ExecuteWorkflow(ctxWithTimeout, opts, temporal.RetryTransactionWorkflowV2Name, tx)
if err != nil {
	// error handling
}

I’m running all of this in a Kubernetes environment. Could this issue be related to networking problems? Is it possible that the 10-second timeout comes from a gRPC maximum timeout setting? And would it be possible to work around the issue by specifying a RetryPolicy with a few retries when executing the workflow, as sketched below?
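To be concrete, by a RetryPolicy I mean attaching one to the start options, roughly like this (just a sketch: the interval and attempt values are arbitrary, and sdktemporal is go.temporal.io/sdk/temporal aliased to avoid clashing with my own temporal package):

opts := client.StartWorkflowOptions{
	ID:                  uuid.NewString(),
	WorkflowTaskTimeout: time.Second * 30,
	TaskQueue:           "tx_retry_task_queue",
	// As far as I understand, this configures retries of the workflow
	// execution itself rather than of the StartWorkflowExecution call.
	RetryPolicy: &sdktemporal.RetryPolicy{
		InitialInterval:    time.Second,
		BackoffCoefficient: 2.0,
		MaximumInterval:    time.Second * 10,
		MaximumAttempts:    3,
	},
}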

Temporal Server Version: v1.19.0
Go Temporal SDK Version: v1.21.1

Will a workflow execution be retried after a context deadline exceeded error, provided that I specify a RetryPolicy? Or do I need custom retry logic for that?

UPD: custom retrying didn’t help.
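By custom retrying I mean a simple client-side loop around ExecuteWorkflow, roughly along these lines (simplified; it reuses opts, rs.temporal, and tx from the snippet above, and the attempt count and backoff are arbitrary):

for attempt := 0; attempt < 3; attempt++ {
	// Fresh timeout per attempt.
	ctxWithTimeout, cancel := context.WithTimeout(ctx, time.Minute)
	_, err := rs.temporal.ExecuteWorkflow(ctxWithTimeout, opts, temporal.RetryTransactionWorkflowV2Name, tx)
	cancel()
	if err == nil {
		break
	}
	// Every attempt still failed with "context deadline exceeded" after ~10 seconds.
	time.Sleep(time.Second * 2)
}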

I am experiencing a very similar issue and would love to hear thoughts on how I could debug it further.

Hi, did any of you find a solution to this? I’ve had the same issue with the Java SDK while running in a k8s environment. I found many posts about load-balancer idle timeouts and keep-alive time in the SDK, but I still could not get rid of this error.
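For comparison with the Go snippets above, the keep-alive tuning I mean corresponds to gRPC keep-alive parameters; on the Go SDK side they can be passed through the client’s dial options, roughly like this (a sketch with placeholder host and durations; it needs google.golang.org/grpc and google.golang.org/grpc/keepalive in addition to go.temporal.io/sdk/client):

c, err := client.Dial(client.Options{
	HostPort: "temporal-frontend:7233", // placeholder address
	ConnectionOptions: client.ConnectionOptions{
		DialOptions: []grpc.DialOption{
			grpc.WithKeepaliveParams(keepalive.ClientParameters{
				Time:                30 * time.Second, // ping the server after this much idle time
				Timeout:             10 * time.Second, // wait this long for the ping ack
				PermitWithoutStream: true,
			}),
		},
	},
})
if err != nil {
	// handle dial error
}
defer c.Close()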