Hey, we have deployed a Temporal cluster to Kubernetes using the Temporal Helm Charts + Cassandra, with default namespace parameters.
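For context, the install was essentially the default one from the temporalio/helm-charts README, roughly like this (the release name is a placeholder):

```
helm install temporaltest . --timeout 900s
```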
Workflow execution is generally working: workflows complete successfully. But when I try to start another workflow with the same workflow ID, the TypeScript SDK just hangs without any error (I even tried explicitly passing workflowIdReusePolicy to allow duplicates, with the same result).
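For reference, the start call on the TypeScript SDK side looks roughly like this. This is a paraphrased sketch: the frontend address, namespace, task queue, workflow type, and workflow ID are placeholders, and the exact client construction may differ slightly depending on the SDK version.

```ts
import { Connection, WorkflowClient } from '@temporalio/client';
import { WorkflowIdReusePolicy } from '@temporalio/common';

async function run() {
  // Address of the temporal-frontend service exposed by the Helm chart (placeholder).
  const connection = await Connection.connect({ address: 'temporal-frontend:7233' });
  const client = new WorkflowClient({ connection, namespace: 'default' });

  // Starting a second execution with the same workflow ID is where the call hangs.
  const handle = await client.start('myWorkflow', {
    workflowId: 'my-workflow-id',
    taskQueue: 'my-task-queue',
    args: [],
    // Explicitly allow reusing an ID whose previous run has already completed.
    workflowIdReusePolicy: WorkflowIdReusePolicy.WORKFLOW_ID_REUSE_POLICY_ALLOW_DUPLICATE,
  });
  console.log(`Started ${handle.workflowId}`);
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```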
Then I tried to start the same workflow with the same ID using tctl, with TEMPORAL_CLI_SHOW_STACKS=1 set so the CLI prints a stack trace, and got the error below.
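Roughly the invocation (the workflow ID, task queue, and workflow type here are placeholders for our real values):

```
TEMPORAL_CLI_SHOW_STACKS=1 tctl workflow start \
  --workflow_id "my-workflow-id" \
  --taskqueue "my-task-queue" \
  --workflow_type "MyWorkflow" \
  --execution_timeout 60
```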
Error: Failed to create workflow.
Error Details: context deadline exceeded
Stack trace:
goroutine 1 [running]:
runtime/debug.Stack()
runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
runtime/debug/stack.go:16 +0x19
github.com/temporalio/tctl/cli_curr.printError({0x2b6d288, 0x1a}, {0x2faa240, 0xc00028c648})
github.com/temporalio/tctl/cli_curr/util.go:392 +0x21e
github.com/temporalio/tctl/cli_curr.ErrorAndExit({0x2b6d288?, 0xa?}, {0x2faa240?, 0xc00028c648?})
github.com/temporalio/tctl/cli_curr/util.go:403 +0x28
github.com/temporalio/tctl/cli_curr.startWorkflowHelper.func1()
github.com/temporalio/tctl/cli_curr/workflowCommands.go:225 +0x1a5
github.com/temporalio/tctl/cli_curr.startWorkflowHelper(0xc00003d760, 0x0)
github.com/temporalio/tctl/cli_curr/workflowCommands.go:262 +0x557
github.com/temporalio/tctl/cli_curr.StartWorkflow(...)
github.com/temporalio/tctl/cli_curr/workflowCommands.go:173
github.com/temporalio/tctl/cli_curr.newWorkflowCommands.func3(0xc00003d760?)
github.com/temporalio/tctl/cli_curr/workflow.go:57 +0x1b
github.com/urfave/cli.HandleAction({0x273c0a0?, 0x2c17e30?}, 0x5?)
github.com/urfave/cli@v1.22.5/app.go:526 +0x50
github.com/urfave/cli.Command.Run({{0x2b3dc2d, 0x5}, {0x0, 0x0}, {0x0, 0x0, 0x0}, {0x2b79fe4, 0x1e}, {0x0, ...}, ...}, ...)
github.com/urfave/cli@v1.22.5/command.go:173 +0x652
github.com/urfave/cli.(*App).RunAsSubcommand(0xc0003236c0, 0xc00003d4a0)
github.com/urfave/cli@v1.22.5/app.go:405 +0x91b
github.com/urfave/cli.Command.startApp({{0x2b45303, 0x8}, {0x0, 0x0}, {0xc00007fe40, 0x1, 0x1}, {0x2b6abf9, 0x19}, {0x0, ...}, ...}, ...)
github.com/urfave/cli@v1.22.5/command.go:372 +0x6e7
github.com/urfave/cli.Command.Run({{0x2b45303, 0x8}, {0x0, 0x0}, {0xc00007fe40, 0x1, 0x1}, {0x2b6abf9, 0x19}, {0x0, ...}, ...}, ...)
github.com/urfave/cli@v1.22.5/command.go:102 +0x808
github.com/urfave/cli.(*App).Run(0xc000323340, {0xc000134000, 0x16, 0x16})
github.com/urfave/cli@v1.22.5/app.go:277 +0x8a7
main.main()
./main.go:45 +0xa6
I only get this error when the workflow ID belongs to an already completed workflow. With a new workflow ID, everything runs successfully.
There are also some errors in the history service logs:
{"level":"error","ts":"2022-06-08T15:02:38.737Z","msg":"Persistent store operation Failure","shard-id":216,"address":"172.31.1.248:7234","wf-namespace-id":"32049b68-7872-4094-8e63-d0dd59896a83","wf-id":"temporal-sys-history-scanner","wf-run-id":"bc3ab2dc-e36e-441f-9516-5e735e637b63","store-operation":"create-wf-execution","error":"Encounter shard ownership lost, request range ID: 2801241, actual range ID: 0","logging-call-at":"transaction_impl.go:380","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/service/history/workflow.createWorkflowExecutionWithRetry\n\t/home/builder/temporal/service/history/workflow/transaction_impl.go:380\ngo.temporal.io/server/service/history/workflow.(*ContextImpl).CreateWorkflowExecution\n\t/home/builder/temporal/service/history/workflow/context.go:361\ngo.temporal.io/server/service/history.(*historyEngineImpl).StartWorkflowExecution\n\t/home/builder/temporal/service/history/historyEngine.go:561\ngo.temporal.io/server/service/history.(*Handler).StartWorkflowExecution\n\t/home/builder/temporal/service/history/handler.go:527\ngo.temporal.io/server/api/historyservice/v1._HistoryService_StartWorkflowExecution_Handler.func1\n\t/home/builder/temporal/api/historyservice/v1/service.pb.go:948\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/rate_limit.go:84\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1116\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/telemetry.go:108\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1119\ngo.temporal.io/server/common/metrics.NewServerMetricsTrailerPropagatorInterceptor.func1\n\t/home/builder/temporal/common/metrics/grpc.go:113\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1119\ngo.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1\n\t/home/builder/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1119\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/home/builder/temporal/common/rpc/grpc.go:131\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1119\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1121\ngo.temporal.io/server/api/historyservice/v1._HistoryService_StartWorkflowExecution_Handler\n\t/home/builder/temporal/api/historyservice/v1/service.pb.go:950\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1282\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1619\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:921"}
{"level":"error","ts":"2022-06-08T15:02:38.740Z","msg":"Error complete transfer task","shard-id":216,"address":"172.31.1.248:7234","component":"transfer-queue-processor","error":"shard closed","logging-call-at":"transferQueueProcessor.go:259","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/service/history.(*transferQueueProcessorImpl).completeTransferLoop\n\t/home/builder/temporal/service/history/transferQueueProcessor.go:259"}
{"level":"error","ts":"2022-06-08T15:02:38.764Z","msg":"Persistent store operation Failure","shard-id":216,"address":"172.31.1.248:7234","wf-namespace-id":"32049b68-7872-4094-8e63-d0dd59896a83","wf-id":"temporal-sys-history-scanner","wf-run-id":"b4421186-dc25-4cf6-9947-2012d904b7a2","store-operation":"create-wf-execution","error":"Encounter shard ownership lost, request range ID: 2801242, actual range ID: 0","logging-call-at":"transaction_impl.go:380","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/service/history/workflow.createWorkflowExecutionWithRetry\n\t/home/builder/temporal/service/history/workflow/transaction_impl.go:380\ngo.temporal.io/server/service/history/workflow.(*ContextImpl).CreateWorkflowExecution\n\t/home/builder/temporal/service/history/workflow/context.go:361\ngo.temporal.io/server/service/history.(*historyEngineImpl).StartWorkflowExecution\n\t/home/builder/temporal/service/history/historyEngine.go:561\ngo.temporal.io/server/service/history.(*Handler).StartWorkflowExecution\n\t/home/builder/temporal/service/history/handler.go:527\ngo.temporal.io/server/api/historyservice/v1._HistoryService_StartWorkflowExecution_Handler.func1\n\t/home/builder/temporal/api/historyservice/v1/service.pb.go:948\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/rate_limit.go:84\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1116\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/home/builder/temporal/common/rpc/interceptor/telemetry.go:108\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1119\ngo.temporal.io/server/common/metrics.NewServerMetricsTrailerPropagatorInterceptor.func1\n\t/home/builder/temporal/common/metrics/grpc.go:113\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1119\ngo.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1\n\t/home/builder/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1119\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/home/builder/temporal/common/rpc/grpc.go:131\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1119\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1121\ngo.temporal.io/server/api/historyservice/v1._HistoryService_StartWorkflowExecution_Handler\n\t/home/builder/temporal/api/historyservice/v1/service.pb.go:950\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1282\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1619\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:921"}
{"level":"error","ts":"2022-06-08T15:02:38.768Z","msg":"Error complete transfer task","shard-id":216,"address":"172.31.1.248:7234","component":"transfer-queue-processor","error":"shard closed","logging-call-at":"transferQueueProcessor.go:259","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/service/history.(*transferQueueProcessorImpl).completeTransferLoop\n\t/home/builder/temporal/service/history/transferQueueProcessor.go:259"}
and in the matching service logs:
{"level":"error","ts":"2022-06-08T11:48:13.503Z","msg":"Membership upsert failed.","service":"matching","error":"operation UpsertClusterMembership encountered Operation timed out for temporal.cluster_membership - received only 0 responses from 0 CL=LOCAL_QUORUM.","logging-call-at":"rpMonitor.go:284","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/common/membership.(*ringpopMonitor).startHeartbeatUpsertLoop.func1\n\t/home/builder/temporal/common/membership/rpMonitor.go:284"}
{"level":"info","ts":"2022-06-08T12:19:23.848Z","msg":"matching client encountered error","service":"matching","error":"Workflow did not register a handler for __stack_trace","service-error-type":"serviceerror.QueryFailed","logging-call-at":"metricClient.go:268"}
{"level":"info","ts":"2022-06-08T12:19:28.074Z","msg":"matching client encountered error","service":"matching","error":"Workflow did not register a handler for __stack_trace","service-error-type":"serviceerror.QueryFailed","logging-call-at":"metricClient.go:268"}
{"level":"info","ts":"2022-06-08T12:28:51.180Z","msg":"history client encountered error","service":"matching","error":"shard closed","service-error-type":"serviceerror.Unavailable","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.205Z","msg":"history client encountered error","service":"matching","error":"shard closed","service-error-type":"serviceerror.Unavailable","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.232Z","msg":"history client encountered error","service":"matching","error":"shard closed","service-error-type":"serviceerror.Unavailable","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.260Z","msg":"history client encountered error","service":"matching","error":"shard closed","service-error-type":"serviceerror.Unavailable","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.279Z","msg":"history client encountered error","service":"matching","error":"shard closed","service-error-type":"serviceerror.Unavailable","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.295Z","msg":"history client encountered error","service":"matching","error":"shard closed","service-error-type":"serviceerror.Unavailable","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.295Z","msg":"history client encountered error","service":"matching","error":"shard closed","service-error-type":"serviceerror.Unavailable","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.350Z","msg":"history client encountered error","service":"matching","error":"shard closed","service-error-type":"serviceerror.Unavailable","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.397Z","msg":"history client encountered error","service":"matching","error":"shard closed","service-error-type":"serviceerror.Unavailable","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.402Z","msg":"history client encountered error","service":"matching","error":"shard closed","service-error-type":"serviceerror.Unavailable","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.420Z","msg":"history client encountered error","service":"matching","error":"Activity task already started.","service-error-type":"serviceerror.TaskAlreadyStarted","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.451Z","msg":"history client encountered error","service":"matching","error":"Activity task already started.","service-error-type":"serviceerror.TaskAlreadyStarted","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.453Z","msg":"history client encountered error","service":"matching","error":"Activity task already started.","service-error-type":"serviceerror.TaskAlreadyStarted","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.476Z","msg":"history client encountered error","service":"matching","error":"Activity task already started.","service-error-type":"serviceerror.TaskAlreadyStarted","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.478Z","msg":"history client encountered error","service":"matching","error":"Activity task already started.","service-error-type":"serviceerror.TaskAlreadyStarted","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.487Z","msg":"history client encountered error","service":"matching","error":"shard closed","service-error-type":"serviceerror.Unavailable","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.510Z","msg":"history client encountered error","service":"matching","error":"Activity task already started.","service-error-type":"serviceerror.TaskAlreadyStarted","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.512Z","msg":"history client encountered error","service":"matching","error":"Activity task already started.","service-error-type":"serviceerror.TaskAlreadyStarted","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.540Z","msg":"history client encountered error","service":"matching","error":"Activity task already started.","service-error-type":"serviceerror.TaskAlreadyStarted","logging-call-at":"metricClient.go:638"}
{"level":"info","ts":"2022-06-08T12:28:51.594Z","msg":"history client encountered error","service":"matching","error":"Activity task already started.","service-error-type":"serviceerror.TaskAlreadyStarted","logging-call-at":"metricClient.go:638"}
Can you please give some insight into what could be wrong in the cluster setup that prevents starting a workflow with a reused workflow ID?