Hello,
This morning the list of workers for our task queue was empty, even though we have 8 worker instances running. We were receiving new workflow executions, but they were not being processed. Looking at the server logs, we have thousands of the errors below:
{"level":"error","ts":"2020-12-17T17:56:30.235Z","msg":"Operation failed with internal error.","service":"matching","error":"Failed to lock task queue. Error: sql: no rows in result set","metric-scope":28,"logging-call-at":"persistenceMetricClients.go:747","stacktrace":"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error\n\t/temporal/common/log/loggerimpl/logger.go:138\ngo.temporal.io/server/common/persistence.(*taskPersistenceClient).updateErrorMetric\n\t/temporal/common/persistence/persistenceMetricClients.go:747\ngo.temporal.io/server/common/persistence.(*taskPersistenceClient).CreateTasks\n\t/temporal/common/persistence/persistenceMetricClients.go:641\ngo.temporal.io/server/service/matching.(*taskQueueDB).CreateTasks\n\t/temporal/service/matching/db.go:129\ngo.temporal.io/server/service/matching.(*taskWriter).taskWriterLoop\n\t/temporal/service/matching/taskWriter.go:180"}
{"level":"error","ts":"2020-12-17T17:56:30.235Z","msg":"Persistent store operation failure","service":"matching","component":"matching-engine","wf-task-queue-name":"5517c372fff8:079843aa-5272-4ba7-89aa-3f1b7fc43447","wf-task-queue-type":"Workflow","store-operation":"create-task","error":"Failed to lock task queue. Error: sql: no rows in result set","wf-task-queue-name":"5517c372fff8:079843aa-5272-4ba7-89aa-3f1b7fc43447","wf-task-queue-type":"Workflow","number":21187,"next-number":21187,"logging-call-at":"taskWriter.go:182","stacktrace":"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error\n\t/temporal/common/log/loggerimpl/logger.go:138\ngo.temporal.io/server/service/matching.(*taskWriter).taskWriterLoop\n\t/temporal/service/matching/taskWriter.go:182"}
{"level":"error","ts":"2020-12-17T17:56:33.158Z","msg":"Fail to process task","service":"history","shard-id":3,"address":"172.19.0.2:7234","shard-item":"0xc000d35880","component":"transfer-queue-processor","cluster-name":"active","shard-id":3,"queue-task-id":59760915,"queue-task-visibility-timestamp":1608227787068600342,"xdc-failover-version":0,"queue-task-type":"TransferWorkflowTask","wf-namespace-id":"81f640cc-c68d-415e-95c1-365b43d8374c","wf-id":"niko-order-1100801638","wf-run-id":"612fc6cc-ceaa-401b-bdee-afb9d1843b6e","error":"context deadline exceeded","lifecycle":"ProcessingFailed","logging-call-at":"taskProcessor.go:326","stacktrace":"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error\n\t/temporal/common/log/loggerimpl/logger.go:138\ngo.temporal.io/server/service/history.(*taskProcessor).handleTaskError\n\t/temporal/service/history/taskProcessor.go:326\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck.func1\n\t/temporal/service/history/taskProcessor.go:212\ngo.temporal.io/server/common/backoff.Retry\n\t/temporal/common/backoff/retry.go:103\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck\n\t/temporal/service/history/taskProcessor.go:238\ngo.temporal.io/server/service/history.(*taskProcessor).taskWorker\n\t/temporal/service/history/taskProcessor.go:161"}
We restarted the workers and the workflows were then handled correctly.
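For reference, this is roughly how we check which workers are polling the task queue (a minimal sketch using the Go SDK; the host/port and task queue name are placeholders for our real values):

```go
package main

import (
	"context"
	"fmt"
	"log"

	enumspb "go.temporal.io/api/enums/v1"
	"go.temporal.io/sdk/client"
)

func main() {
	// Connect to the Temporal frontend (placeholder address).
	c, err := client.NewClient(client.Options{HostPort: "temporal-frontend:7233"})
	if err != nil {
		log.Fatalf("unable to create Temporal client: %v", err)
	}
	defer c.Close()

	// DescribeTaskQueue returns the pollers (workers) currently registered
	// on the given task queue; an empty list means no worker is polling it.
	resp, err := c.DescribeTaskQueue(context.Background(), "our-task-queue", enumspb.TASK_QUEUE_TYPE_WORKFLOW)
	if err != nil {
		log.Fatalf("DescribeTaskQueue failed: %v", err)
	}
	if len(resp.Pollers) == 0 {
		fmt.Println("no pollers registered on the task queue")
	}
	for _, p := range resp.Pollers {
		fmt.Printf("poller identity=%s lastAccess=%v\n", p.Identity, p.LastAccessTime)
	}
}
```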
The errors above seem to be consistent and are still occurring as I'm writing this. Is there anything we could do?
Thanks for all the help!