Failed to lock task queue. Error: sql: no rows in result set

Hello,

This morning the list of workers for our task queue was empty, although we have 8 instances running. We were receiving new workflow executions, but they were not being processed. Looking into the server logs, we found thousands of the errors below:

{"level":"error","ts":"2020-12-17T17:56:30.235Z","msg":"Operation failed with internal error.","service":"matching","error":"Failed to lock task queue. Error: sql: no rows in result set","metric-scope":28,"logging-call-at":"persistenceMetricClients.go:747","stacktrace":"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error\n\t/temporal/common/log/loggerimpl/logger.go:138\ngo.temporal.io/server/common/persistence.(*taskPersistenceClient).updateErrorMetric\n\t/temporal/common/persistence/persistenceMetricClients.go:747\ngo.temporal.io/server/common/persistence.(*taskPersistenceClient).CreateTasks\n\t/temporal/common/persistence/persistenceMetricClients.go:641\ngo.temporal.io/server/service/matching.(*taskQueueDB).CreateTasks\n\t/temporal/service/matching/db.go:129\ngo.temporal.io/server/service/matching.(*taskWriter).taskWriterLoop\n\t/temporal/service/matching/taskWriter.go:180"}

{"level":"error","ts":"2020-12-17T17:56:30.235Z","msg":"Persistent store operation failure","service":"matching","component":"matching-engine","wf-task-queue-name":"5517c372fff8:079843aa-5272-4ba7-89aa-3f1b7fc43447","wf-task-queue-type":"Workflow","store-operation":"create-task","error":"Failed to lock task queue. Error: sql: no rows in result set","wf-task-queue-name":"5517c372fff8:079843aa-5272-4ba7-89aa-3f1b7fc43447","wf-task-queue-type":"Workflow","number":21187,"next-number":21187,"logging-call-at":"taskWriter.go:182","stacktrace":"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error\n\t/temporal/common/log/loggerimpl/logger.go:138\ngo.temporal.io/server/service/matching.(*taskWriter).taskWriterLoop\n\t/temporal/service/matching/taskWriter.go:182"}

{"level":"error","ts":"2020-12-17T17:56:33.158Z","msg":"Fail to process task","service":"history","shard-id":3,"address":"172.19.0.2:7234","shard-item":"0xc000d35880","component":"transfer-queue-processor","cluster-name":"active","shard-id":3,"queue-task-id":59760915,"queue-task-visibility-timestamp":1608227787068600342,"xdc-failover-version":0,"queue-task-type":"TransferWorkflowTask","wf-namespace-id":"81f640cc-c68d-415e-95c1-365b43d8374c","wf-id":"niko-order-1100801638","wf-run-id":"612fc6cc-ceaa-401b-bdee-afb9d1843b6e","error":"context deadline exceeded","lifecycle":"ProcessingFailed","logging-call-at":"taskProcessor.go:326","stacktrace":"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error\n\t/temporal/common/log/loggerimpl/logger.go:138\ngo.temporal.io/server/service/history.(*taskProcessor).handleTaskError\n\t/temporal/service/history/taskProcessor.go:326\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck.func1\n\t/temporal/service/history/taskProcessor.go:212\ngo.temporal.io/server/common/backoff.Retry\n\t/temporal/common/backoff/retry.go:103\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck\n\t/temporal/service/history/taskProcessor.go:238\ngo.temporal.io/server/service/history.(*taskProcessor).taskWorker\n\t/temporal/service/history/taskProcessor.go:161"}

We restarted the workers and the workflows were then handled correctly.

The errors above seem to be consistent and are still occurring as I'm writing this. Is there anything we could do?

Thanks for all the help!

It seems that the matching service is having issues (first two blocks of logs), causing the history service to see timeouts (third block of logs).

Which version are you using?

It seems that certain errors are not currently handled:

Thanks for letting us know about the issue.

Hello,

We are using version 1.3.1 at the moment.