Hello!
We had a short outage of our persistence DB. During that time, some workflow tasks got stuck. Now that everything is up and running again, they are still not being put back onto the task queues. Why do these never time out? I managed to reset some of them, but others returned an error; all of these have only two events in their history. How do I reset workflows like that? How do I prevent this from happening? Is it expected that these zombie workflows are created when the DB starts lagging?
Temporal: 1.22.2
DB: Aurora Postgres 15.4 (AWS)
Error returned when the reset fails: Workflow task finish ID must be > 1 && <= workflow last event ID.
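To make sure I am reading that error correctly, here is a minimal Python sketch of the bounds check exactly as it is worded in the message. The function name and the example IDs are my own illustration, not Temporal's code, and the server may well apply further checks that this does not model.

```python
def finish_id_in_range(finish_event_id: int, last_event_id: int) -> bool:
    """Bounds check as worded in the error message (my reading, not Temporal's source)."""
    return 1 < finish_event_id <= last_event_id


# Assumption: a stuck workflow with only two history events has last_event_id == 2.
# Under this check alone, the only finish ID inside the allowed range is 2.
candidates = [i for i in range(0, 5) if finish_id_in_range(i, 2)]
print(candidates)  # [2]
```

So for a two-event history the allowed range is very narrow, which may be part of why the reset is rejected for these workflows.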
Neither our metrics nor anything else reports these tasks/activities as pending.
Data:
Workflow stuck:
/etc/temporal$ tctl adm tq desc --taskqueue saga_command_bus --tqt workflow
READ LEVEL | ACK LEVEL | BACKLOG | LEASE START TASKID | LEASE END TASKID
11601434 | 11601434 | 0 | 11600001 | 11700000
WORKFLOW POLLER IDENTITY | LAST ACCESS TIME
saga_command_bus:8c75b334-03a3-4264-80dd-fc3f082400f3 | 2023-12-05T13:48:41Z
saga_command_bus:873c92dd-6248-4fc9-9fbd-5e700dca8388 | 2023-12-05T13:48:41Z
saga_command_bus:2bd056e5-59a8-4c3d-b684-8e28d4de18a4 | 2023-12-05T13:48:41Z
saga_command_bus:0d839679-4ec0-43ad-928c-6463ee3130f6 | 2023-12-05T13:48:41Z
saga_command_bus:efa0a885-6f07-4021-bdb1-fb5d39d485fe | 2023-12-05T13:48:41Z
saga_command_bus:4c20f491-0a34-43cc-a9c4-955a78a32c77 | 2023-12-05T13:48:41Z
saga_command_bus:b9d46eab-49d4-40d0-a3d1-fe09f193b4c9 | 2023-12-05T13:48:41Z
saga_command_bus:ae9dcae6-fed8-4a25-b748-54f24d408f84 | 2023-12-05T13:48:41Z
saga_command_bus:a4e9cb05-b23b-4995-8ede-f7013c6bd8fc | 2023-12-05T13:48:41Z
saga_command_bus:1799ba7e-0995-405e-92e4-99539d7eb6a6 | 2023-12-05T13:48:40Z
saga_command_bus:8a7339b3-d3a1-4804-be6d-4c23a58f3e69 | 2023-12-05T13:48:40Z
saga_command_bus:819bd814-8bd3-45b7-bfb8-f03595460121 | 2023-12-05T13:48:40Z
saga_command_bus:0d568eed-8039-48c0-b230-729753cdd8f5 | 2023-12-05T13:48:40Z
saga_command_bus:99c8b076-276a-4ae7-80e6-bf0d634c0dd2 | 2023-12-05T13:48:40Z
saga_command_bus:fbef68b7-c08a-4695-b67f-ea36bc702f22 | 2023-12-05T13:48:40Z
saga_command_bus:30f6c2a2-0de2-41b9-bea5-f8ca69f7752c | 2023-12-05T13:48:40Z
saga_command_bus:ec10623e-8dfc-48e7-9105-07092fa9774e | 2023-12-05T13:48:40Z
saga_command_bus:8489eaba-d68d-44f5-8957-37fd87945bad | 2023-12-05T13:48:40Z
saga_command_bus:29f5139c-62c8-4f0c-a9cf-85be2e738c62 | 2023-12-05T13:48:39Z
saga_command_bus:ecaa5372-504a-4091-9570-8177be338a20 | 2023-12-05T13:48:39Z
saga_command_bus:ea8ed384-b065-465c-9661-8dc80c5607bd | 2023-12-05T13:48:39Z
saga_command_bus:181c7d03-6a84-498e-aabe-8474387c5cf0 | 2023-12-05T13:48:39Z
saga_command_bus:93034fe6-3694-400f-8a61-997c5582f213 | 2023-12-05T13:48:38Z