Mismatch between workflow run_id reported on the UI and current_executions table

Steven_Tang · February 16, 2024, 9:46pm

Hello all,

We ran into a case where the workflow run_id reported on the UI did not match the one in the current_executions table, which caused every command we issued from the UI to fail.

Summary of the problem

The image above shows a workflow execution with a run_id of 4fb406f8-0594-465c-b88c-ba721d7d6335, but when we click Reset, the command is issued against run_id fe5a8691-08a4-41b5-b669-2fa28661aeb4 and fails.

The run_id 4fb406f8-0594-465c-b88c-ba721d7d6335 matches that of the executions table, whereas the run_id fe5a8691-08a4-41b5-b669-2fa28661aeb4 matches that of the current_executions table.

I also tried to terminate the workflow via tctl, with the same result:

Error: Terminate workflow failed.
Error Details: Workflow executionsRow not found.  RunId: fe5a8691-08a4-41b5-b669-2fa28661aeb4

Because run_id 4fb406f8-0594-465c-b88c-ba721d7d6335 is not the one tracked in current_executions, that execution is effectively orphaned and no longer making progress. What can cause this to happen?

Expected behavior

The workflow continues to make progress
Commands to cancel, terminate, or reset the workflow execution succeed

Thank you!

Steven_Tang · February 18, 2024, 7:20pm

Adding a bit more context: we use sharded MySQL (Vitess) as our backing persistence layer.
Could a brief outage of one of the instances cause inconsistencies in workflow data? Are writes that create or update workflow executions transactional?
I’m curious what else could cause this inconsistency.
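To illustrate why the transactional question matters, here is a minimal Go sketch contrasting two ways of applying the pair of writes that move a workflow to a new run. It assumes, purely for illustration, that the current_executions update is applied before the executions insert and that the second write fails during an outage; neither the write order nor the helper names (createRunNonAtomic, createRunAtomic, consistent) reflect Temporal's actual implementation. The non-atomic path leaves exactly the state reported above (old run in executions, new run in current_executions); the atomic path leaves the tables unchanged.

```go
package main

import (
	"errors"
	"fmt"
)

// executionKey identifies one run of a workflow (hypothetical, simplified model).
type executionKey struct{ WorkflowID, RunID string }

// tables is a toy stand-in for the executions and current_executions tables.
type tables struct {
	executions        map[executionKey]string // run -> status
	currentExecutions map[string]string       // workflow ID -> current run ID
}

var errWriteFailed = errors.New("simulated executions write failure")

func newTables() tables {
	return tables{
		executions:        map[executionKey]string{},
		currentExecutions: map[string]string{},
	}
}

// clone deep-copies both maps so a staged write cannot leak into the original.
func (t tables) clone() tables {
	c := newTables()
	for k, v := range t.executions {
		c.executions[k] = v
	}
	for k, v := range t.currentExecutions {
		c.currentExecutions[k] = v
	}
	return c
}

// createRunNonAtomic applies the two writes independently (assumed order).
// If the second write fails, the first one is NOT rolled back.
func createRunNonAtomic(t *tables, wfID, runID string, failSecond bool) error {
	t.currentExecutions[wfID] = runID
	if failSecond {
		return errWriteFailed
	}
	t.executions[executionKey{wfID, runID}] = "Running"
	return nil
}

// createRunAtomic stages both writes on a copy and commits only if both succeed.
func createRunAtomic(t *tables, wfID, runID string, failSecond bool) error {
	staged := t.clone()
	staged.currentExecutions[wfID] = runID
	if failSecond {
		return errWriteFailed // staged copy is discarded; t is unchanged
	}
	staged.executions[executionKey{wfID, runID}] = "Running"
	*t = staged // commit both writes together
	return nil
}

// consistent reports whether every current_executions pointer has an executions row.
func consistent(t tables) bool {
	for wfID, runID := range t.currentExecutions {
		if _, ok := t.executions[executionKey{wfID, runID}]; !ok {
			return false
		}
	}
	return true
}

func main() {
	const wf = "my-workflow" // hypothetical workflow ID
	const oldRun = "4fb406f8-0594-465c-b88c-ba721d7d6335"
	const newRun = "fe5a8691-08a4-41b5-b669-2fa28661aeb4"

	for _, tc := range []struct {
		name  string
		write func(*tables, string, string, bool) error
	}{
		{"non-atomic", createRunNonAtomic},
		{"atomic", createRunAtomic},
	} {
		t := newTables()
		t.executions[executionKey{wf, oldRun}] = "Running"
		t.currentExecutions[wf] = oldRun
		err := tc.write(&t, wf, newRun, true) // second write fails
		fmt.Printf("%s: err=%v consistent=%v\n", tc.name, err, consistent(t))
	}
}
```

Running it prints `consistent=false` for the non-atomic path and `consistent=true` for the atomic one.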
