WorkflowTaskFailed doesn't update (remains frozen wrt failure reason) making it a bit hard to debug

ustulation · August 7, 2024, 11:26am

Consider this workflow (I’m using python btw but I don’t think that should matter, so just writing pseudo-code):

do-activity-1
raise Exception('foo')
do-activity-2

So this is going to finish activity-1 and then result in WorkflowTaskFailed, with the reason being that exception foo was raised. The workflow task will keep getting retried.

Now change code to:

do-activity-1
raise Exception('bar')
do-activity-2

and redeploy the worker. From the worker logs I can see that it’s now raising the new exception, but the UI doesn’t update the event history: it remains frozen at WorkflowTaskFailed with the reason that exception foo occurred, which is no longer accurate. This is just an example, but it makes troubleshooting via the UI a bit difficult. It’s as if the worker were running stale code and hadn’t been updated.

Then even introduce non-determinism by changing the code to the following and redeploy the (only) worker:

do-activity-3
raise Exception('bar')
do-activity-2

Again from the worker logs, I can see it quits execution as soon as it sees divergence (expects activity-1 to be completed by looking at event history but finds activity-3 in its place in new code). So it immediately detects non-determinism and quits the current execution but the UI for the workflow remains frozen with just the original error that workflow-task failed due to exception foo.
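To illustrate the kind of divergence check described here, below is a toy model in plain Python. It is not how the Temporal SDK implements replay; the function name and the command strings are invented for this sketch. It only compares the commands recorded in history with the commands the current code would issue, and reports the first mismatch.

```python
from typing import List, Optional


def first_divergence(recorded: List[str], replayed: List[str]) -> Optional[int]:
    """Return the index of the first replayed command that differs from
    the recorded history, or None if the overlapping prefix matches."""
    for index, (expected, actual) in enumerate(zip(recorded, replayed)):
        if expected != actual:
            return index
    return None


# History written by the original code: activity-1 was scheduled and completed.
history = ["schedule activity-1"]

# Commands the redeployed code issues before it reaches the raise.
original_code = ["schedule activity-1"]
changed_code = ["schedule activity-3"]

print(first_divergence(history, original_code))  # None
print(first_divergence(history, changed_code))   # 0
```

In this model the changed code diverges at the very first command, which matches the worker giving up before it ever reaches the raise.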

Of course, if you now revert to the 1st snippet and get rid of the exception, the worker successfully completes the workflow, finishing activity-2 as well, and the UI updates with all of that and finally shows the workflow as completed.

In the meantime, however, the lack of updates makes troubleshooting a bit difficult. Is this expected behaviour?

maxim · August 7, 2024, 5:21pm

You make a good point. Would you file a GitHub issue about this?

ustulation · August 8, 2024, 2:50am

Cheers, will do and link it here.

ustulation · August 12, 2024, 11:33am

GitHub issue raised.
