Workflow Task Failed - Workflow not failed

Hi,

I’m developing a workflow and have made a mistake that produces the error below.

In my terminal logs, I can see this:

1f3dd74b7a **Attempt 73** CachedPreviousStartedEventID 6 TaskFirstEventID 1 TaskStartedEventID 6 PreviousStartedEventID 0
2022/09/14 23:07:46 DEBUG WfData:  Namespace default TaskQueue TestTQ WorkerID 25159@KJAY-M-MGGN@ WorkflowType ExecuteWorkflow WorkflowID test759d0db1-47ec-47a5-83da-5e4719f5ec3c RunID 611799b5-1fba-41d5-b0d9-e21f3dd74b7a Attempt 1 {"stringParam":"Jay"}
WFData: {"stringParam":"Jay"}
Operation State
Function Ref:  uppercase
2022/09/14 23:07:46 DEBUG ExecuteActivity Namespace default TaskQueue TestTQ WorkerID 25159@KJAY@ WorkflowType ExecuteWorkflow WorkflowID test759d0db1-47ec-47a5-83da-5e4719f5ec3c RunID 611799b5-1fba-41d5-b0d9-e21f3dd74b7a Attempt 1 ActivityID 8 ActivityType Uppercase
2022/09/14 23:07:46 INFO  Task processing failed with error Namespace default TaskQueue TestTQ WorkerID 25159@KJAY@ WorkerType WorkflowWorker Error BadScheduleActivityAttributes: A valid StartToClose or ScheduleToCloseTimeout is not set on command.

Now I understand why I’m getting this error, and it’s easy enough to fix. My question is, in this instance, why is the workflow status not failed? The workflow status is running and I have to terminate this manually.

Thanks,
Jay.


This is by design. We don’t want workflows to fail on unexpected panics. This allows fixing the code without losing workflow instances. You can disable this behavior by setting WorkerOptions.WorkflowPanicPolicy to FailWorkflow.
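To illustrate the difference between the two behaviors, here is a minimal toy sketch in plain Go. It does not use the Temporal SDK: `panicPolicy`, `blockWorkflow`, `failWorkflow`, and `runTask` are hypothetical names invented for this example, loosely modeled on the policy described above. It only shows how a recovered panic can either leave the workflow running or close it as failed.

```go
package main

import "fmt"

// panicPolicy is a hypothetical stand-in for the worker-level policy
// described above; it is not the SDK type.
type panicPolicy int

const (
	blockWorkflow panicPolicy = iota // keep the workflow running so the task can be retried
	failWorkflow                     // close the workflow as failed
)

// runTask runs one workflow task and, if it panics, maps the panic to a
// workflow status according to the policy.
func runTask(task func(), policy panicPolicy) (status string) {
	defer func() {
		if r := recover(); r != nil {
			if policy == failWorkflow {
				status = "Failed"
			} else {
				status = "Running"
			}
		}
	}()
	task()
	return "Completed"
}

func main() {
	buggy := func() { panic("unexpected bug") }
	fmt.Println(runTask(buggy, blockWorkflow))
	fmt.Println(runTask(buggy, failWorkflow))
}
```

In the real SDK, the worker makes this choice for you based on the WorkflowPanicPolicy option; the sketch only mirrors the idea.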

> Now I understand why I’m getting this error, and it’s easy enough to fix

Yes. For the reference of other readers, ActivityOptions must have StartToCloseTimeout and/or ScheduleToCloseTimeout set.

> why is the workflow status not failed

I believe the terminal log you have is a server log. It means that your workflow worker sent the schedule-activity command to the server, and the server noticed that it’s missing the timeout configuration. For command validation errors, your workflow execution is blocked and your workflow task is retried while waiting for a fix.
You should be able to fix your workflow code and restart your workers, and the execution will continue on the next workflow task retry.
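The retry-until-fixed behavior described here can be pictured with a small simulation in plain Go. Everything in it is an assumption made for illustration: `workflowTask`, `retryUntilFixed`, and the attempt numbers are hypothetical, and the attempt cap exists only so that the simulation terminates.

```go
package main

import (
	"errors"
	"fmt"
)

var errBadAttributes = errors.New("BadScheduleActivityAttributes: timeout not set")

// workflowTask stands in for one attempt at running the workflow code.
// fixed reports whether the corrected code is deployed on the worker.
func workflowTask(fixed bool) error {
	if !fixed {
		return errBadAttributes
	}
	return nil
}

// retryUntilFixed retries the task up to maxAttempts times. fixDeployedAt is
// the attempt number from which the fixed worker code is assumed to be live.
// It returns the last attempt number and a description of the outcome.
func retryUntilFixed(fixDeployedAt, maxAttempts int) (int, string) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err := workflowTask(attempt >= fixDeployedAt); err == nil {
			return attempt, "unblocked"
		}
	}
	return maxAttempts, "still blocked"
}

func main() {
	attempt, state := retryUntilFixed(4, 10)
	fmt.Println(attempt, state)
}
```

The workflow is never marked failed in this model: it stays blocked while the buggy code is running and picks up again on the first attempt after the fix.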

Thanks @maxim and @tihomir.

Hi @maxim and @tihomir,

I recently came across the same problem, where my workflow task failed but my workflow did not. I went looking for a solution, but I could only find the panic policy being supported for Go. I am unable to find the same for Java.

Does Java support the panic policy? If yes, am I searching in the wrong place?

Regards,
Vishal.

In Java, you can specify which exceptions lead to workflow failures. Use WorkflowImplementationOptions.setFailWorkflowExceptionTypes.

Hi @maxim, I tried that, but the issue seems to persist. Please refer to the screenshot below.

Also, passing Throwable.class, which should fail the workflow on any exception, does not fail the workflow.

It looks like this option is ignored when the failure is reported by the service. I filed a bug report.

Is a similar option available in the Python SDK?

IMHO this may cause a bad user experience. For example, suppose a user clicks a button in the browser and this triggers a workflow.
The user is now polling this workflow for completion, but it never completes.

Another question: if the workflow timeout is 5 minutes, it does not seem realistic to write a fix, get it code-reviewed, and deploy it in that time.

For such use cases, set the option to always fail the workflow on unexpected errors.

Failing workflows on an unexpected bug can be catastrophic for certain scenarios when returning an error is not an option.

Thank you for the answer.
We are struggling to find a way to change this option in the TypeScript SDK. Is there a way?
If there is a cluster-level configuration for this, that would suit us even better.