Workflow Task Failed - Workflow not failed

Hi,

I’m developing a workflow and I’ve done something wrong and I get the following error:

In my terminal logs, I can see this:

1f3dd74b7a **Attempt 73** CachedPreviousStartedEventID 6 TaskFirstEventID 1 TaskStartedEventID 6 PreviousStartedEventID 0
2022/09/14 23:07:46 DEBUG WfData:  Namespace default TaskQueue TestTQ WorkerID 25159@KJAY-M-MGGN@ WorkflowType ExecuteWorkflow WorkflowID test759d0db1-47ec-47a5-83da-5e4719f5ec3c RunID 611799b5-1fba-41d5-b0d9-e21f3dd74b7a Attempt 1 {"stringParam":"Jay"}
WFData: {"stringParam":"Jay"}
Operation State
Function Ref:  uppercase
2022/09/14 23:07:46 DEBUG ExecuteActivity Namespace default TaskQueue TestTQ WorkerID 25159@KJAY@ WorkflowType ExecuteWorkflow WorkflowID test759d0db1-47ec-47a5-83da-5e4719f5ec3c RunID 611799b5-1fba-41d5-b0d9-e21f3dd74b7a Attempt 1 ActivityID 8 ActivityType Uppercase
2022/09/14 23:07:46 INFO  Task processing failed with error Namespace default TaskQueue TestTQ WorkerID 25159@KJAY@ WorkerType WorkflowWorker Error BadScheduleActivityAttributes: A valid StartToClose or ScheduleToCloseTimeout is not set on command.

Now I understand why I’m getting this error, and it’s easy enough to fix. My question is, in this instance, why is the workflow status not failed? The workflow status is running and I have to terminate this manually.

Thanks,
Jay.

This is by design. We don’t want workflows to fail on unexpected panics. This allows fixing the code without losing workflow instances. You can disable this behavior by setting WorkerOptions.WorkflowPanicPolicy to FailWorkflow.

Now I understand why I’m getting this error, and it’s easy enough to fix

Yes. Just for reference for other readers: ActivityOptions must have StartToCloseTimeout and/or ScheduleToCloseTimeout set.
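
As a rough illustration of that rule, here is a plain TypeScript sketch of the check. It is not the server's actual validation code: the `ActivityTimeouts` shape and the `validateActivityTimeouts` function are hypothetical names made up for this example, and expressing timeouts in milliseconds is an assumption.

```typescript
// Hypothetical shape for illustration; real SDKs have their own ActivityOptions types.
interface ActivityTimeouts {
  startToCloseTimeoutMs?: number;
  scheduleToCloseTimeoutMs?: number;
}

// Returns an error message when neither timeout is set to a positive value,
// or undefined when at least one of them is valid.
function validateActivityTimeouts(opts: ActivityTimeouts): string | undefined {
  const isValid = (ms?: number): boolean => ms !== undefined && ms > 0;
  if (isValid(opts.startToCloseTimeoutMs) || isValid(opts.scheduleToCloseTimeoutMs)) {
    return undefined;
  }
  return 'A valid StartToClose or ScheduleToCloseTimeout is not set on command.';
}
```

Setting either timeout to a positive value is enough to pass this check; leaving both unset reproduces the message from the log above.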

why is the workflow status not failed

I believe the terminal log you have is a server log. It means that your workflow worker sent the schedule-activity command to the server, and the server noticed that it's missing the timeout configuration. For command validation errors, your workflow execution is blocked and the workflow task is retried while waiting for a fix.
You should be able to fix your workflow code and restart your workers, and the execution will then continue on the next workflow task retry.

Thanks @maxim and @tihomir.

Hi @maxim and @tihomir,

I recently came across the same problem with my workflow, where my WorkflowTask failed but my workflow did not. I went looking for a solution, but I could only find the panic policy being supported for Go. I am unable to find the same for Java.

Does Java support the panic policy? If yes, am I searching for it in the wrong place?

Regards,
Vishal.

In Java, you can specify which exceptions lead to workflow failures. Use WorkflowImplementationOptions.setFailWorkflowExceptionTypes.

Hi @maxim, I tried that, but the issue still seems to persist. Please refer to the screenshot below.

Also, passing Throwable.class, which should fail the workflow on any exception, does not fail the workflow.

It looks like this option is ignored when the failure is reported by the service. I filed a bug report.

Is a similar option available in the Python SDK?

IMHO this may cause a bad user experience. For example, a user clicks a button in the browser and this triggers a workflow.
The user then keeps polling this workflow for completion, but it never completes.

Another question: if the workflow timeout is 5 minutes, it does not seem realistic to write a fix, get it code reviewed, and deploy it in time.

For such use cases, set the option to always fail the workflow on unexpected errors.

Failing workflows on an unexpected bug can be catastrophic for certain scenarios when returning an error is not an option.

Thank you for the answer.
We are struggling to find a way to change this option in the TypeScript SDK. Is there a way?
And if there is a cluster-level configuration for that, it would be even more suitable for us.

Hello,
Did you find a similar option in the TypeScript SDK?

Is a similar option available in the Python SDK?

We are struggling to find a way to change this option in the TypeScript SDK. Is there a way?

At this time, there is no equivalent option in other SDKs.

You may however easily get that same behavior by simply catching errors and rethrowing an ApplicationFailure instead. For example, in TypeScript, you could do this:

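import { ApplicationFailure } from '@temporalio/workflow';

export async function myWorkflow(): Promise<void> {
  try {
    // ... workflow logic
  } catch (e) {
    // Illustrative arguments: throwing an ApplicationFailure fails the Workflow
    // Execution instead of leaving the Workflow Task to be retried.
    throw ApplicationFailure.create({
      message: e instanceof Error ? e.message : String(e),
      nonRetryable: true,
    });
  }
}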

Or, if you want to do that systematically for all of your workflows, you may do this from a Workflow Interceptor.
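
To make the systematic variant a bit more concrete, here is a minimal, framework-free sketch of the wrapping idea. It does not use the real interceptor API: `WorkflowFailure` is a local stand-in for the SDK's ApplicationFailure, and `failOnAnyError` is a hypothetical helper name. An actual interceptor would apply the same catch-and-convert step around every workflow's entry point.

```typescript
// Local stand-in for the SDK's ApplicationFailure (assumption for this sketch).
class WorkflowFailure extends Error {
  constructor(message: string, public readonly original?: unknown) {
    super(message);
    this.name = 'WorkflowFailure';
    // Keeps instanceof working even when compiling to older JS targets.
    Object.setPrototypeOf(this, WorkflowFailure.prototype);
  }
}

type WorkflowFn<A extends unknown[], R> = (...args: A) => Promise<R>;

// Hypothetical helper: wraps a workflow function so that any thrown error is
// converted into a WorkflowFailure; existing WorkflowFailures pass through.
function failOnAnyError<A extends unknown[], R>(fn: WorkflowFn<A, R>): WorkflowFn<A, R> {
  return async (...args: A): Promise<R> => {
    try {
      return await fn(...args);
    } catch (e) {
      if (e instanceof WorkflowFailure) throw e;
      const message = e instanceof Error ? e.message : String(e);
      throw new WorkflowFailure(message, e);
    }
  };
}

// Usage: a bug in the workflow body now surfaces as a WorkflowFailure.
const wrapped = failOnAnyError(async (name: string): Promise<string> => {
  if (name.length === 0) throw new TypeError('name must not be empty');
  return name.toUpperCase();
});
```

Keeping the conversion in one wrapper means individual workflows stay free of try/catch boilerplate, at the cost of applying the fail-fast policy everywhere.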

A word of caution: please keep in mind that the default error behavior is intentional. Make sure you correctly understand the consequences before overriding that behavior. By blindly making your Workflow fail on any error, you are allowing your Workflow to terminate without properly cleaning up operations it may have started.

For example, your Workflow may have initiated a credit card transaction; if the Workflow Execution fails, you must revert that credit card transaction. But a generic Error may result from a bug in your Workflow code, in which case you may not have the opportunity to clean things up properly. The fact that Temporal retries the Workflow Task in such a case means you get the opportunity to fix the bug, after which the Workflow Execution will resume, so that it can either complete normally (if that still makes sense at that point) or clean up properly.
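
The distinction drawn here can be sketched without any SDK. In the sketch below, every name is hypothetical (`BusinessFailure`, `runWithCompensation`, and the `charge`/`rest`/`refund` callbacks). An expected business failure triggers cleanup before the workflow fails, while any other error is rethrown untouched, which in Temporal would surface as a retried Workflow Task rather than a failed Workflow Execution.

```typescript
// Stand-in for an intentional, workflow-failing error (assumption for this sketch).
class BusinessFailure extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'BusinessFailure';
    // Keeps instanceof working even when compiling to older JS targets.
    Object.setPrototypeOf(this, BusinessFailure.prototype);
  }
}

// Charges, runs the remaining steps, and refunds only on an expected failure.
async function runWithCompensation(
  charge: () => Promise<void>,
  rest: () => Promise<void>,
  refund: () => Promise<void>,
): Promise<void> {
  await charge();
  try {
    await rest();
  } catch (e) {
    if (e instanceof BusinessFailure) {
      await refund(); // clean up, then let the workflow fail deliberately
    }
    throw e; // unexpected errors propagate unchanged, with no cleanup attempted
  }
}
```

Converting every error into a workflow failure would skip the `refund` branch for bugs, which is the risk described above.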