Parent Workflow Stalls After Child Workflow Failure (TypeScript SDK)

Hi,

We’re running into a tricky issue with our workflows. When a child workflow fails, the parent workflow keeps running, which is what we expect (we do not rethrow). However, the parent workflow’s activities just stop working. It still handles signals and processes code, but it gets stuck when it hits any activity.

Here’s a quick rundown:

  • We’re using the TypeScript SDK on Temporal Cloud.
  • The child workflow fails due to a 3rd party API issue. We catch the error, log it, and move on without re-throwing it.
  • After this, when the parent workflow gets signaled, it hangs when it reaches an activity.
  • Also, we’ve noticed that the signals we sent are being retried every 10 minutes. The weird part is, there’s nothing in the Temporal console showing this retry behavior.

Here’s a code snippet of how we’re handling the child workflow:

try {
  wf.executeChild(startInvoiceStatusCheckV3, {
    workflowId: createInvoiceStatusCheckWorkflowId({
      referenceId: invoiceToApprove.invoice.invoice.paymentReference,
    }),
    args: [{ data: invoiceToApprove }],
  });
} catch (err) {
  state.runningTask = null;
  logger.error(`Error in invoice status check workflow: ${err}`);
  // do nothing, this is expected
}

Note: if we reset the workflow, it's fine. But we have 200+ running workflows with this issue (child workflow failed).

What could be the issue?

Thank you!

Here is a screenshot of our logs, where we see the signals being retried every 10 minutes.

Hi @EdwinN1337

We’re using the TypeScript SDK on Temporal Cloud.

Please open a ticket and share the workflow ID and namespace (or the workflow URL) so we can look into it.

  • After this, when the parent workflow gets signaled, it hangs when it reaches an activity.

I guess something in the signal handler throws that error, making it impossible for the execution to progress, hence the WorkflowTaskFailed. You can use workflow replay to debug it.

If you want to share the code and/or the workflow history we can look into it.

  • Also, we’ve noticed that the signals we sent are being retried every 10 minutes. The weird part is, there’s nothing in the Temporal console showing this retry behavior.

This is due to the WorkflowTaskFailed. By design, the server reschedules a failed workflow task, which lets you fix the code without losing execution state.

Antonio

For the record, the topic is also being discussed on Slack.

Thanks for the help!

The issue was an unhandled promise rejection. Unhandled rejections terminate the Node process. We had wf.executeChild inside a try/catch block without await, so the catch block never saw the error. We should have either awaited the call or used .catch() on the promise itself to handle the rejection and keep the workflow from failing every time it received a signal.
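
For anyone landing here later, a minimal sketch of the difference in plain TypeScript. It does not use the Temporal SDK: failingChild() is a hypothetical stand-in for wf.executeChild(...) that always rejects, and the other function names are made up for illustration.

```typescript
// Hypothetical stand-in for wf.executeChild(...): returns a promise that rejects.
function failingChild(): Promise<string> {
  return Promise.reject(new Error("3rd party API unavailable"));
}

// The problematic shape: without await, try/catch only sees synchronous throws.
// The promise is returned here so a caller can still handle it; in our original
// code it was dropped, so the rejection went unhandled.
function withoutAwait(): { caughtSync: boolean; pending: Promise<string> | undefined } {
  let caughtSync = false;
  let pending: Promise<string> | undefined;
  try {
    pending = failingChild(); // no await: the rejection happens later
  } catch {
    caughtSync = true; // not reached for an asynchronous rejection
  }
  return { caughtSync, pending };
}

// Fix 1: await inside the try block so the rejection surfaces as a throw.
async function withAwait(): Promise<string> {
  try {
    return await failingChild();
  } catch (err) {
    return `caught: ${(err as Error).message}`;
  }
}

// Fix 2: attach .catch() to the promise when the caller should not block on it.
function withCatch(): Promise<string> {
  return failingChild().catch((err: Error) => `caught: ${err.message}`);
}
```

Which fix fits depends on whether the parent should wait for the child: await blocks until the child finishes, while .catch() handles the failure without holding up the code that follows.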
