How to avoid worker failures


In our current Node.js application, Temporal workflows run as part of a single monolithic application. That is, we do not have a separate (micro)service that accepts requests and runs the defined workflows.

The problem is that whenever the application fails because of an exception and then recovers (it is a Node.js application), the workers we started earlier do not start again. As a result, workflows keep piling up and never complete. We have to restart the whole application pod in Kubernetes; this starts the application from scratch, starts the workers, and only then do the workflows start completing.

I would like to know whether we are following sound architectural practices here. Any guidance, ideally with a sample application, would be much appreciated.

Is your application responding to OS signals?

How are you handling exceptions and starting the workers?

I would recommend deploying workers separately from the rest of the application, but if you must deploy the app as a single unit, I'd need more information to help you.
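
If the app has to stay a single unit, one possible approach is to supervise the worker so that a failed run leads to a fresh worker being created instead of leaving the process alive with no poller. The following is only a rough sketch, not the Temporal SDK's own API: `RunnableWorker`, `SuperviseOptions`, and `superviseWorker` are hypothetical names, and the sketch assumes a worker's `run()` resolves on a clean shutdown and rejects on a fatal error, and that a failed worker has to be recreated rather than run again.

```typescript
// Hypothetical minimal shape of a worker (an assumption for this sketch):
// run() resolves on clean shutdown and rejects if the worker fails.
interface RunnableWorker {
  run(): Promise<void>;
}

// Hypothetical options for the supervisor loop.
interface SuperviseOptions {
  maxRestarts: number; // give up once failures exceed this number
  baseDelayMs: number; // first backoff delay; doubles after each failure
  onGiveUp: (err: unknown) => void; // e.g. exit so the orchestrator restarts the pod
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Creates a worker, runs it, and recreates it with exponential backoff
// whenever creation or run() fails. Returns the number of failures seen.
async function superviseWorker(
  createWorker: () => Promise<RunnableWorker>,
  opts: SuperviseOptions,
): Promise<number> {
  let failures = 0;
  for (;;) {
    try {
      // Always build a fresh worker; do not reuse one that has failed.
      const worker = await createWorker();
      await worker.run();
      return failures; // clean shutdown, stop supervising
    } catch (err) {
      failures += 1;
      if (failures > opts.maxRestarts) {
        opts.onGiveUp(err);
        return failures;
      }
      await sleep(opts.baseDelayMs * 2 ** (failures - 1));
    }
  }
}
```

In a real application, `createWorker` would presumably wrap `Worker.create(...)` from `@temporalio/worker`, and `onGiveUp` could call `process.exit(1)` so that Kubernetes restarts the pod instead of leaving it running without workers. Whether in-process restarts or a fast exit fits better depends on how the rest of the application handles exceptions.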