The best way to surface workflow execution errors

Hi,
What is the best way to surface workflow execution errors (mostly activity errors)? We have a workflow service that manages workflow definitions and provides CRUD and start/terminate workflow APIs, among others. Because workflows run asynchronously, the workflow service cannot immediately know whether a workflow failed or succeeded.

Using events could be a solution, but we use RabbitMQ, and I don't think Temporal supports that.
A cron job that scans all workflows could also work, but it is too heavy.


Is this service written in Go? Calling .Get(...) on the run returned by client.ExecuteWorkflow blocks until the workflow completes, but you could, for example, use goroutines (standard Go concurrency) so that this manager service still receives the workflow's result or error without blocking the caller.

Another option to consider is metrics. Do you have server and SDK metrics enabled?
SDK metrics, for example, include activity_execution_failed, which should include the activity type. You could build your alerts on these metrics.

You might also consider enhanced visibility, which, together with SDK client APIs such as ListWorkflowExecution and DescribeWorkflowExecution, could help with monitoring, as you mentioned. Note that visibility is "eventually consistent": there is a small delay between when the workflow state is updated and when that update is reflected in visibility records. Server metrics can help you measure this latency:

histogram_quantile(0.95, sum(rate(task_latency_bucket{operation=~"VisibilityTask.*", service_name="history"}[1m])) by (operation, le))

Thanks Tihomir,
The first option should work, but it would leave a large number of goroutines blocked waiting for results.
The second option would require something like a cron job to periodically check workflow status, which is not ideal either.

Does Temporal support a pub/sub-like mechanism, for example using a message queue for asynchronous coordination?

If you control the workflow code, you could add an interceptor that calls an activity to notify your service when the workflow completes. If you don't have that control, there are no better options yet. In the future, we might support server-side interceptors that generate events based on the workflow lifecycle.