Hi! I’m wondering if there is an out-of-the-box metric I can use to find workflows that have failed and have not succeeded on retry. By retry I mean resetting the workflow, either through the UI’s reset button or through the CLI (reset/resubmit). The goal is observability on workflows that failed and never succeeded after being retried. These have a business impact, because we need every workflow to end up resolved/successful. (I know the best practice is to have workflows retry indefinitely, but supporting that requires much more work on our side.)
We do have an example in one of our repos that finds these unresolved workflows (failed but never succeeded), but it relies on custom code that uses the Temporal SDK to correlate failed workflows with successful ones. That is totally fine, but I would like guidance on whether there is an easier out-of-the-box metric. Thanks for any help!
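
For reference, the correlation idea can be sketched roughly as below. This is a simplified illustration, not the actual code from our repo. The `Execution` record and the `unresolved_workflow_ids` helper are hypothetical names. It assumes that a retried run keeps the same workflow ID as the failed run, and that closed runs carry the status strings "Failed" and "Completed". Fetching the executions from Temporal is left out on purpose.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Execution:
    """Hypothetical record for one workflow run (not a Temporal SDK type)."""
    workflow_id: str
    run_id: str
    status: str  # assumed values: "Failed", "Completed", "Running", ...


def unresolved_workflow_ids(executions: Iterable[Execution]) -> Set[str]:
    """Return workflow IDs that have a failed run and no completed run.

    Assumption: a reset/resubmitted run shares the workflow ID of the run
    it retries, so grouping by workflow_id links a failure to its retries.
    """
    failed: Set[str] = set()
    succeeded: Set[str] = set()
    for execution in executions:
        if execution.status == "Failed":
            failed.add(execution.workflow_id)
        elif execution.status == "Completed":
            succeeded.add(execution.workflow_id)
    # Failed at least once and never completed under the same workflow ID.
    return failed - succeeded


# Illustrative data only: "order-1" failed and then succeeded after a retry,
# "order-2" failed twice, "order-3" never failed.
runs = [
    Execution("order-1", "run-a", "Failed"),
    Execution("order-1", "run-b", "Completed"),
    Execution("order-2", "run-c", "Failed"),
    Execution("order-2", "run-d", "Failed"),
    Execution("order-3", "run-e", "Completed"),
]
print(sorted(unresolved_workflow_ids(runs)))
```

With the sample data, only "order-2" is reported as unresolved. If a resubmit creates a new workflow ID, the grouping key would have to be something else that ties the runs together (for example a business identifier), which is exactly the part that makes the custom code more involved than a plain metric.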