Sometimes, under high load (around 10 000 RPS), a workflow starts to repeat steps it has already completed. This leads to activity failures, which are retried until the retry attempts are exhausted and the workflow fails.
Most importantly, this doesn’t happen locally, only on our load-testing stand when we test our server at 10 000 RPS. Can you give any hints on what to look for? I can’t share the code, as I’m not allowed to :(