I’ve been using Temporal to run things I’d previously have used queues for. It’s saved me some time by defining the flow more clearly and handling retry and failure logic, as well as helping with passing data around.
However, the UX for debugging failing workflows is slow (even with NVMe and a lot of RAM, on a single-box prototype), and the whole thing seemingly gets slower the more workflows I have - even completed ones.
Should I be “tidying up” the completed workflows somehow? I don’t (currently) care about them after some period of time.
I’m using Postgres to back it, on an NVMe drive, and I have a decent amount of RAM and CPU.
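(For anyone landing here with the same question: completed workflows are kept until the namespace’s retention period expires, so shortening retention is the usual way to “tidy up”. A sketch using tctl, assuming a default namespace and a retention of 3 days - flag names can vary between tctl and the newer temporal CLI, so verify against your installed version:)

```shell
# Sketch: shorten how long closed workflow histories are retained.
# Assumes tctl and a namespace called "default"; retention is in days.
tctl --namespace default namespace update --retention 3
```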
The UI only shows ~1000 workflows, even though I have a lot more running. I have repeatedly hit either slowness or this “cache capacity is fully occupied with pinned elements” error. It kinda goes away if I stop using it, but comes back if I keep adding things in and/or scale up the workers…
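(If you’re on the Go SDK, that “pinned elements” error comes from the sticky workflow cache filling up with executions that are still in progress. One knob, assuming the Go SDK - adjust for your language - is the process-wide sticky cache size, which must be set before any worker starts:)

```go
package main

import (
	"go.temporal.io/sdk/worker"
)

func main() {
	// Sketch, assuming the Go SDK: raise the process-wide sticky workflow
	// cache so more in-flight executions can stay pinned in memory instead
	// of failing when capacity is exhausted. Must be called before any
	// worker in this process starts. The value 4096 is an illustrative
	// assumption - size it for your workload and available memory.
	worker.SetStickyWorkflowCacheSize(4096)

	// ... create the client and workers as usual ...
}
```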
There are many possible reasons for the system not performing well, from the DB and underlying storage to non-optimal configuration. For example, setting the wrong number of history shards can affect performance significantly.
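To make that concrete: the shard count is a static server setting, fixed when the cluster is first created, so it has to be sized for your target scale up front. A sketch of where it lives in the self-hosted server config, assuming the standard YAML layout - the value 512 is illustrative, not a recommendation:

```yaml
# Sketch of the self-hosted Temporal server config (persistence section).
# numHistoryShards cannot be changed after the cluster is created.
persistence:
  numHistoryShards: 512
  defaultStore: default
```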
Have you considered using Temporal Cloud? All these issues will be solved for you out of the box.