I’ve been using Temporal to run things I’d previously use queues for. It has saved me some time on defining the flow better, on writing retry and failure logic, and on passing data around.
However, the UX for debugging failing workflows is slow (even with NVMe and a lot of RAM, on a single-box prototype), and the whole thing seemingly gets slower the more workflows I have, even if they are completed.
Should I be “tidying” up the completed workflows somehow? I (currently) don’t care about them after some period of time.
Am I doing it wrong?
Are you using an in-memory DB? Completed workflows shouldn’t slow a typical DB down.
I’m using Postgres to back it, on an NVMe drive, and I have a decent amount of RAM and CPU.
The UI only shows ~1000 workflows, even though I have a lot more running. I have repeatedly hit either slowness or this “cache capacity is fully occupied with pinned elements” error… it kinda goes away if I stop using it… but if I keep adding things in, and/or scale up the workers…
It seems to be 500’ing on these endpoints:
There are many possible reasons for the system not performing well, from the DB and underlying storage to non-optimal configuration. For example, setting the wrong number of shards can affect performance significantly.
Have you considered using Temporal Cloud? All these issues will be solved for you out of the box.
@maxim I have not - I’d rather get some help / ideas / pointers for using Temporal.
Underlying storage is NVMe and dedicated to this project only, i.e. I don’t believe that’s the bottleneck, judging by iostat etc.
As for optimal configuration, I’ve no idea, but I will read or change anything you’d suggest.
Can I somehow increase the cache size? What is the work around or fix for the issue I’m getting?
FYI, checking - it looks like I have a single shard, if that’s what the shards table shows. I’m going to figure out how to try more.
Any idea why 1 is the default?
8 shards are still on the low end. We usually run a few hundred per host.
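To illustrate why the shard count matters: as far as I understand it, each workflow execution is assigned to exactly one history shard by hashing its namespace and workflow ID, so with a single shard every workflow contends for the same shard. Below is a minimal Python sketch of that idea only. The names `shard_for` and `shard_load`, the `default` namespace, the `workflow-N` IDs, and the use of CRC32 as the hash are all assumptions made for illustration, not the server's actual implementation.

```python
import zlib
from collections import Counter


def shard_for(namespace_id: str, workflow_id: str, num_shards: int) -> int:
    """Map a workflow to a shard ID in 1..num_shards (illustrative only)."""
    # Assumption: CRC32 is a stand-in for whatever hash the server really uses.
    key = f"{namespace_id}_{workflow_id}".encode("utf-8")
    return zlib.crc32(key) % num_shards + 1


def shard_load(num_workflows: int, num_shards: int) -> Counter:
    """Count how many hypothetical workflows land on each shard."""
    return Counter(
        shard_for("default", f"workflow-{i}", num_shards)
        for i in range(num_workflows)
    )


if __name__ == "__main__":
    for shards in (1, 8, 512):
        load = shard_load(10_000, shards)
        busiest = max(load.values())
        print(f"{shards:>3} shards: busiest shard holds {busiest} workflows")
```

With one shard, every workflow maps to shard 1; with more shards the same workflows spread out, so less work queues up behind any single shard. Note that, as far as I know, the history shard count is set in the server's static configuration (`numHistoryShards`) when the cluster is first created and can't be changed afterwards, so trying a different value means starting from a fresh database.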
@maxim do you think you could answer some of the questions I asked?
I’m not an expert on the ideal configuration of the service for a specific environment and use case.
thanks for the pointers so far - hopefully someone else will have thoughts on this too then!