Am I using Temporal for the wrong things?

russ · August 17, 2023, 7:53pm

I’ve been using Temporal to run things I’d previously have used queues for - it’s saved me time on defining the flow, on writing retry and failure logic, and on passing data around.

However, the UX for debugging failing workflows is slow (even with NVMe and a lot of RAM on a single-box prototype), and the whole thing seemingly gets slower the more workflows I have - even if they are completed.

Should I be “tidying” up the completed workflows somehow? I (currently) don’t care about them after some period of time.

Am I doing it wrong?

maxim · August 17, 2023, 9:20pm

Are you using an in-memory DB? Completed workflows shouldn’t slow a typical DB down.

russ · August 18, 2023, 1:06am

I’m using Postgres to back it, on an NVMe drive, with a decent amount of RAM and CPU.

The UI only shows ~1000 workflows, even though I have a lot more running. I have repeatedly hit either slowness or this “cache capacity is fully occupied with pinned elements” error…it kinda goes away if I stop using it…but if I keep adding things in, and/or scale up the workers…

russ · August 18, 2023, 1:07am

It seems to be 500’ing on these endpoints:

maxim · August 18, 2023, 4:38pm

There are many reasons the system might not perform well, ranging from the DB and underlying storage to a non-optimal configuration. For example, setting the wrong number of shards can affect performance significantly.

Have you considered using Temporal Cloud? All these issues will be solved for you out of the box.

russ · August 18, 2023, 6:54pm

@maxim I have not - I’d rather get some help / ideas / pointers for using temporal.

Underlying storage is NVMe, and dedicated to this project only - looking at iostat etc., I don’t believe that’s the problem.

As for optimal configuration, I’ve no idea - but I will read or change anything you’d suggest.

Can I somehow increase the cache size? What is the workaround or fix for the issue I’m getting?

russ · August 18, 2023, 7:16pm

FYI, checking - it looks like I have a single shard, if that’s shown by looking in the shards table. I’m going to figure out how to try more.

Any idea why 1 is the default?

I read:

Choosing the Number of Shards in Temporal History Service | Mikhail Shilkov - seems like more shards = better, up to a point. I think I’ll try 8 shards to start, unless you have any better advice. I still have to figure out how to set up more.
Scaling Temporal: The Basics - more shards here…4-512.

maxim · August 19, 2023, 4:18am

8 shards are still on the low end. We usually run a few hundred per host.
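
To make the shard point concrete, here is a toy sketch of why a single shard becomes a bottleneck. It assumes, as I understand it, that each workflow is assigned to one history shard by hashing its identity modulo the shard count. The SHA-256 hash, the `shard_for` and `shard_load` helpers, and the `workflow-N` IDs are all stand-ins for illustration, not Temporal's actual hashing or API.

```python
import hashlib
from collections import Counter


def shard_for(workflow_id: str, num_shards: int) -> int:
    """Map a workflow ID to a shard index in [0, num_shards).

    Assumption: SHA-256 is only a stand-in for a stable, evenly
    spread hash; the real server uses its own hash function.
    """
    digest = hashlib.sha256(workflow_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_load(num_workflows: int, num_shards: int) -> Counter:
    """Count how many hypothetical workflows land on each shard."""
    return Counter(
        shard_for(f"workflow-{i}", num_shards) for i in range(num_workflows)
    )


if __name__ == "__main__":
    # Shard counts mentioned in this thread: 1 (current), 8 (proposed), 512 (upper end).
    for n in (1, 8, 512):
        load = shard_load(10_000, n)
        print(n, "shards -> busiest shard holds", max(load.values()), "workflows")
```

With one shard, every workflow's updates go through the same shard; with more shards the same workflows spread out. As far as I know the shard count is fixed when the cluster is first created, so changing it likely means standing up a fresh cluster rather than editing a setting in place.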

russ · August 20, 2023, 4:40am

@maxim do you think you could answer some of the questions I asked?

maxim · August 20, 2023, 4:07pm

I’m not an expert on the ideal configuration of the service for a specific environment and use case.

russ · August 21, 2023, 10:01pm

thanks for the pointers so far - hopefully someone else will have thoughts on this too then!
