We have 20,000–60,000+ scheduled workflows, all configured to run every 15 minutes. All jobs on the same interval wake up simultaneously at :00, :15, :30, :45 — so this is a bursty load,
not spread evenly.
Workflow characteristics
These are not simple fire-and-forget tasks. Each workflow has:
- Multiple steps
- Readiness checks that block — the workflow must wait if a dependency isn’t ready, then resume when it is
- Some workflows trigger further downstream processing
This multi-step + durable wait behavior is why we’re evaluating Temporal over simpler task queues.
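The "block until ready, then resume" step is exactly the shape Temporal's durable-wait primitives cover. As a plain-asyncio sketch of the per-workflow flow (stub names like `check_ready` and `downstream` are hypothetical, and this is not Temporal SDK code — in a real workflow the wait would be a persisted condition, not a poll loop):

```python
import asyncio

async def run_workflow(check_ready, process_step, downstream):
    # Step 1: wait on the readiness check — keep polling until the
    # dependency is available instead of failing fast. (Temporal would
    # persist this wait durably; here we just poll in-process.)
    while not await check_ready():
        await asyncio.sleep(0.01)
    # Step 2: the main multi-step processing.
    result = await process_step()
    # Step 3: some workflows fan out to further downstream processing.
    await downstream(result)
    return result

# Minimal usage with stub dependencies: readiness succeeds on the 4th poll.
state = {"ready_after": 3, "calls": 0}

async def check_ready():
    state["calls"] += 1
    return state["calls"] > state["ready_after"]

async def process_step():
    return "processed"

async def downstream(result):
    state["downstream"] = result

outcome = asyncio.run(run_workflow(check_ready, process_step, downstream))
```

The point of the sketch is only the shape: a blocking readiness gate, then work, then an optional fan-out — which is what makes a durable-execution engine attractive over a fire-and-forget task queue.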
Our planned setup
- Persistence backend: PostgreSQL (not Cassandra)
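For concreteness, both the Postgres persistence choice and the history shard count live in the Temporal server's static config. A rough fragment of the shape involved (values are placeholders for discussion, not recommendations; note that `numHistoryShards` is fixed at cluster creation and cannot be changed later):

```yaml
persistence:
  numHistoryShards: 512          # placeholder — this is part of what we're asking about
  defaultStore: default
  datastores:
    default:
      sql:
        pluginName: "postgres12"
        databaseName: "temporal"
        connectAddr: "postgres:5432"
        connectProtocol: "tcp"
        maxConns: 20
        maxIdleConns: 20
```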
Our concerns
- Postgres throughput ceiling — benchmarks suggest ~50–200 workflow starts/sec on Postgres. At that rate, 60k workflows would take 5–20 minutes just to start, leaving little time for processing before the next 15-min cycle.
- Thundering herd on the DB — 60k simultaneous starts means massive burst writes to the history_node and executions tables.
- Jitter may not help enough — with a 15-min cycle, adding 5+ min of jitter defeats the purpose of the schedule.
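To make the arithmetic behind these concerns explicit, here is a small sketch. It also shows the one mitigation we keep circling back to: instead of random jitter, give each workflow a deterministic phase offset within the 15-minute window (hash of the workflow ID), so every workflow keeps a stable 15-minute cadence but the fleet's starts are spread evenly. The function name `start_offset` is ours, not a Temporal API:

```python
import hashlib

INTERVAL_SEC = 15 * 60  # the 15-minute schedule interval

def start_offset(workflow_id: str, window_sec: int = INTERVAL_SEC) -> int:
    """Deterministic per-workflow offset in [0, window_sec).

    Each workflow still runs every 15 minutes, just phase-shifted from
    :00 by a stable amount, so the aggregate load is flat, not bursty.
    """
    h = int.from_bytes(hashlib.sha256(workflow_id.encode()).digest()[:8], "big")
    return h % window_sec

# Burst math from the numbers above: 60k starts vs. a ~50-200 starts/sec
# Postgres ceiling.
starts = 60_000
best_case_sec = starts / 200   # 300 s = 5 min just to start everything
worst_case_sec = starts / 50   # 1200 s = 20 min, longer than the interval

# Spreading the same 60k starts across the whole 15-min window instead:
spread_rate = starts / INTERVAL_SEC  # ~67 starts/sec, under the ceiling
```

The open question for us is whether this phase-shifting is an acceptable answer at all, since it changes "everything runs at :00/:15/:30/:45" into "everything runs every 15 minutes at its own phase."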
Questions
- Has anyone run Temporal + Postgres at this scale (60k bursty scheduled workflows)? What was your experience?
- Is there a recommended pattern for bursty cron-style workloads beyond jitter?
- Is Postgres the wrong choice here — should we commit to Cassandra from the start?
- Any guidance on history shard count and Postgres sizing for this scale?
Any real-world experience or architectural guidance appreciated.