Planning a production deployment

I am in the process of setting up a production deployment of Temporal, and I have a few questions.

  • Is there an acceptable amount of latency between server nodes? We have multiple DCs connected with private fiber, and I’m wondering whether I can run a single cross-DC cluster, or whether I should design this as a multi-cluster setup using the replication/failover functionality. Is there any rule of thumb here?
  • Is Kafka still required for the replication system? Is this system still considered experimental?

What is the latency between your physical DCs?
Do you have a strict requirement for availability, even if it means losing consistency?

NOTE:

  1. Kafka is not required for cross-DC replication.
  2. Cross-DC replication is still experimental.

I’m going to work on getting better numbers, but for now I would assume 30 ms is the absolute worst-case scenario, and in most cases we’d see under 10 ms.

Consistency is probably a higher priority for us. Is there a latency threshold above which we would start to see erratic behavior in the cluster?

I’m trying to determine if a multi-cluster replication setup is something I want to try to tackle right now, given this cluster should see relatively light use out of the gate.

If latency is around 10 ms, then I think it is OK to run a single Temporal cluster spanning multiple physical DCs.

As for cross-DC replication: by design, this feature guarantees availability, not consistency (e.g. it keeps you running if an entire DC goes down). To use it:

  • You need to set up two Temporal clusters, each running in its own dedicated physical DC.
  • You also need to configure a dedicated worker fleet for each of those Temporal clusters.
  • When a DC goes down, you need to manually fail over to the DC that is still functional.
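The topology described in these steps can be modeled roughly as follows. This is a hypothetical sketch, not Temporal's API: `Cluster`, `Namespace`, and `failover` are illustrative names, and in a real deployment the active cluster of a replicated namespace is changed through Temporal's own namespace-update tooling. The model only shows the shape of the setup: two clusters, a worker fleet pinned to each, one active cluster at a time, and an operator-driven switch. Because replication between the clusters is asynchronous, work that has not yet replicated when a DC fails may be missing on the other side, which is why this setup favors availability over consistency.

```go
package main

import "fmt"

// Cluster is a hypothetical model of one Temporal cluster deployed in a
// single physical DC, together with its dedicated worker fleet.
type Cluster struct {
	Name    string
	Healthy bool
	Workers []string // hypothetical worker hosts that poll only this cluster
}

// Namespace models a replicated namespace: exactly one cluster is active
// at a time, and an operator moves that role by hand.
type Namespace struct {
	Name   string
	Active string
}

// failover is a hypothetical helper for the manual step: it moves the
// active role to target, refusing unknown or unhealthy clusters.
func failover(ns *Namespace, clusters map[string]*Cluster, target string) error {
	c, ok := clusters[target]
	if !ok {
		return fmt.Errorf("unknown cluster %q", target)
	}
	if !c.Healthy {
		return fmt.Errorf("cluster %q is not healthy", target)
	}
	ns.Active = target
	return nil
}

func main() {
	clusters := map[string]*Cluster{
		"dc1": {Name: "dc1", Healthy: true, Workers: []string{"worker-dc1-a", "worker-dc1-b"}},
		"dc2": {Name: "dc2", Healthy: true, Workers: []string{"worker-dc2-a", "worker-dc2-b"}},
	}
	ns := &Namespace{Name: "orders", Active: "dc1"}

	// DC1 goes down. Nothing switches automatically: an operator has to act.
	clusters["dc1"].Healthy = false
	if err := failover(ns, clusters, "dc2"); err != nil {
		fmt.Println("failover failed:", err)
		return
	}
	fmt.Printf("namespace %s now active in %s, served by %v\n",
		ns.Name, ns.Active, clusters[ns.Active].Workers)
}
```

Keeping each worker fleet pinned to its local cluster means the surviving DC already has workers polling it when the failover happens.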