Prevent Traffic Leak to Passive Temporal server connected to same database

Vikas_NS · January 18, 2022, 9:59am

Use case

We are following active/passive model.
Load balancer is configured to route traffic to only active server. In case of a failure, it will be flipped to passive.
However, both will be running and will be connected to the same database.

Expected behaviour

Given that the traffic won’t be routed to the passive node, passive node should not participate in any transactions/workflows.

Observer behaviour

The passive server seems to be involved in processing some workflows occasionally.

Questions

Could this be because both active and passive are connected to same database.?

maxim · January 18, 2022, 4:49pm

The temporal cluster doesn’t support the “active/passive” model you described. It is expected that all front ends receive traffic and all service hosts are always connected to a database.

Vikas_NS · January 18, 2022, 4:57pm

Thanks maxim.

If that’s the case, then we will run only the active server and bring up passive server when active goes down.

With that said, I’m interested in knowing why this model is not support and what are the possible implication if we follow it.

maxim · January 18, 2022, 4:58pm

Would you explain why you want to run a single server instead of two for availability?

Vikas_NS · January 20, 2022, 3:06pm

We ran into issues with Active/Active Model.

The workers would move between the servers.
Some times all the worker would move to one server leaving the other server with no worker.
When ever the request would go the server without workers, It would get stuck waiting for worker. In case of queries, it would time out.

I believe this issue stems from the fact that Load balancer’s aren’t smart enough to distribute the workers equally between the servers. This is because the see everything (requests, queries, workers) as just traffic.

Any suggestions.?

maxim · January 20, 2022, 5:26pm

I don’t have specific suggestions besides fixing your load balancer setup. The problem is clearly not with Temporal itself, so fixing it by doing unnatural deployment hacks is not the way I would recommend.

Vikas_NS · February 8, 2022, 5:53pm

In this example,

Each Pod is a temporal server. The traffic to these pods are load balanced by the standard kubernetes service.

Can it so happen that all the workers could long poll to one of the Pod (ie: temporal server) at the same instance and the Query methods goes to the POD3 and fails with no worker avaiable?

Also, does each temporal server instance (pod) have its own queue or all of them point to the same queue ?
Meaning, I get query to Pod3 , it creates the task in the queue, can POD1 fulfill the task ?

Topic		Replies	Views
Configure active-active and active-failover Temporal clusters Server Deployment multicluster , deployment , server	12	1280	July 11, 2023
Multi-datacentre issue Community Support deployment	1	632	April 13, 2022
Run temporal cluster with all services in a high available fashion on 2 different servers Community Support	0	346	August 18, 2023
Polling worker / server Community Support	4	1297	March 10, 2023
Active-active deployment of temporal services on multiple kubernetes cluster cross DC Community Support deployment	1	1557	November 16, 2022

Prevent Traffic Leak to Passive Temporal server connected to same database

Related topics