Does multicluster replication internally use the SignalWorkflowExecution operation in any way?

Pavithra_M · February 20, 2025, 6:30pm

Hi team, we recently set up a multi-cluster environment and replicated data from one cluster to another with failover. After the failover, we observed high CPU usage. On further investigation, we saw a high number of SignalWorkflowExecution requests, and they are failing with serviceerror_NotFound. We are also seeing resource exhausted errors with cause BusyWorkflow. We want to understand whether replication uses Signals under the hood in any way. If not, can you help us understand what the cause could be, or give us pointers on where to look further?

tihomir · February 22, 2025, 8:47pm

We want to understand whether replication uses Signals under the hood in any way.

No, it doesn't. Passive cluster shards poll the active cluster for replication tasks; the Signal API is not used.
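To make this concrete, here is a toy model of pull-based replication. It is a sketch under simplifying assumptions: the names `ActiveShard`, `PassiveShard`, `get_tasks_after`, and `poll_once` are hypothetical and do not correspond to Temporal server code, and the real server's batching, cursor persistence, and conflict handling are not modeled. What it illustrates is that the passive side pulls ordered tasks and appends their history events to local state, so a signal that was recorded on the active cluster arrives as history data, not as a new SignalWorkflowExecution call.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationTask:
    # Hypothetical task shape: an ordered ID plus the history event it carries.
    task_id: int
    workflow_id: str
    event: str  # e.g. "WorkflowExecutionSignaled" is just replicated data here


@dataclass
class ActiveShard:
    tasks: list = field(default_factory=list)

    def get_tasks_after(self, last_id, batch_size=2):
        # Return the next batch of tasks the passive side has not seen yet.
        pending = sorted(
            (t for t in self.tasks if t.task_id > last_id),
            key=lambda t: t.task_id,
        )
        return pending[:batch_size]


@dataclass
class PassiveShard:
    last_applied_id: int = 0
    histories: dict = field(default_factory=dict)

    def poll_once(self, active):
        # Pull (not push): the passive shard asks the active shard for new tasks.
        batch = active.get_tasks_after(self.last_applied_id)
        for task in batch:
            # Apply the event directly to local history; no Signal API is involved.
            self.histories.setdefault(task.workflow_id, []).append(task.event)
            self.last_applied_id = task.task_id
        return len(batch)
```

Calling `poll_once` in a loop until it returns 0 drains the active shard's tasks in order. The cursor (`last_applied_id`) is what lets the passive side resume where it left off.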

Do you have client and SDK worker metrics configured? If you do, check which pods/containers show a high rate of the temporal_request_failure metric for operation SignalWorkflowExecution; that may give you an idea of who is sending the signals. If you have a load balancer or proxy between your clients/workers and your frontend service, look at its logs to see whether it records the caller's IP address for SignalWorkflowExecution API calls where the service returns the gRPC NotFound code.
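The proxy-log check can be scripted. The sketch below assumes a hypothetical one-line-per-request log format (`<timestamp> <client-ip> <grpc-method> grpc-status=<code>`); real Envoy, NGINX, or cloud LB access logs look different, so the regex would need adapting. It relies on two real details: Temporal's gRPC method path for signals (`/temporal.api.workflowservice.v1.WorkflowService/SignalWorkflowExecution`) and gRPC status code 5 meaning NOT_FOUND. It counts NotFound signal calls per caller IP so the noisiest sender shows up first.

```python
import re
from collections import Counter

SIGNAL_METHOD = "/temporal.api.workflowservice.v1.WorkflowService/SignalWorkflowExecution"
GRPC_NOT_FOUND = 5  # standard gRPC status code for NOT_FOUND

# Hypothetical access-log format; adjust to whatever your proxy actually emits.
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<ip>\S+)\s+(?P<method>\S+)\s+grpc-status=(?P<status>\d+)$"
)


def not_found_signal_callers(lines):
    """Return (caller_ip, count) pairs for NotFound signal calls, busiest first."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not match the assumed format
        if m.group("method") == SIGNAL_METHOD and int(m.group("status")) == GRPC_NOT_FOUND:
            counts[m.group("ip")] += 1
    return counts.most_common()
```

Once you have the top IPs, map them back to pods or hosts to find which client or worker is signaling workflows that no longer exist (or do not exist on the newly active cluster).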

Related topics

Topic	Category	Replies	Views	Activity
Completed Workflows not replicating in multi-cluster set-up when one of cluster is Rebuilt	Server Deployment	3	248	March 18, 2025
Multicluster replication with three clusters	Server Deployment	5	56	April 24, 2025
Multi-Cluster Replication Performance Tuning	Server Deployment, replication	1	121	July 9, 2024
XDC Limitations and Tradeoffs	Community Support, java-sdk, xdc	0	563	January 12, 2022
Multi cluster replication question	Community Support	0	17	March 12, 2025