@maxim we have been using the Coinbase Ruby SDK in production for some time. Can you help us here?
I will add the background here. We have a workflow and an activity.
Our activity has start_to_close: 1 hour and no heartbeat timeout set, but the activity is able to send a heartbeat every 30 seconds.
On average each activity takes 12 minutes to complete; in the worst case it can take up to 20 minutes. The code inside the activity is idempotent.
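To put the numbers above in one place, here is a small plain-Ruby sketch of the timing budget. It does not use the Temporal SDK; the constant names and the `heartbeats_during` helper are made up for illustration. The values are the ones described above.

```ruby
# Values taken from the description above, all in seconds.
START_TO_CLOSE     = 60 * 60 # start_to_close timeout: 1 hour
HEARTBEAT_INTERVAL = 30      # the activity can heartbeat every 30 seconds
TYPICAL_RUNTIME    = 12 * 60 # average activity duration
WORST_CASE_RUNTIME = 20 * 60 # worst-case activity duration

# Hypothetical helper: how many heartbeats fit into a run of the given length.
def heartbeats_during(runtime, interval)
  runtime / interval
end

typical_beats = heartbeats_during(TYPICAL_RUNTIME, HEARTBEAT_INTERVAL)
worst_beats   = heartbeats_during(WORST_CASE_RUNTIME, HEARTBEAT_INTERVAL)

# Time left between the worst-case runtime and the start_to_close timeout.
headroom = START_TO_CLOSE - WORST_CASE_RUNTIME

puts "heartbeats in a typical run: #{typical_beats}"
puts "heartbeats in a worst-case run: #{worst_beats}"
puts "headroom before start_to_close: #{headroom / 60} minutes"
```

Since the cancellation shows up 1-2 minutes into a run, it is far from the start_to_close limit, so that timeout itself is unlikely to be what fires.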
But once in a while we get an exception, and we have not been able to find the exact root cause. Could you help here?
One pattern we are seeing is that this cancellation happens within the first 1-2 minutes after the activity starts.
A GRPC::Cancelled occurred in background
Exception
1:context canceled. debug_error_string:{UNKNOWN:Error received from peer {created_time:"2025-01-11T07:21:59.65124422+00:00", grpc_status:1, grpc_message:"context canceled"}}
Wanted to add one more aspect: we are using self-hosted Temporal. I found that the context cancellation is due to gRPC call timeouts. This could be caused by load on the Temporal cluster.
I will track the Temporal cluster metrics. In parallel, is there any way to increase the gRPC timeout limits?
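While the cluster metrics are being checked, one possible stopgap is to retry the failing call with exponential backoff, since the activity code is idempotent. The sketch below is plain Ruby and only illustrates the idea: `TransientCancelled` is a stand-in for `GRPC::Cancelled` so the example runs without the grpc gem, and `with_retries` is a hypothetical helper, not part of the Ruby SDK. Whether a wrapper like this helps depends on where the error is actually raised (it is reported "in background"), which is still unknown.

```ruby
# Stand-in for GRPC::Cancelled so the sketch runs without the grpc gem.
class TransientCancelled < StandardError; end

# Hypothetical helper: runs the block, retrying on TransientCancelled with
# exponential backoff. The sleeper is injectable so the delays can be observed
# without actually sleeping.
def with_retries(max_attempts: 3, base_delay: 0.5, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue TransientCancelled
    # Give up and re-raise once the attempt budget is used.
    raise if attempt >= max_attempts

    # Delay doubles on every retry: base_delay, 2 * base_delay, ...
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Usage: simulate a call that is cancelled twice and then succeeds.
delays = []
result = with_retries(max_attempts: 3, base_delay: 0.5, sleeper: ->(s) { delays << s }) do |attempt|
  raise TransientCancelled, "context canceled" if attempt < 3

  "ok after #{attempt} attempts"
end

puts result
puts "backoff delays: #{delays.inspect}"
```

A wrapper like this only masks the symptom; if the cancellations line up with load on the cluster, the server side still needs attention.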
I also debugged this from multiple angles, and this gRPC cancellation seems to have nothing to do with the Ruby SDK. When I checked the Temporal cluster metrics that we maintain in Grafana, I found that the number of workflows/tasks handled by the cluster was very high at that time. Based on the previous failures, the probability of getting this gRPC cancellation is high during such periods.