Java Grpc high latencies

Radu · May 20, 2021, 11:13am

Hello. I am currently experimenting with a Temporal deployment, self-hosted in a cloud-managed Kubernetes cluster. I am using the Helm chart from the Temporal repository with my own dependencies,
meaning I only deploy the Temporal components and connect them to a managed Postgres instance.

I have a Java service in the same Kubernetes cluster that periodically executes some distributed cron jobs via Temporal. These are very light workloads, so load-related performance issues seem unlikely.

For monitoring, I use an APM that covers the whole K8s cluster, and I recently noticed some strange metrics for the gRPC calls that the Java app makes to communicate with Temporal.

I am constantly seeing latencies over 50 seconds for operations such as: temporal.api.workflowservice.v1.WorkflowService/PollActivityTaskQueue

So I am not sure whether this is a problem or whether I am misreading or misunderstanding something.
I also see no CPU or RAM pressure whatsoever on the pods, and the query insights I currently have enabled on the DB show nothing problematic either.

Any insights on this one? Thank you!

Radu · May 20, 2021, 11:27am

I am using Temporal 1.8.0.
And the cron jobs run at intervals from about 2 minutes to a couple of hours, depending on type.

madhu · May 20, 2021, 12:25pm

Can you check the network latency of the k8s cluster/nodes?

Radu · May 20, 2021, 1:59pm

There are also other operations which seem fine, so I would assume that if the network were the issue, I would see more consistent latencies across all operations.

Radu · May 20, 2021, 2:00pm

maxim · May 20, 2021, 3:42pm

This is by design. The worker processes receive tasks from Temporal service using long polling. PollActivityTaskQueue is one of those long poll operations.
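To make the long-poll behaviour concrete, here is a minimal Java sketch. It is a toy model, not Temporal's implementation: the class `LongPollSketch`, the in-memory queue standing in for a task queue, and the 300 ms timeout are all assumptions chosen for the demo (the real poll timeout is much longer, which is consistent with the 50+ second latencies seen in the APM). The point is only that a poll on an empty queue takes the full timeout, while a poll with a task waiting returns almost immediately.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy model of a long poll (hypothetical names; not Temporal's actual code).
class LongPollSketch {

    // Outcome of one simulated poll: the task (null on timeout) and the call duration.
    static final class PollResult {
        final String task;
        final long elapsedMillis;

        PollResult(String task, long elapsedMillis) {
            this.task = task;
            this.elapsedMillis = elapsedMillis;
        }
    }

    // Blocks until a task is available or the timeout expires, like a long-poll RPC.
    static PollResult poll(BlockingQueue<String> taskQueue, long timeoutMillis) {
        long start = System.nanoTime();
        String task = null;
        try {
            task = taskQueue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt, report an empty poll
        }
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new PollResult(task, elapsed);
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // No task is queued: the call holds for the whole timeout, so its "latency" is high.
        PollResult idle = poll(queue, 300);
        System.out.println("idle poll: task=" + idle.task + ", took " + idle.elapsedMillis + " ms");

        // A task is already queued: the call returns right away.
        queue.add("activity-1");
        PollResult busy = poll(queue, 300);
        System.out.println("busy poll: task=" + busy.task + ", took " + busy.elapsedMillis + " ms");
    }
}
```

Seen this way, a high latency on a poll operation measures how long the worker waited for work, not how slow the service was.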

Radu · May 21, 2021, 8:04am

Thank you @maxim, yes, that indeed makes sense now.

Ruchir · March 1, 2022, 3:44pm

Hi,
I understand that when a workflow is not being executed, then because of long polling, the service latency for PollActivityTaskQueue will be high. But, when I’m executing my workflows, I’m getting the following latency:

As you can see, my workflow end to end latency is ~1.5s, but the service latency is greater than 20s. What is this latency actually of? If the polling latency is so high, how is my workflow being executed in so little time?

PromQL queries used:
Workflow end to end latency: sum(rate(temporal_workflow_endtoend_latency_seconds_sum[5m]))/sum(rate(temporal_workflow_endtoend_latency_seconds_count[5m]))

Service Latency:
sum(rate(service_latency_sum{operation="PollActivityTaskQueue"}[5m]))/sum(rate(service_latency_count{operation="PollActivityTaskQueue"}[5m]))

sum(rate(service_latency_sum{operation="PollWorkflowTaskQueue"}[5m]))/sum(rate(service_latency_count{operation="PollWorkflowTaskQueue"}[5m]))

maxim · March 1, 2022, 3:52pm

It happens because each worker emits multiple poll requests simultaneously. So some of them get the tasks, some of them keep waiting.
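A small worked example may help show why the averaged metric stays high even while workflows finish quickly. The PromQL queries above divide the summed latency by the poll count, so every poll that waits out the full long-poll timeout pulls the average up. The sketch below is plain arithmetic with made-up numbers: the class name `PollLatencyAverage`, the 0.1 s "fast" poll, the 60 s timeout, and the poll counts are all assumptions for illustration, not measured values or SDK defaults.

```java
// Illustrative arithmetic only (hypothetical names and numbers).
class PollLatencyAverage {

    // Mirrors sum(latency) / count(polls) for a mix of fast and timed-out polls.
    static double averageLatencySeconds(int fastPolls, double fastSeconds,
                                        int timedOutPolls, double timeoutSeconds) {
        int totalPolls = fastPolls + timedOutPolls;
        if (totalPolls == 0) {
            return 0.0; // no polls in the window, nothing to average
        }
        double totalSeconds = fastPolls * fastSeconds + timedOutPolls * timeoutSeconds;
        return totalSeconds / totalPolls;
    }

    public static void main(String[] args) {
        // Assumed window: 8 polls picked up a task after 0.1 s each,
        // while 4 other concurrent polls found nothing and waited 60 s.
        double avg = averageLatencySeconds(8, 0.1, 4, 60.0);
        System.out.println("average poll latency (s): " + avg);
    }
}
```

With these assumed numbers the average comes out at roughly 20 seconds, even though every poll that actually received a task returned in 0.1 s. The tasks themselves are never delayed by the polls that keep waiting.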
