Hello. I am currently experimenting with a Temporal deployment, self-hosted in a cloud-managed Kubernetes cluster. I am using the Helm chart from the Temporal repository with my own dependencies.
That is, I am only deploying the Temporal components and using a managed Postgres instance for them to connect to.
I have a Java service in the same Kubernetes cluster that periodically executes some distributed cron jobs via Temporal. These are very light scenarios, far too light to cause performance issues.
For monitoring I am using an APM to monitor the whole K8s cluster, and I recently noticed some strange metrics for the gRPC calls that the Java app makes to communicate with Temporal.
I am constantly seeing latencies over 50 seconds for operations such as: temporal.api.workflowservice.v1.WorkflowService/PollActivityTaskQueue
So I am not sure if it is a problem or if I am misreading or misunderstanding something.
I also saw no CPU or RAM pressure whatsoever on the pods. Even on the DB, where I currently have something like query insights turned on, nothing seems problematic.
There are also other operations which seem fine, so I would assume that if the network were the issue, I would see more consistent latencies across all operations.
This is by design. The worker processes receive tasks from the Temporal service using long polling. PollActivityTaskQueue is one of those long poll operations, so a call that finds no task stays open until a task arrives or the poll times out.
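To make the long-poll behaviour concrete, here is a minimal sketch in plain Python. It does not use the Temporal SDK: `long_poll` is a hypothetical stand-in for a poll call, an in-memory queue stands in for the task queue, and the 0.5 s timeout is shortened for the demo (the real poll timeout is much longer).

```python
import queue
import threading
import time


def long_poll(task_queue, timeout_s):
    """Hypothetical stand-in for a poll call: block until a task arrives
    or the timeout elapses, then return (task, observed_latency_seconds)."""
    start = time.monotonic()
    try:
        task = task_queue.get(timeout=timeout_s)
    except queue.Empty:
        task = None  # empty poll: the call returned only because it timed out
    return task, time.monotonic() - start


tasks = queue.Queue()

# Idle queue: the call takes roughly the full timeout and returns nothing.
idle_task, idle_latency = long_poll(tasks, timeout_s=0.5)

# A task shows up 0.1 s after polling starts, so the same call returns early.
threading.Timer(0.1, tasks.put, args=["activity-task"]).start()
busy_task, busy_latency = long_poll(tasks, timeout_s=0.5)

print(f"idle: task={idle_task} latency={idle_latency:.2f}s")
print(f"busy: task={busy_task} latency={busy_latency:.2f}s")
```

In both cases the measured "latency" is mostly time spent waiting for work, which is why an APM reports very long durations for poll operations on a lightly loaded cluster.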
Hi,
I understand that when no workflow is being executed, the service latency for PollActivityTaskQueue will be high because of long polling. But when I am executing my workflows, I am getting the following latency:
As you can see, my workflow end to end latency is ~1.5s, but the service latency is greater than 20s. What is this latency actually measuring? If the polling latency is so high, how is my workflow being executed in so little time?
PromQL queries used:
Workflow end to end latency: sum(rate(temporal_workflow_endtoend_latency_seconds_sum[5m]))/sum(rate(temporal_workflow_endtoend_latency_seconds_count[5m]))
Service Latency: sum(rate(service_latency_sum{operation="PollActivityTaskQueue"}[5m]))/sum(rate(service_latency_count{operation="PollActivityTaskQueue"}[5m]))
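One plausible reading, not confirmed in this thread: the service latency query above divides total poll time by total poll count, so it is a mean over every PollActivityTaskQueue call in the window, including polls that waited for a long time and returned empty. The sketch below works through that arithmetic with made-up numbers; the 60 s timeout, the 0.05 s fast polls, and the poll counts are all assumptions for illustration.

```python
def mean_poll_latency(poll_latencies_s):
    """Mean latency over all polls, mirroring sum(rate(..._sum)) / sum(rate(..._count))."""
    return sum(poll_latencies_s) / len(poll_latencies_s)


# Hypothetical 5-minute window (all numbers are assumptions):
fast_polls = [0.05] * 20   # polls that were handed a task almost immediately
empty_polls = [60.0] * 20  # polls that waited out an assumed 60 s timeout

mean_s = mean_poll_latency(fast_polls + empty_polls)
print(f"mean poll latency: {mean_s:.3f}s")
```

Under these assumptions the mean is dominated by the idle polls even though every task was picked up quickly. Poll latency measures how long workers wait for work, while the workflow end to end latency measures how long the workflow itself took, so the two numbers are not expected to agree.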