Workflow process slowness

Hi Team,

We have a workflow and child workflows to process as per our business use case.
There is a 2-minute delay before the child workflow is triggered.
We are using the Cockroach database as the backend.
We have deployed to non-prod & prod environments.
Non-prod looks good, but the prod environment is slow.
We have similar configuration for both environments and databases.

Is there a known reason for the slowness? Is any configuration required?

See the attachment.

Do you have SDK and server metrics we can look at?

From the SDK side:

histogram_quantile(0.95, sum(rate(temporal_workflow_task_schedule_to_start_latency_seconds_bucket{namespace=~"$Namespace"}[5m])) by (namespace, le))

Server side:

sum(rate(poll_success_sync{}[1m])) / sum(rate(poll_success{}[1m]))

The server wrote the ChildWorkflowExecutionInitiated event to history as soon as the worker gave it that command. It then took time for the ChildWorkflowExecutionStarted event, which is persisted in event history when your workflow worker picks it up. My guess is possible latencies on your workers' side, so looking at these metrics around that time would be helpful.
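
To put a number on that gap, you can take the timestamps of the two events from the event history and subtract them. Below is a minimal sketch in plain Java, assuming you copy the timestamps by hand from the history view. The class name, method name, and timestamp values are hypothetical placeholders, not taken from your workflow.

```java
import java.time.Duration;
import java.time.Instant;

public class ChildStartDelay {
    // Time elapsed between ChildWorkflowExecutionInitiated and
    // ChildWorkflowExecutionStarted, given their event timestamps.
    public static Duration between(Instant initiated, Instant started) {
        return Duration.between(initiated, started);
    }

    public static void main(String[] args) {
        // Hypothetical timestamps; replace with the values from your event history.
        Instant initiated = Instant.parse("2023-01-10T10:00:00Z");
        Instant started = Instant.parse("2023-01-10T10:02:00Z");
        System.out.println("Child start delay: "
                + between(initiated, started).getSeconds() + "s");
    }
}
```

Comparing this delay for a few executions in prod and non-prod, alongside the metrics above for the same time window, should show whether the gap lines up with worker-side latency.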

Also always good to look at (during that time):

persistence latencies:
histogram_quantile(0.95, sum(rate(persistence_latency_bucket{}[1m])) by (operation, le))

resource exhausted issues:
sum(rate(service_errors_resource_exhausted{}[1m])) by (operation, resource_exhausted_cause)

Hi @tihomir
I don't have SDK and server metrics set up. How do I set them up? We do have Prometheus scraping the metrics.

is the link broken?

Try link here.

I don't have SDK and server metrics. How do I set them up?

Which SDK do you use? How do you deploy your Temporal service?

I am using Java SDK 1.17 and Temporal server v1.18.3.
The services are deployed in k8s.

What is the dashboard JSON file, and how does it help?

The dashboards repo includes out-of-the-box Grafana dashboards that you can use as a starting point for your SDK and server metrics monitoring.

How do I integrate it with the Temporal server?
Do you have instructions?