WorkflowTaskTimedOut observed during performance testing

We are observing WorkflowTaskTimedOut errors when multiple workflows are triggered at once.
We are using a parent and a child workflow, and the timeout errors are observed in both.
The workflow code is simple: it calls a list of activities.
The only executable work in the parent workflow is getting the workflow start time:
start_date_time = workflow.now().strftime('%Y-%m-%d %H:%M')
end_date_time = (datetime.strptime(start_date_time, '%Y-%m-%d %H:%M') + timedelta(hours=2)).strftime('%Y-%m-%d %H:%M')
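For reference, the same two-hour window can be computed without formatting the time and parsing it back. This is only a sketch: the helper name `execution_window` and the `hours` parameter are made up for illustration, and inside a workflow the argument would be `workflow.now()`.

```python
from datetime import datetime, timedelta
from typing import Tuple

TIME_FORMAT = "%Y-%m-%d %H:%M"


def execution_window(started_at: datetime, hours: int = 2) -> Tuple[str, str]:
    """Return (start, end) strings with minute precision.

    `execution_window` is a hypothetical helper, not part of the Temporal SDK.
    """
    # Drop seconds and microseconds, which is what the strftime/strptime
    # round trip in the original snippet does implicitly.
    start = started_at.replace(second=0, microsecond=0)
    end = start + timedelta(hours=hours)
    return start.strftime(TIME_FORMAT), end.strftime(TIME_FORMAT)


if __name__ == "__main__":
    # In workflow code this would be: execution_window(workflow.now())
    print(execution_window(datetime(2024, 1, 31, 23, 15, 42)))
```

This is deterministic, pure-Python string work, so it is unlikely to be what pushes a workflow task past its timeout on its own.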

We have not set any timeouts, so the defaults are used.

Please help us understand whether we are missing any setup.

Can you share full error(s) you are getting please?

Do you have SDK (worker) metrics configured? If not, see samples such as https://github.com/temporalio/samples-python/tree/main/prometheus for more info.

From the SDK metrics, look at:
workflow_task_execution_latency
workflow_task_replay_latency
request_latency
request_failure

NotFound, message: “Workflow task not found.”

This is expected when your worker sends a response for a workflow task that has already timed out.
The WorkflowTaskTimedOut timeout type is StartToClose, meaning your worker was not able to report the task's completion within the default timeout of 10s.

I still think it's important to check the SDK (worker) metrics mentioned earlier, and also the CPU and memory utilization of your worker pods/containers during this time. That would let you pinpoint the cause rather than guess.

When running in debug mode, I saw this error in the logs just before the workflow tasks timed out.

DEBUG:temporalio.worker._workflow:Evicting workflow with run ID e5af6845-296d-4b15-a936-dc5bdd2274e1, message: Error reporting WFT to server
DEBUG:temporalio.worker._workflow:Evicting workflow with run ID cb857635-1cac-4562-ab1e-8ee1a85d7a9f, message: Error reporting WFT to server

Any info on this error is appreciated.

Could you look at your worker metric
request_failure

You can filter it by operation and status_code. Check whether any errors are reported for the operation RespondWorkflowTaskCompleted, and note the status code.
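If you don't have a Prometheus server and dashboards yet, a rough way to do this filtering is to read the worker's metrics endpoint as text and pick out the matching samples. The sketch below makes several assumptions: the exported name `temporal_request_failure`, the label values, and the sample data are illustrative only, and the real names depend on your SDK version and metrics configuration.

```python
import re
from typing import Iterator, Tuple

# One sample line of the Prometheus text format: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def request_failures(
    metrics_text: str,
    operation: str,
    metric_name: str = "temporal_request_failure",  # assumed exported name
) -> Iterator[Tuple[str, float]]:
    """Yield (status_code, count) for failures of one operation."""
    for line in metrics_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric_name:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        if labels.get("operation") == operation:
            yield labels.get("status_code", "unknown"), float(match.group("value"))


# Made-up sample text for illustration; not output from a real worker.
SAMPLE = "\n".join([
    "# TYPE temporal_request_failure counter",
    'temporal_request_failure{operation="RespondWorkflowTaskCompleted",status_code="NOT_FOUND"} 3',
    'temporal_request_failure{operation="PollWorkflowTaskQueue",status_code="UNAVAILABLE"} 1',
])

if __name__ == "__main__":
    for status_code, count in request_failures(SAMPLE, "RespondWorkflowTaskCompleted"):
        print(status_code, count)
```

In practice you would pass in the text fetched from whatever address your worker's Prometheus exporter is bound to, rather than the hard-coded sample.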