Temporal history/matching service tuning

I've run a simple load test on Temporal using Maru (temporalio/maru on GitHub, benchmarks for Temporal workflows).
After some resource adjustments and dynamic config changes, I am stuck again on the following errors:

#1
temporal-history-687b95f5d6-xrfn9 temporal-history {"level":"error","ts":"2021-08-05T15:03:39.105Z","msg":"Fail to process task","service":"history","shard-id":117,"address":"10.218.72.12:7234","shard-item":"0xc001660f00","component":"transfer-queue-processor","cluster-name":"active","shard-id":117,"queue-task-id":13641848,"queue-task-visibility-timestamp":"2021-08-05T15:03:35.466Z","xdc-failover-version":0,"queue-task-type":"TransferWorkflowTask","wf-namespace-id":"63ea16ad-36a4-4999-8ccc-23e68d350d53","wf-id":"basic-workflow-12-0-361-233","wf-run-id":"44adc9dd-c4d5-476c-878c-1cafaf43296c","error":"context deadline exceeded","lifecycle":"ProcessingFailed","logging-call-at":"taskProcessor.go:332","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/service/history.(*taskProcessor).handleTaskError\n\t/temporal/service/history/taskProcessor.go:332\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck.func1\n\t/temporal/service/history/taskProcessor.go:218\ngo.temporal.io/server/common/backoff.Retry\n\t/temporal/common/backoff/retry.go:103\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck\n\t/temporal/service/history/taskProcessor.go:244\ngo.temporal.io/server/service/history.(*taskProcessor).taskWorker\n\t/temporal/service/history/taskProcessor.go:167"}



#2
temporal-history-76dd994d46-dlj9k temporal-history {"level":"error","ts":"2021-08-05T14:19:23.531Z","msg":"Fail to process task","service":"history","shard-id":192,"address":"10.218.73.16:7234","shard-item":"0xc001c69e80","component":"timer-queue-processor","cluster-name":"active","component":"timer-queue-processor","shard-id":192,"queue-task-id":5286601,"queue-task-visibility-timestamp":"2021-08-05T14:16:50.259Z","xdc-failover-version":0,"queue-task-type":"ActivityRetryTimer","wf-namespace-id":"63ea16ad-36a4-4999-8ccc-23e68d350d53","wf-id":"11","wf-run-id":"4541e90e-023d-4a10-9308-88d2c0119209","error":"context deadline exceeded","lifecycle":"ProcessingFailed","logging-call-at":"taskProcessor.go:332","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/service/history.(*taskProcessor).handleTaskError\n\t/temporal/service/history/taskProcessor.go:332\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck.func1\n\t/temporal/service/history/taskProcessor.go:218\ngo.temporal.io/server/common/backoff.Retry\n\t/temporal/common/backoff/retry.go:103\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck\n\t/temporal/service/history/taskProcessor.go:244\ngo.temporal.io/server/service/history.(*taskProcessor).taskWorker\n\t/temporal/service/history/taskProcessor.go:167"}

I have configured 5 workers with the following options:

    MaxConcurrentWorkflowTaskPollers: 100,
    MaxConcurrentActivityTaskPollers: 500,
    MaxConcurrentWorkflowTaskExecutionSize: 256,
    MaxConcurrentLocalActivityExecutionSize: 256,
    MaxConcurrentActivityExecutionSize: 256,

I also changed these dynamic config parameters, but it didn't help much:

    matching.numTaskqueueReadPartitions:
    - value: 1000
      constraints: {}
    matching.numTaskqueueWritePartitions:
    - value: 1000
      constraints: {}

Could you please help me understand where the bottleneck is?

I would appreciate any tips.
