I’ve performed simple load testing on Temporal using this tool: GitHub - temporalio/maru: Benchmarks for Temporal workflows.
After some resource adjustments and dynamic config changes, I'm stuck again with the following errors:
#1
temporal-history-687b95f5d6-xrfn9 temporal-history {"level":"error","ts":"2021-08-05T15:03:39.105Z","msg":"Fail to process task","service":"history","shard-id":117,"address":"10.218.72.12:7234","shard-item":"0xc001660f00","component":"transfer-queue-processor","cluster-name":"active","shard-id":117,"queue-task-id":13641848,"queue-task-visibility-timestamp":"2021-08-05T15:03:35.466Z","xdc-failover-version":0,"queue-task-type":"TransferWorkflowTask","wf-namespace-id":"63ea16ad-36a4-4999-8ccc-23e68d350d53","wf-id":"basic-workflow-12-0-361-233","wf-run-id":"44adc9dd-c4d5-476c-878c-1cafaf43296c","error":"context deadline exceeded","lifecycle":"ProcessingFailed","logging-call-at":"taskProcessor.go:332","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/service/history.(*taskProcessor).handleTaskError\n\t/temporal/service/history/taskProcessor.go:332\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck.func1\n\t/temporal/service/history/taskProcessor.go:218\ngo.temporal.io/server/common/backoff.Retry\n\t/temporal/common/backoff/retry.go:103\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck\n\t/temporal/service/history/taskProcessor.go:244\ngo.temporal.io/server/service/history.(*taskProcessor).taskWorker\n\t/temporal/service/history/taskProcessor.go:167"}
#2
temporal-history-76dd994d46-dlj9k temporal-history {"level":"error","ts":"2021-08-05T14:19:23.531Z","msg":"Fail to process task","service":"history","shard-id":192,"address":"10.218.73.16:7234","shard-item":"0xc001c69e80","component":"timer-queue-processor","cluster-name":"active","component":"timer-queue-processor","shard-id":192,"queue-task-id":5286601,"queue-task-visibility-timestamp":"2021-08-05T14:16:50.259Z","xdc-failover-version":0,"queue-task-type":"ActivityRetryTimer","wf-namespace-id":"63ea16ad-36a4-4999-8ccc-23e68d350d53","wf-id":"11","wf-run-id":"4541e90e-023d-4a10-9308-88d2c0119209","error":"context deadline exceeded","lifecycle":"ProcessingFailed","logging-call-at":"taskProcessor.go:332","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/service/history.(*taskProcessor).handleTaskError\n\t/temporal/service/history/taskProcessor.go:332\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck.func1\n\t/temporal/service/history/taskProcessor.go:218\ngo.temporal.io/server/common/backoff.Retry\n\t/temporal/common/backoff/retry.go:103\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck\n\t/temporal/service/history/taskProcessor.go:244\ngo.temporal.io/server/service/history.(*taskProcessor).taskWorker\n\t/temporal/service/history/taskProcessor.go:167"}
I have configured 5 workers with the following options:
MaxConcurrentWorkflowTaskPollers: 100
MaxConcurrentActivityTaskPollers: 500
MaxConcurrentWorkflowTaskExecutionSize: 256
MaxConcurrentLocalActivityExecutionSize: 256
MaxConcurrentActivityExecutionSize: 256
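Roughly, the options are applied to each worker like this (a minimal sketch with the Go SDK; the frontend address and task queue name are placeholders, and workflow/activity registration is omitted):

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	// Connect to the Temporal frontend (address is a placeholder).
	c, err := client.Dial(client.Options{HostPort: "temporal-frontend:7233"})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	// Worker options matching the values listed above.
	w := worker.New(c, "benchmark-task-queue", worker.Options{
		MaxConcurrentWorkflowTaskPollers:        100,
		MaxConcurrentActivityTaskPollers:        500,
		MaxConcurrentWorkflowTaskExecutionSize:  256,
		MaxConcurrentLocalActivityExecutionSize: 256,
		MaxConcurrentActivityExecutionSize:      256,
	})

	// Workflow and activity registrations omitted for brevity.
	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("unable to start worker:", err)
	}
}
```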
I also changed the following dynamic config parameters, but it didn't help much:
matching.numTaskqueueReadPartitions:
  - value: 1000
    constraints: {}
matching.numTaskqueueWritePartitions:
  - value: 1000
    constraints: {}
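For completeness, this is how I understand the overrides sit in the file-based dynamic config (the file path is only an assumption and depends on where the server's dynamicConfigClient points):

```yaml
# e.g. config/dynamicconfig/development.yaml (path is an assumption)
matching.numTaskqueueReadPartitions:
  - value: 1000
    constraints: {}   # empty constraints apply the override to every task queue
matching.numTaskqueueWritePartitions:
  - value: 1000
    constraints: {}
```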
Could you please help me understand where the bottleneck is?
I would appreciate any tips.