Hi,
I have a problem running Temporal in production. I am noticing that QPS on the persistence layer from the matching service is about 10x what it normally should be (without any workflows running), and the history service logs a lot of "Queue processor pump shut down" and "Error updating timer ack level for shard" messages. I wonder what the potential issue could be here.
{"level":"info","ts":"2021-07-29T19:13:33.256Z","msg":"Queue processor pump shut down.","service":"history","shard-id":162,"address":"10.32.11.179:7934","shard-item":"0xc018b0a400","component":"visibility-queue-processor","logging-call-at":"queueProcessor.go:248"}
{"level":"info","ts":"2021-07-29T19:13:33.256Z","msg":"Task processor shutdown.","service":"history","shard-id":162,"address":"10.32.11.179:7934","shard-item":"0xc018b0a400","component":"visibility-queue-processor","logging-call-at":"taskProcessor.go:145"}
{"level":"info","ts":"2021-07-29T19:13:33.256Z","msg":"none","service":"history","shard-id":162,"address":"10.32.11.179:7934","shard-item":"0xc018b0a400","component":"visibility-queue-processor","lifecycle":"Stopped","component":"transfer-queue-processor","logging-call-at":"queueProcessor.go:178"}
{"level":"info","ts":"2021-07-29T19:13:33.256Z","msg":"none","service":"history","shard-id":162,"address":"10.32.11.179:7934","shard-item":"0xc018b0a400","component":"history-engine","lifecycle":"Stopped","logging-call-at":"historyEngine.go:375"}
{"level":"info","ts":"2021-07-29T19:13:33.256Z","msg":"none","service":"history","shard-id":162,"address":"10.32.11.179:7934","shard-item":"0xc018b0a400","lifecycle":"Stopped","component":"shard-engine","logging-call-at":"controller_impl.go:462"}
{"level":"info","ts":"2021-07-29T19:13:33.264Z","msg":"Close shard","service":"history","shard-id":1124,"address":"10.32.11.179:7934","shard-item":"0xc01346f680","logging-call-at":"context_impl.go:807"}
{"level":"error","ts":"2021-07-29T19:13:33.264Z","msg":"Error updating timer ack level for shard","service":"history","shard-id":1124,"address":"10.32.11.179:7934","shard-item":"0xc01346f680","component":"timer-queue-processor","cluster-name":"active","error":"Failed to update shard. Previous range ID: 14395; new range ID: 14396","logging-call-at":"timerQueueAckMgr.go:402","stacktrace":"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error\n\t/temporal/common/log/loggerimpl/logger.go:138\ngo.temporal.io/server/service/history.(*timerQueueAckMgrImpl).updateAckLevel\n\t/temporal/service/history/timerQueueAckMgr.go:402\ngo.temporal.io/server/service/history.(*timerQueueProcessorBase).internalProcessor\n\t/temporal/service/history/timerQueueProcessorBase.go:319\ngo.temporal.io/server/service/history.(*timerQueueProcessorBase).processorPump\n\t/temporal/service/history/timerQueueProcessorBase.go:194"}
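For reference, this is roughly how I am measuring persistence QPS per operation for matching. It is a minimal sketch that assumes the server metrics are scraped by Prometheus, that the counter is named persistence_requests, and that the service/operation labels below exist in my scrape config; metric and label names may differ by Temporal version and setup.

```python
# Sketch: break down persistence request rate by operation for the matching
# service, to see which calls account for the extra QPS.
# Assumptions: a Prometheus instance at PROM_URL scrapes the Temporal server
# metrics, the counter is "persistence_requests", and the "service"/"operation"
# labels are as written here -- adjust to your deployment.
import requests

PROM_URL = "http://prometheus:9090"  # assumed Prometheus endpoint
QUERY = 'sum by (operation) (rate(persistence_requests{service="matching"}[5m]))'

resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": QUERY}, timeout=10)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    operation = series["metric"].get("operation", "unknown")
    rate = float(series["value"][1])
    print(f"{operation}: {rate:.1f} req/s")
```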