History service errors

Hello everyone,

I noticed our log storage metrics climbing this week. Looking at the logs, we are seeing a large number of errors related to the history service.

According to the service_error_with_type metric, they are NamespaceNotFound errors.
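To gauge the volume, a rough query sketch (an assumption on my side: the metric is scraped into Prometheus and carries an error_type label; label names can vary by Temporal version and metrics configuration):

```
# Hypothetical PromQL: rate of NamespaceNotFound errors over the
# last 5 minutes, grouped by service role ("service_name" is an
# assumed label name -- check your actual metric labels).
sum by (service_name) (
  rate(service_error_with_type{error_type="NamespaceNotFound"}[5m])
)
```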

Sample log line:
{"level":"error","ts":"2023-06-23T07:28:52.134Z","msg":"encounter error when describing the mutable state","service":"worker","error":"Namespace b9dc90d2-2c4f-4101-89fd-f40799da9b17 is not found.","wf-namespace-id":"b9dc90d2-2c4f-4101-89fd-f40799da9b17","wf-id":"temporal-sys-scheduler:[redacted]","wf-run-id":"c5b3059b-8d6b-4cca-af6d-da1beadc98db","wf-branch-token":"CiRjNWIzMDU5Yi04ZDZiLTRjY2EtYWY2ZC1kYTFiZWFkYzk4ZGISJGIyNTJmZTY0LTFiMjktNGVlOC1hMjBiLThjOWUyMWEyNGYzYw==","logging-call-at":"scavenger.go:284","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:150\ngo.temporal.io/server/service/worker/scanner/history.(*Scavenger).handleTask\n\t/home/builder/temporal/service/worker/scanner/history/scavenger.go:284\ngo.temporal.io/server/service/worker/scanner/history.(*Scavenger).taskWorker\n\t/home/builder/temporal/service/worker/scanner/history/scavenger.go:214"}

I believe this is not critical, as it seems to target a pre-production namespace.
Our logs have been rotating, but I am wondering whether there is an action to take on our side to clean up the history service, or something else? The Temporal server has been restarted, but it is still logging these errors.

Thank you for your help!

NamespaceNotFound

Can you run

tctl adm cl d

and check the service IPs in the membership rings? Basically, see whether by chance you have multiple clusters joining each other due to some misconfiguration.
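For reference, tctl adm cl d is the abbreviated form of the full admin command below (a sketch; connection options such as the frontend address depend on your deployment):

```shell
# Describe the current cluster. The output includes membership
# information: the rings (frontend, history, matching, worker)
# and the hosts that have joined each ring.
tctl admin cluster describe
```

In the output, look at the hosts listed under each ring; if IPs from another environment (for example, pre-production nodes) appear in this cluster's rings, two clusters are gossiping with each other.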