NOT_FOUND: Namespace id "xxxxxx-xxxxx-xxxxx-xxxxx-xxxxxxxxxxxx" not found

Temporal version: 0.13.0
DB: Cassandra

All workers fail with this message; no workflows can start.

NOT_FOUND: Namespace id "3d01ee86-2d70-4018-a9a4-0735030dfcd8" not found

The namespace in question actually exists. I can see it using tctl:

Name: REDACTED
Id: 3d01ee86-2d70-4018-a9a4-0735030dfcd8
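
For reference, a describe call along these lines is what shows that output (the namespace name is a placeholder here, matching the redaction above):

    tctl --namespace REDACTED namespace describe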

temporal-frontend has a bunch of errors like this:

{"level":"error","ts":"2021-11-05T17:09:11.872Z","msg":"Unable to call matching.PollWorkflowTaskQueue.","service":"frontend","wf-task-queue-name":"/_sys/1@builder-7465c5bd7-xftrq:699f0cbe-14bd-4bc3-
be03-89d5aac6b82a/3","timeout":"1m9.999731499s","error":"Namespace id \"3d01ee86-2d70-4018-a9a4-0735030dfcd8\" not found","logging-call-at":"workflowHandler.go:812","stacktrace":"go.temporal.io/serv
er/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).PollWorkflowTaskQueue\n\t/temporal/service/frontend/workflowHand
ler.go:812\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQueue.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:532\ngo.temporal.io/server/service/fr
ontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:118\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).PollWorkflowTaskQue
ue\n\t/temporal/service/frontend/dcRedirectionHandler.go:528\ngo.temporal.io/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.5.1-0.20
211018190919-a5f4a169cd08/workflowservice/v1/service.pb.go:1088\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:152\ngoogle.go
lang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1113\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\
t/temporal/common/rpc/interceptor/namespace_count_limit.go:89\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngo.temporal.io/serv
er/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:87\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/
mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:83\ngoogle.golang.
org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/comm
on/rpc/interceptor/telemetry.go:108\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngo.temporal.io/server/common/metrics.NewServe
rMetricsContextInjectorInterceptor.func1\n\t/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ng
o.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:131\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/serve
r.go:1116\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceLogInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_logger.go:83\ngoogle.golang.org/grpc.chainUnaryInterceptors
.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1118\ngo.temporal.i
o/api/workflowservice/v1._WorkflowService_PollWorkflowTaskQueue_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.5.1-0.20211018190919-a5f4a169cd08/workflowservice/v1/service.pb.go:1090\ngoogle.golang.or
g/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1279\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1
608\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:923"}

@tihomir could we please get some Temporal team engagement here?

Thanks,
Artem

Hi @Artem_Kazakov, sure thing, on it 🙂


The namespace cache has a 10-second refresh delay. If you just created the namespace, it can take about 10 seconds for the server to pick up the new namespace.
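
For anyone scripting namespace setup, a rough sketch of how to account for that refresh delay; the namespace name and retention value below are placeholders, not taken from this thread:

    # register the namespace, then give the server's namespace cache (~10s refresh) time to pick it up
    tctl --namespace my-namespace namespace register --retention 3
    sleep 15  # generous buffer; only relevant when workers start polling right after registration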

@Yimin_Chen thanks for the response, but this is unlikely to be a caching issue. It fails and continues to fail each time the worker polls.


And to add to this: the namespaces had existed for days or weeks before the issue appeared, and suddenly ALL namespaces and ALL workers have this problem. Literally every single worker is getting this error.

Are you by chance running multiple separate Temporal clusters (for example, within a single k8s cluster)? Take a look at this issue to see if anything in it matches your setup.


It seems the frontend was able to find the namespace and resolve its UUID, but matching seems unable to find the namespace by that UUID. Are there any interesting logs from the matching hosts? And yes, please do check whether you have the multiple-clusters-joining-the-same-membership-ring issue Tihomir mentioned above; that is usually the root cause of a lot of weird issues.
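
To pull those matching logs, something along these lines works; the deployment name and k8s namespace below are placeholders for whatever your Helm release uses:

    # tail recent matching-service logs and look for namespace resolution failures
    kubectl logs deploy/temporal-matching -n temporal --since=1h | grep -i "not found"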


That is likely it. We are running multiple instances of Temporal in the same cluster. Will update the thread once we confirm.


We're using network policies in our Kubernetes cluster to prevent communication between Temporal servers in different namespaces, and this seems to have fixed the issue. Thanks a lot for pointing us in the right direction here.
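
For anyone hitting the same thing, a rough sketch of the kind of policy involved (all names and labels are placeholders; match the selectors to your own Helm chart, and note this assumes a CNI that actually enforces NetworkPolicy):

    # restrict ingress to the Temporal pods in this k8s namespace so members of
    # other Temporal installs cannot join the same membership ring
    kubectl apply -n temporal-a -f - <<'EOF'
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: temporal-same-namespace-only   # placeholder name
    spec:
      podSelector:
        matchLabels:
          app.kubernetes.io/name: temporal   # placeholder label; use your chart's labels
      policyTypes:
        - Ingress
      ingress:
        - from:
            - podSelector: {}   # allow traffic only from pods in this same k8s namespace
    EOF

Depending on where your SDK workers and other clients run, you will likely need additional ingress rules for them; the point here is only to keep membership traffic from crossing between separate Temporal installs.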


@tihomir @Yimin_Chen Is Temporal designed to be run multi-tenant on a Kubernetes cluster? How is it run in Temporal Cloud? We looked at the ringpop service, and it seems like it's not using any Kubernetes-level abstractions.
