Adding a cluster using dns fails

tctl -ns mynamespace adm cl urc --frontend_address dns:///temporal-xdc-active.mydomain:7233

fails with Error: Operation AddOrUpdateRemoteCluster failed.
Error Details: rpc error: code = Unavailable desc = error reading from server: read tcp 10.x.yyy.zzz:36808->172.xx.yyy.zz:7233: read: connection reset by peer

I also see this error in the worker:
{"level":"warn","ts":"2022-03-10T16:28:55.362Z","msg":"Failed to poll for task.","Namespace":"temporal-system","TaskQueue":"temporal-sys-tq-scanner-taskqueue-0","WorkerID":"1@myserver-temporal-worker-7b6b74cf66-h7gdc@","WorkerType":"ActivityWorker","Error":"error reading from server: EOF","logging-call-at":"internal_worker_base.go:276"}

Whereas tctl -ns mynamespace adm cl urc --frontend_address 10.x.yyy.zzz:7233 succeeds.

Any idea what's going wrong? I suspect that dns:/// addresses broke after upgrading to 1.15.2.

We did not change AddOrUpdateRemoteCluster in v1.15. From the log message, the valid address is 10.x.yyy.zzz:7233. When using DNS, does it forward the request (10.x.yyy.zzz:36808->172.xx.yyy.zz:7233)? Should the port be 7233?

Yes, 7233 is part of the URL, and it works perfectly fine in 1.14.x and 1.15.0 (I upgraded from 1.14.3 to 1.15.0).

I tested this again and can consistently reproduce it.

tctl -ns mynamespace adm cl urc --fad dns:///mycluster:7233
Error: Operation AddOrUpdateRemoteCluster failed.
Error Details: rpc error: code = Unavailable desc = error reading from server: EOF

When this happens, I see this error in my frontend:

{"level":"fatal","ts":"2022-03-17T16:08:52.595Z","msg":"Invalid rpcAddress for remote cluster","error":"address dns:///myaddress:7233: too many colons in address","logging-call-at":"rpc.go:268","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Fatal\n\t/temporal/common/log/zap_logger.go:150\ngo.temporal.io/server/common/rpc.(*RPCFactory).CreateFrontendGRPCConnection\n\t/temporal/common/rpc/rpc.go:268\ngo.temporal.io/server/client.(*rpcClientFactory).NewAdminClientWithTimeout.func1\n\t/temporal/client/clientfactory.go:213\ngo.temporal.io/server/common.(*clientCacheImpl).GetClientForClientKey\n\t/temporal/common/clientCache.go:94\ngo.temporal.io/server/common.(*clientCacheImpl).GetClientForKey\n\t/temporal/common/clientCache.go:75\ngo.temporal.io/server/client/admin.(*clientImpl).getRandomClient\n\t/temporal/client/admin/client.go:473\ngo.temporal.io/server/client/admin.(*clientImpl).DescribeCluster\n\t/temporal/client/admin/client.go:253\ngo.temporal.io/server/client/admin.(*metricClient).DescribeCluster\n\t/temporal/client/admin/metricClient.go:294\ngo.temporal.io/server/service/frontend.(*AdminHandler).AddOrUpdateRemoteCluster\n\t/temporal/service/frontend/adminHandler.go:1055\ngo.temporal.io/server/api/adminservice/v1._AdminService_AddOrUpdateRemoteCluster_Handler.func1\n\t/temporal/api/adminservice/v1/service.pb.go:968\ngo.temporal.io/server/common/rpc/interceptor.(*SDKVersionInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/sdk_version.go:63\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1116\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:152\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1119\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_count_limit.go:99\n
google.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1119\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:89\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1119\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:84\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1119\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceValidatorInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_validator.go:113\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1119\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:108\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1119\ngo.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1\n\t/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1119\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:131\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1119\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceLogInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_logger.go:84\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1119\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1121\ngo.temporal.io/server/api/
adminservice/v1._AdminService_AddOrUpdateRemoteCluster_Handler\n\t/temporal/api/adminservice/v1/service.pb.go:970\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1282\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1616\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:921"}

Could this commit have broken the frontend URL parsing logic? Add config and handling for remote cluster cert (#2475) · temporalio/temporal@305a6fc · GitHub

Look specifically at these test cases, which add ":" while parsing the port: temporal/rpc_common_test.go at 305a6fc97e11d754c7eb77afa406d6626c64216e · temporalio/temporal · GitHub
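For anyone following along, the wording of the fatal log ("address ...: too many colons in address") matches the error text produced by Go's standard net.SplitHostPort when it is given a string that still carries a scheme. Whether rpc.go calls that exact function at line 268 is an assumption on my part; the sketch below only shows that a dns:/// target cannot pass a plain host:port check. checkHostPort is a hypothetical helper, not Temporal code.

```go
package main

import (
	"fmt"
	"net"
)

// checkHostPort reports whether addr parses as a plain host:port pair.
// Hypothetical helper for illustration; not part of Temporal.
func checkHostPort(addr string) error {
	_, _, err := net.SplitHostPort(addr)
	return err
}

func main() {
	addrs := []string{
		"dns:///myaddress:7233", // gRPC-style target with a scheme
		"myaddress:7233",        // plain host:port
	}
	for _, addr := range addrs {
		if err := checkHostPort(addr); err != nil {
			fmt.Printf("%q rejected: %v\n", addr, err)
			continue
		}
		fmt.Printf("%q accepted\n", addr)
	}
}
```

The first address is rejected because the scheme contributes an extra colon in front of the one that separates host and port; the second is accepted.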

The error indicates an invalid address. Could you try address:7233? Refer to: tls error

Hmm, could be. I think this started breaking after the 1.15.x upgrade (after the gRPC library upgrade in Temporal); it was working fine up to 1.14.x.

Unfortunately, I need the port to be specified because the DNS address I am trying to resolve belongs. Let me see if there is a workaround.

Have you tried adding the cluster using just cluster_name:port?

Sorry for the delayed reply, I was out of office.
Yes, I tried it today, and unfortunately that did not work either. Using the IP address works fine. dns:///:port was working fine in 1.14.x.

I realized that the URL without dns:/// works, but there are a couple of things I had to do before it worked:

a) I had to delete the clustermetadata and clustermetadatainfo tables.
b) I had to promote my namespace again using tctl -ns mynamespace n up --pn
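If the address comes from a script or config that still uses the dns:/// form, one option is to strip the scheme before handing it to tctl. This is a minimal sketch under that assumption; stripDNSScheme is a hypothetical helper, not something tctl or the Temporal server provides.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// stripDNSScheme removes a leading "dns:///" and verifies that what
// remains is a host:port pair with both parts present.
// Hypothetical helper for illustration; not part of Temporal.
func stripDNSScheme(addr string) (string, error) {
	hostPort := strings.TrimPrefix(addr, "dns:///")
	host, port, err := net.SplitHostPort(hostPort)
	if err != nil {
		return "", err
	}
	if host == "" || port == "" {
		return "", fmt.Errorf("address %q must have both host and port", addr)
	}
	return hostPort, nil
}

func main() {
	addr, err := stripDNSScheme("dns:///mycluster:7233")
	if err != nil {
		fmt.Println("invalid address:", err)
		return
	}
	// The result can be passed as the --frontend_address value.
	fmt.Println(addr)
}
```

The resulting host:port string is the form that worked in step (b)'s setup above; an address with no host, such as dns:///:7233, is rejected rather than passed through.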