“Unavailable” error when running add-search-attributes

Hi, I’m currently testing Temporal with Cassandra and Elasticsearch (ES).

I use the command below to add a search attribute.

tctl --tls_ca_path=${PATH} --address=${ADDRESS} admin cl asa --name CustomWorkflowId --type Keyword

And I’m getting the error below:

Error: Unable to get existing search attributes.
Error Details: rpc error: code = Unavailable desc = unable to get temporal-sys-add-search-attributes-workflow workflow state: context deadline exceeded
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)

The Temporal cluster logs the following error:

{"level":"error","ts":"2021-12-06T07:17:09.335Z","msg":"unavailable error","service":"frontend","error":"unable to get temporal-sys-add-search-attributes-workflow workflow state: context deadline exceeded","logging-call-at":"adminHandler.go:1163","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/service/frontend.(*AdminHandler).error\n\t/temporal/service/frontend/adminHandler.go:1163\ngo.temporal.io/server/service/frontend.(*AdminHandler).GetSearchAttributes\n\t/temporal/service/frontend/adminHandler.go:283\ngo.temporal.io/server/api/adminservice/v1._AdminService_GetSearchAttributes_Handler.func1\n\t/temporal/api/adminservice/v1/service.pb.go:642\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:152\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1113\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceCountLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_count_limit.go:89\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceRateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_rate_limit.go:87\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:83\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:108\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngo.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1\n\t/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:131\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceLogInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_logger.go:83\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1118\ngo.temporal.io/server/api/adminservice/v1._AdminService_GetSearchAttributes_Handler\n\t/temporal/api/adminservice/v1/service.pb.go:644\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1279\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1608\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:923"}
{"level":"error","ts":"2021-12-06T07:17:09.335Z","msg":"unavailable error","operation":"GetSearchAttributes","error":"unable to get temporal-sys-add-search-attributes-workflow workflow state: context deadline exceeded","logging-call-at":"telemetry.go:187","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).handleError\n\t/temporal/common/rpc/interceptor/telemetry.go:187\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:117\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngo.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1\n\t/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:131\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngo.temporal.io/server/common/rpc/interceptor.(*NamespaceLogInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/namespace_logger.go:83\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1118\ngo.temporal.io/server/api/adminservice/v1._AdminService_GetSearchAttributes_Handler\n\t/temporal/api/adminservice/v1/service.pb.go:644\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1279\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1608\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:923"}

The ES connection seems fine, since the workflow I started is indexed in ES.
Can you help me solve this problem?

Thanks!

The add-search-attributes command requires the worker service to be running. Apparently it is not running or is not configured properly. The server starts the internal workflow temporal-sys-add-search-attributes-workflow and then waits for it to complete, which apparently never happens.
You can run admin get-search-attributes to get the status of this system workflow (I am pretty sure it is TimedOut) and check your worker service configuration.
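Both checks can be run from the CLI against a live cluster. A sketch, assuming the default temporal-system namespace and that the system workflow’s ID matches its type name (either assumption may not hold for your deployment):

```shell
# Inspect the system workflow the server starts for this command.
# The workflow ID below is an assumption based on the workflow type name.
tctl --tls_ca_path=${PATH} --address=${ADDRESS} --namespace temporal-system \
  workflow describe --workflow_id temporal-sys-add-search-attributes-workflow

# Or list open and closed workflows in the system namespace to see its close status.
tctl --tls_ca_path=${PATH} --address=${ADDRESS} --namespace temporal-system \
  workflow listall
```

These commands require a reachable frontend, so they only confirm the workflow’s status, not why the worker never picked it up.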

Thank you, Alex.

I’ve checked two things as per your advice.

  1. admin get-search-attributes causes a timeout.
Error: Unable to get search attributes.
Error Details: rpc error: code = Unavailable desc = unable to get temporal-sys-add-search-attributes-workflow workflow state: context deadline exceeded
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)
  2. Check the worker service.
    When I checked the cluster with the command below, the worker service was registered correctly.
$ tctl --tls_ca_path=${PATH} --address=${ADDRESS} admin cl describe

...
"membershipInfo": {
    "currentHost": {
      "identity": "x.x.x.144:7233"
    },
    "reachableMembers": [
      "x.x.x.184:6934",
      "x.x.x.144:6933",
      "x.x.x.186:6939",
      "x.x.x.40:6935"
    ],
    "rings": [
      {
        "role": "frontend",
        "memberCount": 1,
        "members": [
          {
            "identity": "x.x.x.144:7233"
          }
        ]
      },
      {
        "role": "history",
        "memberCount": 1,
        "members": [
          {
            "identity": "x.x.x.184:7234"
          }
        ]
      },
      {
        "role": "matching",
        "memberCount": 1,
        "members": [
          {
            "identity": "x.x.x.40:7235"
          }
        ]
      },
      {
        "role": "worker",
        "memberCount": 1,
        "members": [
          {
            "identity": "x.x.x.186:7239"
          }
        ]
      }
    ]
  }

And I didn’t find any errors in the worker service log.

...
{"level":"info","ts":"2021-12-06T11:27:14.082Z","msg":"worker starting","service":"worker","component":"worker","logging-call-at":"service.go:148"}
{"level":"info","ts":"2021-12-06T11:27:14.082Z","msg":"RuntimeMetricsReporter started","service":"worker","logging-call-at":"runtime.go:154"}
{"level":"info","ts":"2021-12-06T11:27:14.087Z","msg":"Membership heartbeat upserted successfully","service":"worker","address":"x.x.x.186","port":6939,"hostId":"75ef0bd4-5687-11ec-aeac-762b44e05269","logging-call-at":"rpMonitor.go:222"}
{"level":"info","ts":"2021-12-06T11:27:15.097Z","msg":"Current reachable members","service":"worker","component":"service-resolver","service":"frontend","addresses":["x.x.x.144:7233"],"logging-call-at":"rpServiceResolver.go:266"}
{"level":"info","ts":"2021-12-06T11:27:15.097Z","msg":"Current reachable members","service":"worker","component":"service-resolver","service":"matching","addresses":["x.x.x.40:7235","x.x.x.73:7235"],"logging-call-at":"rpServiceResolver.go:266"}
{"level":"info","ts":"2021-12-06T11:27:15.097Z","msg":"Current reachable members","service":"worker","component":"service-resolver","service":"history","addresses":["x.x.x.184:7234","x.x.x.125:7234"],"logging-call-at":"rpServiceResolver.go:266"}
{"level":"info","ts":"2021-12-06T11:27:15.098Z","msg":"Current reachable members","service":"worker","component":"service-resolver","service":"worker","addresses":["x.x.x.113:7239","x.x.x.186:7239"],"logging-call-at":"rpServiceResolver.go:266"}
{"level":"info","ts":"2021-12-06T11:27:15.153Z","msg":"Service resources started","service":"worker","address":"x.x.x.186:7239","logging-call-at":"resourceImpl.go:398"}
...
{"level":"info","ts":"2021-12-06T11:27:15.339Z","msg":"Started Worker","Namespace":"temporal-system","TaskQueue":"temporal-sys-add-search-attributes-task-queue","WorkerID":"14@odf-temporal-worker-7c8d5fd59f-bvz87@","logging-call-at":"addsearchattributes.go:86"}
{"level":"info","ts":"2021-12-06T11:27:15.339Z","msg":"worker started","service":"worker","component":"worker","logging-call-at":"service.go:170"}
...

I don’t have a custom configuration for the worker service. What do I need to check further?

Ok. Two questions:

  1. What is your Temporal version (it is in the admin cl describe output, a few lines above)?
  2. What do you have in the publicClient section of your static config file (docker.yaml)? It should be something like this:
    publicClient:
        hostPort: x.x.x.144:7233
    
    or your DNS name, e.g. frontend:7233.

The frontend can’t connect to itself using the SDK client, which uses the publicClient/hostPort setting from the static config.

  1. What is your Temporal version?
    => server version = 1.13.1
  2. What do you have in your static config file in publicClient section?
    =>
# Accessing FRONTEND via DNS is verified.
publicClient:
    hostPort: FRONTEND_DNS_NAME:7233

How did you verify access? My only idea is that FRONTEND_DNS_NAME is not accessible from the frontend node itself. Can you shell into the frontend node and run:

tctl --address FRONTEND_DNS_NAME:7233 cluster health

If it gives you:

temporal.api.workflowservice.v1.WorkflowService: SERVING

then I am running out of ideas. But my guess is that it will give you some error.

I’ve run the command below.

$ tctl --tls_ca_path=${CA_PATH} --address=${FRONTEND_DNS_NAME}:7233 cluster health
temporal.api.workflowservice.v1.WorkflowService: SERVING

Let me share something strange I observed.

# This command returns a response.
$ tctl --tls_ca_path=${CA_PATH} --address=${FRONTEND_DNS_NAME}:7233 cluster get-search-attributes
Search attributes:
+-----------------------+----------+
|         NAME          |   TYPE   |
+-----------------------+----------+
...
| TemporalChangeVersion | Keyword  |
| WorkflowId            | Keyword  |
| WorkflowType          | Keyword  |
+-----------------------+----------+

# The admin command doesn't work.
$ tctl --tls_ca_path=${CA_PATH} --address=${FRONTEND_DNS_NAME}:7233 adm cluster get-search-attributes
Error: Unable to get search attributes.
Error Details: rpc error: code = Unavailable desc = unable to get temporal-sys-add-search-attributes-workflow workflow state: context deadline exceeded
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)

What does the admin command do differently?

The non-admin version just returns search attributes from server metadata. The admin version also queries the Elasticsearch index schema and (what actually fails for you) the system workflow status. So this makes perfect sense.

My next guess is TLS configuration. The SDK client uses the TLS configuration from the global.tls.frontend.client section. Is it configured properly on your frontend nodes? At least the root CA needs to be set there, since you pass one to tctl.
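For reference, a minimal sketch of that section of the server config; the CA path is a placeholder:

```yaml
# Sketch: TLS settings the server-internal SDK client uses when dialing
# the frontend (the publicClient endpoint). The CA path is a placeholder.
global:
  tls:
    frontend:
      client:
        # Root CA used to verify the frontend's certificate,
        # the counterpart of tctl's --tls_ca_path flag.
        rootCaFiles:
          - /path/to/ca.pem
        disableHostVerification: false
```

If this section is empty while the frontend only accepts TLS connections, the server’s own client would fail exactly where tctl succeeds with --tls_ca_path.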

Hi Alex, sorry for the late reply.
I checked the docs and the cluster settings regarding TLS.

TLS is disabled in my settings (disableHostVerification = true, ALB = HTTPS).
Do I still need the global TLS settings in this case?

@alex I found the cause.

I have a cluster with the same setup dik mentioned.

Simple architecture:
SDK client → AWS ALB (HTTPS enabled) → Temporal frontend (no TLS)

Settings:
temporal frontend → publicClient is the ALB endpoint.
temporal worker → publicClient is the ALB endpoint.

If the add-search-attributes command is executed in tctl with the above settings, an error occurs.
However, I confirmed that after changing the settings as below, the command succeeds.

Settings:
temporal frontend → publicClient is the frontend server’s own IP.
temporal worker → publicClient is the ALB endpoint.

My guess is that the ALB is set up for HTTPS, but the frontend server sends its requests over plain HTTP.
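In config terms, the working setup amounts to giving the frontend node a different publicClient than the worker node. A sketch, with placeholder addresses:

```yaml
# Frontend node's config: publicClient points at the frontend itself,
# bypassing the HTTPS-only ALB (placeholder IP).
publicClient:
  hostPort: x.x.x.144:7233

# Worker node's config keeps the ALB endpoint (placeholder DNS name):
# publicClient:
#   hostPort: my-alb.example.com:7233
```

This works around the symptom, though as noted later in the thread, hard-coding a node IP is fragile.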

Looking at the code above, I don’t understand why adminHandler sends a request back to another frontend server through the SDK client, so please check it.

Thanks,
Victor.

Thanks Victor,

I can confirm: I was experiencing the same issue on minikube with the Temporal Helm templates. My templates are modified, so I cannot currently say for sure whether the same happens with the default Helm templates, i.e. whether it’s my own issue.

Using a different config for the frontend, with publicClient set to the local IP, worked.

Cheers

temporal frontend → publicClient is the frontend server’s own IP.
temporal worker → publicClient is the ALB endpoint.

This is weird. The SDK clients in both the frontend and the worker are created using the same code. So if the SDK client on the worker was able to connect to the ALB over HTTPS, the frontend should be able to as well. I guess it might be an issue with the routing configuration on the different hosts.

I also need to mention that specifying the server IP in the config won’t work in many environments, because a node might get another IP after a restart.

I would go deeper and try to understand why the ALB is not reachable from the frontend node.


Hi @Alex, what can be done in the above case? I am also getting the same error as @dik in my setup on Minikube.