Elasticsearch error for visibility

I get the following error when I try to use Elasticsearch for visibility.

{"level":"error","ts":"2021-04-06T19:42:42.447Z","msg":"Internal service error","service":"frontend","error":"ListOpenWorkflowExecutions failed. Error: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]","logging-call-at":"workflowHandler.go:3406","stacktrace":"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error\n\t/temporal/common/log/loggerimpl/logger.go:138\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).error\n\t/temporal/service/frontend/workflowHandler.go:3406\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/workflowHandler.go:2424\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).ListOpenWorkflowExecutions.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:367\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:116\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/dcRedirectionHandler.go:363\ngo.temporal.io/api/workflowservice/v1._WorkflowService_ListOpenWorkflowExecutions_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.4.0/workflowservice/v1/service.pb.go:1389\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:136\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:1051\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:100\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:1037\ngo.temporal.io/api/workflowservice/v1._WorkflowService_ListOpenWorkflowExecutions_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.4.0/workflowservice/v1/service.pb.go:1391\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:1210\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:1533\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:871"}

I am using Elasticsearch version 7.9.2. Has anyone else faced a similar issue?

This is most likely an ES version mismatch. How do you run Temporal? The docker-compose files are configured to use v7, but the docker image itself uses v6 by default. The ES_VERSION env variable needs to be set to v7, similar to this. Also, the proper index template needs to be used for v7.
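
If you are not sure which version is actually in play, a quick check along these lines should tell you (the container name and ES host are placeholders, not values from your setup):

# Version the Temporal server container is configured for
docker exec <temporal-container> env | grep ES_VERSION

# Version the ES cluster itself reports
curl -s http://<elasticsearch-host>:9200 | grep '"number"'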

Thanks, @alex.

I am running Temporal using the Helm chart from GitHub (temporalio/helm-charts). I used the following command:

helm install \
    -f values/values.elasticsearch.yaml \
    --set server.replicaCount=1 \
    --set cassandra.config.cluster_size=1 \
    --set prometheus.enabled=false \
    --set grafana.enabled=false \
    --set kafka.enabled=false \
    --set server.kafka.host=kafka-host:9092 \
    temporaltest . --timeout 15m 

I can also see that ES_VERSION is set to v7 (the Helm chart installation does that automatically).

Please let me know if you see any problem with the steps I am following.

Also, do I need to follow any additional steps to use the index template you mentioned above?

Are you using an external ES, or do you want to deploy ES using Helm? If you want to deploy it, you should not use -f values/values.elasticsearch.yaml, because this file configures the cluster to use an ES that is already deployed in the same kube cluster (don’t ask me why). If you just remove this line, it will use the defaults from values.yaml, which will deploy ES and create the index.
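
For example, a minimal sketch based on the command you posted, just with that values file dropped (everything else unchanged):

helm install \
    --set server.replicaCount=1 \
    --set cassandra.config.cluster_size=1 \
    --set prometheus.enabled=false \
    --set grafana.enabled=false \
    --set kafka.enabled=false \
    --set server.kafka.host=kafka-host:9092 \
    temporaltest . --timeout 15m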

You don’t need to do anything manually to create the index or its template.

Hi @alex, I am using an external ES cluster and have configured the details in values.elasticsearch.yaml before running helm install.

I can also see the index temporal-visibility-dev created in the external ES. When I execute a workflow, I can see an entry like the following getting inserted in ES:

{
  "_index" : "temporal-visibility-dev",
  "_type" : "_doc",
  "_id" : "temporal-sys-history-scanner~6295c6bb-3de6-4d97-9faa-0d7ab69ebf67",
  "_score" : 1.0,
  "_source" : {
    "Attr" : { },
    "CloseTime" : 1617883200978285235,
    "ExecutionStatus" : 6,
    "ExecutionTime" : 1617883200899515761,
    "HistoryLength" : 11,
    "NamespaceId" : "32049b68-7872-4094-8e63-d0dd59896a83",
    "RunId" : "6295c6bb-3de6-4d97-9faa-0d7ab69ebf67",
    "StartTime" : 1617840000899515761,
    "TaskQueue" : "temporal-sys-history-scanner-taskqueue-0",
    "VisibilityTaskKey" : "216~1048709",
    "WorkflowId" : "temporal-sys-history-scanner",
    "WorkflowType" : "temporal-sys-history-scanner-workflow"
  }
}
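
For reference, that entry is what a plain search against the index returns, roughly like this (the host is a placeholder for my external cluster):

curl -s "http://<elasticsearch-host>:9200/temporal-visibility-dev/_search?pretty"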

However, when I try to access the Web UI, I get an error.

I also get the following error in the temporaltest-frontend pod:

{"level":"error","ts":"2021-04-08T19:05:48.089Z","msg":"Internal service error","service":"frontend","error":"ListClosedWorkflowExecutions failed. Error: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]","logging-call-at":"workflowHandler.go:3406","stacktrace":"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error\n\t/temporal/common/log/loggerimpl/logger.go:138\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).error\n\t/temporal/service/frontend/workflowHandler.go:3406\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).ListClosedWorkflowExecutions\n\t/temporal/service/frontend/workflowHandler.go:2539\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).ListClosedWorkflowExecutions.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:337\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:116\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).ListClosedWorkflowExecutions\n\t/temporal/service/frontend/dcRedirectionHandler.go:333\ngo.temporal.io/api/workflowservice/v1._WorkflowService_ListClosedWorkflowExecutions_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v1.4.0/workflowservice/v1/service.pb.go:1407\ngo.temporal.io/server/common/authorization.(*interceptor).Interceptor\n\t/temporal/common/authorization/interceptor.go:136\ngoogle.golang.org/grpc.getChainUnaryHandler.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:1051\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:100\ngoogle.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:1037\ngo.temporal.io/api/workflowservice/v1._WorkflowService_ListClosedWorkflowExecutions_Handler\n\t/go/pkg/mod/go.temporal.io/api@v1.4.0/workflowservice/v1/service.pb.go:1409\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:1210\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:1533\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.34.0/server.go:871"}

Can you do GET /temporal-visibility-dev/_mapping and GET /_template/temporal-visibility-template against your ES cluster and post the responses here?
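
Something along these lines should do it (the host is a placeholder; add -u user:pass if your cluster needs auth):

curl -s "http://<elasticsearch-host>:9200/temporal-visibility-dev/_mapping?pretty"
curl -s "http://<elasticsearch-host>:9200/_template/temporal-visibility-template?pretty"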

Hi @alex, here are the responses:

GET /temporal-visibility-dev/_mapping - 200 OK
{
      "temporal-visibility-dev" : {
        "mappings" : {
          "properties" : {
            "Attr" : {
              "properties" : {
                "BinaryChecksums" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword",
                      "ignore_above" : 256
                    }
                  }
                }
              }
            },
            "CloseTime" : {
              "type" : "long"
            },
            "ExecutionStatus" : {
              "type" : "long"
            },
            "ExecutionTime" : {
              "type" : "long"
            },
            "HistoryLength" : {
              "type" : "long"
            },
            "NamespaceId" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "RunId" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "StartTime" : {
              "type" : "long"
            },
            "TaskQueue" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "VisibilityTaskKey" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "WorkflowId" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "WorkflowType" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        }
      }
    }

GET /_template/temporal-visibility-template - 404 Not Found
{ }

OK, this is what I suspected. For some reason you don’t have the index template, so it wasn’t applied when the temporal-visibility-dev index was created. This is where all the other problems come from.

To fix this, I would suggest you drop your current index and recreate it with two curl commands, the same way we do it for server development.
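
Roughly like this. Note the schema file path below is my recollection of where the v7 template lives in the server repo and may differ in your server version, and the host is a placeholder (add -u user:pass if your cluster needs auth):

# Drop the existing index
curl -X DELETE "http://<elasticsearch-host>:9200/temporal-visibility-dev"

# Re-create the index template for ES v7 (adjust the file path to your server version)
curl -X PUT "http://<elasticsearch-host>:9200/_template/temporal-visibility-template" \
  -H "Content-Type: application/json" \
  --data-binary "@schema/elasticsearch/visibility/index_template_v7.json"

# Re-create the index so the template is applied
curl -X PUT "http://<elasticsearch-host>:9200/temporal-visibility-dev"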

And it would be great to investigate why it happened. Am I right that you have:

  enabled: false
  external: true

in your values/values.elasticsearch.yaml file?

Another thing to check is the ENABLE_ES env variable in any of the server containers. It should be true.
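
A quick way to check from kube, assuming pod names along the lines of your temporaltest install (adjust to your actual pod names):

kubectl exec <temporaltest-frontend-pod> -- printenv ENABLE_ES
# repeat for the history and matching pods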

Can you confirm this?

Thanks, @alex.

I have verified that ENABLE_ES is true in history, matching, and frontend servers.

Below is the values/values.elasticsearch.yaml I used:

elasticsearch:
  enabled: false
  external: true
  host: "elasticsearch-host"
  port: "9200"
  version: "v7"
  scheme: "http"
  logLevel: "error"
  username: "user"
  password: "pass"

Please let me know if you see anything wrong here. I will recreate the index in the meantime.
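
For what it’s worth, the external cluster is reachable with those credentials; a check along these lines (host and credentials being the placeholders from the file above) returns the cluster info and version:

curl -s -u user:pass "http://elasticsearch-host:9200"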