Temporal 1.9.2 not working with Elasticsearch 7.7 - Error 400 All Shards Failed

I’ve been trying to wire up Temporal 1.9.2 to use AWS Elasticsearch 7.7. My ES cluster is up and healthy: 9 nodes total, 3 of them dedicated masters, and plenty of headroom as far as I can tell.

In my Temporal Helm chart values, under the elasticsearch section, I have:

  enabled: false
  external: true
  host: "my-url-redacted.com"
  port: "443"
  version: "v7"
  scheme: "https"
  logLevel: "error"

This seems to connect properly: I can see that the temporal-visibility-dev index was auto-created by going to
https://my-url-redacted.com/_cat/indices/ in my browser, and it reports “green”.
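For anyone reproducing this, the same check from the command line (just the URL above via curl, assuming your client has access to the domain):

  # Should list temporal-visibility-dev with health "green"
  curl -s "https://my-url-redacted.com/_cat/indices/temporal-visibility-dev?v"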

However, when I go to the Temporal UI and click any of my namespaces, I receive the error:

13 INTERNAL: ListClosedWorkflowExecutions failed. Error: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]

Navigating to https://my-url-redacted.com/_cluster/health/?level=shards - I see all the shards green and healthy.
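The equivalent check from the command line, for reference:

  # Shard-level cluster health; every shard should report "green"
  curl -s "https://my-url-redacted.com/_cluster/health?level=shards&pretty"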

In the temporal-frontend logs, I can see the full stack trace:

"msg":"Operation failed with internal error.","service":"frontend","metric-scope":242,"error":"ListOpenWorkflowExecutions failed. Error: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]","logging-call-at":"visibility_manager_metrics.go:276","stacktrace":"go.temporal.io/server/common/log.

Unsure if it’s related, but the fields in the auto-generated default index do not seem to exactly match any of the mappings in temporal/schema/elasticsearch/visibility (at commit 7c89d5f19cf139514ef4c50ef9ba2e5a9c869d71) in the temporalio/temporal repo on GitHub.

I tried deleting the index and recreating it from those mappings, but the index keeps auto-creating itself in the meantime, reverting to its defaults.

Is this a bug? Something I am missing?

Thank you

This sounds like the same situation as this thread, so I manually ran the curl commands from it. The index looks correct now, and I see 200s in my ES console in AWS. However, the UI still returns the same shard error even though everything looks healthy.

Did you run those curl commands after the index was auto-created? If so, they don’t affect the existing index. You need to delete the existing index, create the template, and then create the index manually (see the sketch below); after that it should work. I would also suggest disabling index auto-creation.
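Roughly like this; a sketch against an ES 7.x cluster, where the host and index name come from the post above, index_template_v7.json is the template file from temporal/schema/elasticsearch/visibility, and the template name is illustrative:

  # 1. Delete the wrongly auto-created index
  curl -X DELETE "https://my-url-redacted.com/temporal-visibility-dev"

  # 2. Put the index template (legacy _template endpoint, still valid on ES 7.x)
  curl -X PUT "https://my-url-redacted.com/_template/temporal_visibility_v1_template" \
    -H 'Content-Type: application/json' \
    --data-binary @index_template_v7.json

  # 3. Create the index manually so it picks up the template's mappings
  curl -X PUT "https://my-url-redacted.com/temporal-visibility-dev"

  # 4. Disable index auto-creation so the default mappings can't come back
  curl -X PUT "https://my-url-redacted.com/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d '{"persistent": {"action.auto_create_index": false}}'

Note that action.auto_create_index is a cluster-wide setting; if other applications on the same cluster rely on auto-creation, it also accepts a list of index-name patterns instead of a plain false.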

By the way, we released 1.11 yesterday with significant improvements and changes to advanced visibility (backed by Elasticsearch). I would suggest starting with it to avoid having to reindex in a future migration.

Thanks for the reply.
Yes, I had run them after the index auto-created, but I also tried deleting and recreating. It kept getting overwritten by the auto-creator back to the incorrectly formatted index.

Either way, yes, I am moving ahead with the 1.11.1 upgrade now and will then try wiring up Elasticsearch again.