Temporal 1.9.2 not working with Elasticsearch 7.7 - Error 400 All Shards Failed

I’ve been trying to wire up Temporal 1.9.2 to use AWS Elasticsearch 7.7. My ES cluster is up and healthy: 9 total nodes, 3 dedicated masters, and plenty of headroom, it seems.

In my Temporal Helm chart, I have:

elasticsearch:
  enabled: false
  external: true
  host: "my-url-redacted.com"
  port: "443"
  version: "v7"
  scheme: "https"
  logLevel: "error"

This seems to connect properly. I can see that the temporal-visibility-dev index was auto-created by going to
https://my-url-redacted.com/_cat/indices/ in my browser, and it shows as “green”.
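
For what it’s worth, the same check from a terminal (the index name is the one my config produced; adjust if yours differs):

# list just the visibility index, with column headers
curl -s "https://my-url-redacted.com/_cat/indices/temporal-visibility-dev?v"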

However, when I go to the Temporal UI and click any of my namespaces, I receive the error:

13 INTERNAL: ListClosedWorkflowExecutions failed. Error: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]

Navigating to https://my-url-redacted.com/_cluster/health/?level=shards - I see all the shards green and healthy.

In the temporal-frontend logs, I can see the full stack trace:

"msg":"Operation failed with internal error.","service":"frontend","metric-scope":242,"error":"ListOpenWorkflowExecutions failed. Error: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]","logging-call-at":"visibility_manager_metrics.go:276","stacktrace":"go.temporal.io/server/common/log.

Unsure if this is related, but the fields in the auto-generated default index do not seem to exactly match any of these schemas: temporal/schema/elasticsearch/visibility at 7c89d5f19cf139514ef4c50ef9ba2e5a9c869d71 · temporalio/temporal · GitHub
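
Two checks directly against ES make the comparison (and the error) easier to inspect; this is just a sketch, with the index name from my config and CloseTime as an example field (as far as I can tell it is one of the fields the closed-workflow listing sorts on; the exact query Temporal issues may differ):

# dump the live mappings of the auto-created index to compare with the repo schema
curl -s "https://my-url-redacted.com/temporal-visibility-dev/_mapping?pretty"

# run a minimal sorted search directly; if it fails the same way, the response
# contains the per-shard "caused_by" detail that the Temporal error message drops
curl -s -X GET "https://my-url-redacted.com/temporal-visibility-dev/_search?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"size": 1, "sort": [{"CloseTime": {"order": "desc"}}]}'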

I tried deleting the index and recreating it using those schemas, but in the meantime the index auto-generates itself again, reverting to its default mapping.

Is this a bug, or am I missing something?

Thank you

Sounds like the same situation as this thread, so I manually ran the curl commands it describes. The index looks correct now, and I see 200s from my ES console in AWS. However, the UI still returns the same shard error even though everything looks healthy.

Did you run those curl commands after the index was auto-created? If so, they don’t affect the existing index. You need to delete the existing index, create the template, and then create the index manually. After that it should work. I would also suggest disabling index auto-creation.
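
Roughly, the sequence looks like this; a sketch only, assuming the v7 index template JSON from the schema directory is saved locally as index_template_v7.json (file and template names vary by version, so adjust to whatever your Temporal release ships):

# 1. delete the auto-created index
curl -X DELETE "https://my-url-redacted.com/temporal-visibility-dev"

# 2. create the index template from the schema file
curl -X PUT "https://my-url-redacted.com/_template/temporal_visibility_v1_template" \
  -H 'Content-Type: application/json' \
  --data-binary @index_template_v7.json

# 3. recreate the index so it picks up the template
curl -X PUT "https://my-url-redacted.com/temporal-visibility-dev"

# 4. optionally disable automatic index creation (check that your AWS domain
#    permits this cluster setting and that nothing else relies on auto-creation)
curl -X PUT "https://my-url-redacted.com/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"action.auto_create_index": false}}'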

Btw, we released 1.11 yesterday with significant improvements and changes in advanced visibility (backed by Elasticsearch). I would suggest starting with it to avoid reindexing in a future migration.

Thanks for the reply.
Yes, I had done it after the index was auto-created, but I also tried deleting and recreating it. It always got overwritten by auto-creation, back to the incorrectly formatted index.

Either way, yes, I am moving ahead with the 1.11.1 upgrade now and will then try wiring up Elasticsearch again.

Thanks