Frontend crashlooping after failing to create index in ES

Chitresh_Deshpande · March 10, 2021, 10:09pm

Hi Folks

I recently deployed the latest version of Temporal with the following external dependencies.
Persistence:
default: Aurora in AWS
visibility: Aurora in AWS
es-visibility: Elasticsearch (7.9) in AWS

The frontend seems to be constantly restarting.
When I look at the “previous” container's error output, here’s what I see:

2021/03/10 21:28:54 Loading config; env=docker,zone=,configDir=/etc/temporal/config
2021/03/10 21:28:54 Loading config files=[/etc/temporal/config/docker.yaml]
{"level":"info","ts":"2021-03-10T21:28:54.919Z","msg":"Starting server for services","value":"[frontend]","logging-call-at":"server.go:110"}
Unable to start server: sql schema version compatibility check failed: dial tcp 10.77.184.253:3306: connect: connection timed out.
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The mapping definition cannot be nested under a type [_doc] unless include_type_name is set to true."}],"type":"illegal_argument_exception","reason":"The mapping definition cannot be nested under a type [_doc] unless include_type_name is set to true."},"status":400}{"error":{"root_cause":[{"type":"resource_already_exists_exception","reason":"index [temporal-visibility-dev/GtAzt900S-Suqpw0rOvGzw] already exists","index_uuid":"GtAzt900S-Suqpw0rOvGzw","index":"temporal-visibility-dev"}],"type":"resource_already_exists_exception","reason":"index [temporal-visibility-dev/GtAzt900S-Suqpw0rOvGzw] already exists","index_uuid":"GtAzt900S-Suqpw0rOvGzw","index":"temporal-visibility-dev"},"status":400}%

There seem to be two issues:

SQL schema version compatibility check failure
ES index creation failure

But it’s not clear to me what exactly the issue is.
Can you please provide some pointers on how to triage this?

Things to note:

Temporal did create the index successfully on its first try. I can see it in ES. I am not sure why it is throwing an error on subsequent attempts. Is this error a red herring?
We did follow the temporal-sql tool instructions to create the schemas in the Aurora DB, so I'm not sure what step we are missing here.

➜  temporal-mysql git:(cancel_calendar_wf) ✗ k get po
NAME                                               READY   STATUS             RESTARTS   AGE
temporal-admintools-58f4f7c68-48bbw                1/1     Running            0          138m
temporal-frontend-85647c697f-gsjxv                 0/1     CrashLoopBackOff   22         138m
temporal-grafana-56b45c99c8-kd8c7                  1/1     Running            0          138m
temporal-history-5979c5b4d8-sdwck                  0/1     CrashLoopBackOff   22         138m
temporal-kube-state-metrics-79bdd5c9db-ktsb4       1/1     Running            0          138m
temporal-matching-67fd59449-4cvx7                  0/1     CrashLoopBackOff   22         138m
temporal-prometheus-alertmanager-c69d5f64f-smb2p   2/2     Running            0          138m
temporal-prometheus-pushgateway-6fb4876f8b-fp5mp   1/1     Running            0          138m
temporal-prometheus-server-56b5cd5478-6p9jv        2/2     Running            0          138m
temporal-web-67875b6f5b-7ddj7                      1/1     Running            0          138m
temporal-worker-6b5c6f75f-mj67m                    0/1     CrashLoopBackOff   22         138m

Note the 22 restarts on each of the crashlooping pods.
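
The last log line in the post above is several Elasticsearch responses printed back to back with no separator. A minimal sketch for pulling them apart and listing the error types, which makes it easier to see that the mapping error and the "already exists" error are two different responses. The helper names `split_json_objects` and `error_types` are hypothetical, not part of Temporal or Elasticsearch:

```python
import json


def split_json_objects(text: str) -> list:
    """Decode JSON objects that were printed back to back in one string.

    Stops at the first character that does not start a new object
    (for example the trailing "%" that zsh prints after output
    with no final newline).
    """
    decoder = json.JSONDecoder()
    objects = []
    pos = 0
    while pos < len(text):
        # Skip whitespace between objects.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text) or text[pos] != "{":
            break
        obj, pos = decoder.raw_decode(text, pos)
        objects.append(obj)
    return objects


def error_types(text: str) -> list:
    """Return the top-level error type of every error response in the text."""
    return [
        obj["error"]["type"]
        for obj in split_json_objects(text)
        if isinstance(obj.get("error"), dict) and "type" in obj["error"]
    ]
```

Calling `error_types(...)` on the pasted line would be expected to list the `illegal_argument_exception` and the `resource_already_exists_exception` separately, so each can be triaged on its own.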

Wenquan_Xing · March 10, 2021, 11:03pm

The real error is the connection timeout. Can you check that your DB is accessible?
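
One way to check this is a plain TCP connection attempt against the database host and port, run from somewhere with the same network path as the pods (for example the admintools pod, assuming Python is available there). This is only a rough sketch. The function name `can_connect` is made up for illustration, and a successful connection only shows that the port is reachable, not that credentials or schema are correct:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False
```

For the error in the first post, the call would be `can_connect("10.77.184.253", 3306)`. A `False` result there would point at security groups, routing, or the DB endpoint rather than at Temporal itself.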

alex · March 11, 2021, 12:11am

First of all, don’t use AWS ES 7.9. It is broken (unless you have a dedicated deployment from AWS). Use 7.7.
Secondly, how did you deploy Temporal? The Docker image still defaults to ES6. That _doc-related error clearly points to an ES version mismatch (Temporal is using the ES6 schema). Our Helm charts and docker-compose files are configured to use ES7 by default, but the Docker image itself is not. You need to set the ES_VERSION env variable to v7.
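
To confirm which major version the cluster actually runs, the JSON returned by the Elasticsearch root endpoint (`GET /`) includes a `version.number` field. A small sketch that compares it with the configured ES_VERSION. The function names are hypothetical, and it assumes ES_VERSION always has the form `v<major>`, as in the `v7` value mentioned above:

```python
import json


def es_major_version(root_response: str) -> int:
    """Extract the major version from the body of Elasticsearch's `GET /`."""
    info = json.loads(root_response)
    return int(info["version"]["number"].split(".")[0])


def matches_es_version(root_response: str, es_version_env: str) -> bool:
    """Check the ES_VERSION setting against the cluster's major version.

    Assumption: ES_VERSION is written as "v<major>", e.g. "v7".
    """
    return es_version_env == f"v{es_major_version(root_response)}"
```

With a 7.9 cluster, `matches_es_version(body, "v7")` would be true, while an unset or ES6 default would not match.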

Chitresh_Deshpande · March 15, 2021, 1:21pm

Hi Alex,
Thanks for your answers. I tried setting ES_VERSION=v7 and I am still running into the same issue.

  temporal-mysql git:(temporal_170_upgrade) ✗ k logs temporal-frontend-7f6cdc9fb9-mm8vz
+ DB=cassandra
+ ENABLE_ES=true
+ ES_SCHEMA_SETUP_TIMEOUT_IN_SECONDS=0
+ ES_PORT=443
+ ES_VERSION=v7
+ ES_SCHEME=https
+ ES_VIS_INDEX=temporal-visibility-dev
+ RF=1
.........
{"level":"info","ts":"2021-03-15T13:08:27.310Z","msg":"Get dynamic config","name":"history.persistenceMaxQPS","value":"3000","default-value":"3000","logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-03-15T13:08:27.331Z","msg":"Get dynamic config","name":"system.advancedVisibilityWritingMode","value":"on","default-value":"on","logging-call-at":"config.go:79"}
{"level":"info","ts":"2021-03-15T13:08:27.331Z","msg":"Get dynamic config","name":"history.visibilityQueue","value":"internal","default-value":"internal","logging-call-at":"config.go:79"}
Unable to start server: visibility index in missing in Elasticsearch config.
{"acknowledged":true}{"error":{"root_cause":[{"type":"resource_already_exists_exception","reason":"index [temporal-visibility-dev/ko8wbOCESR6-Kwf5-vcPqw] already exists","index_uuid":"ko8wbOCESR6-Kwf5-vcPqw","index":"temporal-visibility-dev"}],"type":"resource_already_exists_exception","reason":"index [temporal-visibility-dev/ko8wbOCESR6-Kwf5-vcPqw] already exists","index_uuid":"ko8wbOCESR6-Kwf5-vcPqw","index":"temporal-visibility-dev"},"status":400}%

(Parts of the logs removed for brevity.)

When I describe the pod, I do see the environment variables being set properly:

Environment:
      POD_IP:                               (v1:status.podIP)
      ENABLE_ES:                           true
      ES_SEEDS:                            <removed for security purposes>
      ES_PORT:                             443
      ES_SCHEME:                           https
      ES_VERSION:                          v7
      ES_VIS_INDEX:
      ES_USER:
      ES_PWD:
      SERVICES:                            frontend
      TEMPORAL_STORE_PASSWORD:             <set to the key 'password' in secret 'temporal-creds'>  Optional: false
      TEMPORAL_VISIBILITY_STORE_PASSWORD:  <set to the key 'password' in secret 'temporal-creds'>  Optional: false
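
In a listing like the one above it is easy to miss variables that are present but empty. A rough sketch that scans the Environment section of `kubectl describe pod` output and reports names with no value. The helper `empty_env_vars` is hypothetical, and it assumes each variable appears on its own `NAME: value` line as shown:

```python
import re

# Environment variable names as they appear in the listing: upper case,
# digits, and underscores only. This also skips the "Environment:" header.
_ENV_NAME = re.compile(r"[A-Z0-9_]+")


def empty_env_vars(describe_env: str) -> list:
    """Return the names of environment variables that have an empty value."""
    empty = []
    for line in describe_env.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and _ENV_NAME.fullmatch(name) and not value.strip():
            empty.append(name)
    return empty
```

On the listing above this would report ES_VIS_INDEX, ES_USER, and ES_PWD. Whether an empty value is a problem depends on the variable: empty credentials may be intentional, but an empty index name is worth a second look given the "visibility index" startup error.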

Chitresh_Deshpande · March 15, 2021, 2:50pm

Actually, adding this to the config worked:
visibilityIndex: "temporal-visibility"
