Temporal workflows not persisting through restarts

I’m running the docker-compose version of temporal on EC2, with mysql as the persistence layer. Elasticsearch is also installed. I noticed workflows are not persisting through restarts of the temporal container (the UI shows an empty workflow list after docker-compose down then up). I made sure I was in the right namespace when I saw the workflows were gone.
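For reference, the exact restart sequence that reproduces it:

docker-compose down
docker-compose up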

I should mention I still see the workflow IDs in the executions table, so it looks like the workflows are persisted; they just do not appear in the UI after a restart.

Is there a setting to configure this? I do not have event history archival turned on either, but I assumed that is unrelated. Here is the docker-compose YAML for the temporal container:

temporal:
  container_name: temporal
  depends_on:
    - elasticsearch
  environment:
    - DB=mysql
    - MYSQL_USER=xxxx
    - MYSQL_PWD=xxxx
    - MYSQL_SEEDS=xxxx
    - DYNAMIC_CONFIG_FILE_PATH=config/dynamicconfig/development-sql.yaml
    - ENABLE_ES=true
    - ES_SEEDS=elasticsearch
    - ES_VERSION=v7
    #- ES_SCHEME=https
    #- ES_USER=elastic
    #- ES_PWD=elastic
    - SKIP_DEFAULT_NAMESPACE_CREATION=true
    - TEMPORAL_TLS_SERVER_CA_CERT=${TEMPORAL_TLS_CERTS_DIR}/mtls_ca.pem
    - TEMPORAL_TLS_SERVER_CERT=${TEMPORAL_TLS_CERTS_DIR}/mtls.pem
    - TEMPORAL_TLS_SERVER_KEY=${TEMPORAL_TLS_CERTS_DIR}/mtls.key
    - TEMPORAL_TLS_REQUIRE_CLIENT_AUTH=true
    - TEMPORAL_TLS_FRONTEND_CERT=${TEMPORAL_TLS_CERTS_DIR}/mtls.pem
    - TEMPORAL_TLS_FRONTEND_KEY=${TEMPORAL_TLS_CERTS_DIR}/mtls.key
    - TEMPORAL_TLS_CLIENT1_CA_CERT=${TEMPORAL_TLS_CERTS_DIR}/mtls_ca.pem
    - TEMPORAL_TLS_CLIENT2_CA_CERT=${TEMPORAL_TLS_CERTS_DIR}/xxxx_ca.pem
    - TEMPORAL_TLS_INTERNODE_SERVER_NAME=xxxx
    - TEMPORAL_TLS_FRONTEND_SERVER_NAME=xxxx
    - TEMPORAL_TLS_FRONTEND_DISABLE_HOST_VERIFICATION=true
    - TEMPORAL_TLS_INTERNODE_DISABLE_HOST_VERIFICATION=true
    - TEMPORAL_CLI_ADDRESS=temporal:7233
    - TEMPORAL_CLI_TLS_CA=${TEMPORAL_TLS_CERTS_DIR}/mtls_ca.pem
    - TEMPORAL_CLI_TLS_CERT=${TEMPORAL_TLS_CERTS_DIR}/mtls.pem
    - TEMPORAL_CLI_TLS_KEY=${TEMPORAL_TLS_CERTS_DIR}/mtls.key
    - TEMPORAL_CLI_TLS_ENABLE_HOST_VERIFICATION=false
    - TEMPORAL_CLI_TLS_SERVER_NAME=xxxx
  image: temporalio/auto-setup:${TEMPORAL_VERSION}
  networks:
    - temporal-network
  ports:
    - 7233:7233
  volumes:
    - ./dynamicconfig:/etc/temporal/config/dynamicconfig
    - ${TEMPORAL_LOCAL_CERT_DIR}:${TEMPORAL_TLS_CERTS_DIR}
  labels:
    kompose.volume.type: configMap

I should mention I still see the workflow IDs in the executions table,

Check if you can describe one of these workflows:

tctl wf desc -w <workflow_id>

UI shows empty workflow list after docker-compose down then up

Can you bash into your temporal container, run the auto-setup.sh script, and check for any errors?
It should set up your ES indexes there. My guess is that setting up the indexes is failing on restart for some reason.
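Something like this should do it (container name taken from your compose file above; the script sitting in the container’s working directory is an assumption based on the auto-setup image):

docker exec -it temporal /bin/bash
# then, inside the container:
./auto-setup.sh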

Oh thanks for this, it’s helpful.

tctl wf desc -w  'metric-run-agent|22671'
Error: Describe workflow execution failed
Error Details: rpc error: code = NotFound desc = sql: no rows in result set
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)

listall doesn’t print anything either (the UI currently shows some workflows that have completed since the last temporal restart).

tctl workflow listall
  WORKFLOW TYPE | WORKFLOW ID | RUN ID | TASK QUEUE | START TIME | EXECUTION TIME | END TIME

tctl is working though (cluster health)

tctl cluster health
temporal.api.workflowservice.v1.WorkflowService: SERVING

Executions are in the DB

mysql> select workflow_id,namespace_id,shard_id,run_id,last_write_version from executions where workflow_id = 'metric-run-agent|22671';
+------------------------+------------------------------------+----------+------------------------------------+--------------------+
| workflow_id            | namespace_id                       | shard_id | run_id                             | last_write_version |
+------------------------+------------------------------------+----------+------------------------------------+--------------------+
| metric-run-agent|22671 | 0x72DFEC56DE974B35B8D99A8EB98AA20C |        4 | 0x036B40EA0B2C46B693CDE79801D0D522 |                  0 |
| metric-run-agent|22671 | 0x72DFEC56DE974B35B8D99A8EB98AA20C |        4 | 0x500A6D7C1CE04FE1A53DA337E0771678 |                  0 |
| metric-run-agent|22671 | 0x72DFEC56DE974B35B8D99A8EB98AA20C |        4 | 0x5A4C429FA4534DABB4CDA580C03F2441 |                  0 |
| metric-run-agent|22671 | 0x72DFEC56DE974B35B8D99A8EB98AA20C |        4 | 0x832A58C6B76B44259FB9195172FA3090 |                  0 |
| metric-run-agent|22671 | 0x72DFEC56DE974B35B8D99A8EB98AA20C |        4 | 0xB63ABA86DBCD4677B8C7FC9A1D48F18C |                  0 |
| metric-run-agent|22671 | 0x72DFEC56DE974B35B8D99A8EB98AA20C |        4 | 0xE6D9A547FC3E4EEDA54898DE60AA32CB |                  0 |
+------------------------+------------------------------------+----------+------------------------------------+--------------------+
6 rows in set (0.05 sec)

Running auto-setup.sh from inside the temporal container itself (docker exec -it temporal /bin/bash), this seems like the only problem:

+ curl --user : -X PUT http://elasticsearch:9200/temporal_visibility_v1_dev --write-out '\n'
{"error":{"root_cause":[{"type":"resource_already_exists_exception","reason":"index [temporal_visibility_v1_dev/sVnU3px8Rki00pxAoku2pw] already exists","index_uuid":"sVnU3px8Rki00pxAoku2pw","index":"temporal_visibility_v1_dev"}],"type":"resource_already_exists_exception","reason":"index [temporal_visibility_v1_dev/sVnU3px8Rki00pxAoku2pw] already exists","index_uuid":"sVnU3px8Rki00pxAoku2pw","index":"temporal_visibility_v1_dev"},"status":400}
+ setup_server
+ echo 'Temporal CLI address: temporal:7233.'
Temporal CLI address: temporal:7233.
bash-5.1$ + tctl cluster health
+ grep -q SERVING
+ echo 'Temporal server started.'
Temporal server started.

I tried deleting the ES index (curl -X DELETE http://elasticsearch:9200/temporal_visibility_v1_dev) and then re-running auto-setup.sh. The 400 definitely goes away, but the overall behavior did not change.
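Concretely, from inside the temporal container:

curl -X DELETE http://elasticsearch:9200/temporal_visibility_v1_dev
./auto-setup.sh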

If it helps at all, the full output from auto-setup.sh is below:

bash-5.1$ ./auto-setup.sh

+ : mysql
+ : false
+ : false
+ : temporal
+ : temporal_visibility
+ : ''
+ : 9042
+ : ''
+ : ''
+ : ''
+ : ''
+ : ''
+ : ''
+ : 1
+ : temporal
+ : temporal_visibility
+ : 3306
+ : xxxxmysqlhostname
+ : xxxxmysqluser
+ : xxxxmysqlpassword
+ : false
+ : ''
+ : ''
+ : ''
+ : true
+ : http
+ : elasticsearch
+ : 9200
+ : ''
+ : ''
+ : v7
+ : temporal_visibility_v1_dev
+ : 0
+ : temporal:7233
+ : true
+ : default
+ : 1
+ : false
+ [[ false != true ]]
+ validate_db_env
+ case ${DB} in
+ [[ -z xxxxmysqlhostname ]]
+ wait_for_db
+ case ${DB} in
+ wait_for_mysql
+ nc -z xxxxmysqlhostname 3306
+ echo 'MySQL started.'
MySQL started.
+ setup_schema
+ case ${DB} in
+ echo 'Setup MySQL schema.'
Setup MySQL schema.
+ setup_mysql_schema
+ [[ false == true ]]
+ MYSQL_CONNECT_ATTR=()
+ SCHEMA_DIR=/etc/temporal/schema/mysql/v57/temporal/versioned
+ [[ false != true ]]
+ temporal-sql-tool --ep xxxxmysqlhostname -u xxxxmysqluser -p 3306 --db temporal create
+ temporal-sql-tool --ep xxxxmysqlhostname -u xxxxmysqluser -p 3306 --db temporal setup-schema -v 0.0
2023-03-06T18:16:32.721Z INFO Starting schema setup {"config": {"SchemaFilePath":"","InitialVersion":"0.0","Overwrite":false,"DisableVersioning":false}, "logging-call-at": "setuptask.go:57"}
2023-03-06T18:16:32.722Z DEBUG Setting up version tables {"logging-call-at": "setuptask.go:67"}
2023-03-06T18:16:32.787Z DEBUG Current database schema version 1.9 is greater than initial schema version 0.0. Skip version upgrade {"logging-call-at": "setuptask.go:116"}
2023-03-06T18:16:32.787Z INFO Schema setup complete {"logging-call-at": "setuptask.go:131"}
+ temporal-sql-tool --ep xxxxmysqlhostname -u xxxxmysqluser -p 3306 --db temporal update-schema -d /etc/temporal/schema/mysql/v57/temporal/versioned
2023-03-06T18:16:32.859Z INFO UpdateSchemeTask started {"config": {"DBName":"","TargetVersion":"","SchemaDir":"/etc/temporal/schema/mysql/v57/temporal/versioned","IsDryRun":false}, "logging-call-at": "updatetask.go:97"}
2023-03-06T18:16:32.874Z DEBUG Schema Dirs: {"logging-call-at": "updatetask.go:186"}
2023-03-06T18:16:32.874Z DEBUG found zero updates from current version 1.9 {"logging-call-at": "updatetask.go:127"}
2023-03-06T18:16:32.874Z INFO UpdateSchemeTask done {"logging-call-at": "updatetask.go:120"}
+ VISIBILITY_SCHEMA_DIR=/etc/temporal/schema/mysql/v57/visibility/versioned
+ [[ false != true ]]
+ temporal-sql-tool --ep xxxxmysqlhostname -u xxxxmysqluser -p 3306 --db temporal_visibility create
+ temporal-sql-tool --ep xxxxmysqlhostname -u xxxxmysqluser -p 3306 --db temporal_visibility setup-schema -v 0.0
2023-03-06T18:16:32.982Z INFO Starting schema setup {"config": {"SchemaFilePath":"","InitialVersion":"0.0","Overwrite":false,"DisableVersioning":false}, "logging-call-at": "setuptask.go:57"}
2023-03-06T18:16:32.982Z DEBUG Setting up version tables {"logging-call-at": "setuptask.go:67"}
2023-03-06T18:16:33.022Z DEBUG Current database schema version 1.1 is greater than initial schema version 0.0. Skip version upgrade {"logging-call-at": "setuptask.go:116"}
2023-03-06T18:16:33.023Z INFO Schema setup complete {"logging-call-at": "setuptask.go:131"}
+ temporal-sql-tool --ep xxxxmysqlhostname -u xxxxmysqluser -p 3306 --db temporal_visibility update-schema -d /etc/temporal/schema/mysql/v57/visibility/versioned
2023-03-06T18:16:33.057Z INFO UpdateSchemeTask started {"config": {"DBName":"","TargetVersion":"","SchemaDir":"/etc/temporal/schema/mysql/v57/visibility/versioned","IsDryRun":false}, "logging-call-at": "updatetask.go:97"}
2023-03-06T18:16:33.067Z DEBUG Schema Dirs: {"logging-call-at": "updatetask.go:186"}
2023-03-06T18:16:33.067Z DEBUG found zero updates from current version 1.1 {"logging-call-at": "updatetask.go:127"}
2023-03-06T18:16:33.067Z INFO UpdateSchemeTask done {"logging-call-at": "updatetask.go:120"}
+ [[ true == true ]]
+ validate_es_env
+ [[ true == true ]]
+ [[ -z elasticsearch ]]
+ wait_for_es
+ SECONDS=0
+ ES_SERVER=http://elasticsearch:9200
+ curl --silent --fail --user : http://elasticsearch:9200
+ echo 'Elasticsearch started.'
Elasticsearch started.
+ setup_es_index
+ ES_SERVER=http://elasticsearch:9200
+ SETTINGS_URL=http://elasticsearch:9200/_cluster/settings
+ SETTINGS_FILE=/etc/temporal/schema/elasticsearch/visibility/cluster_settings_v7.json
+ TEMPLATE_URL=http://elasticsearch:9200/_template/temporal_visibility_v1_template
+ SCHEMA_FILE=/etc/temporal/schema/elasticsearch/visibility/index_template_v7.json
+ INDEX_URL=http://elasticsearch:9200/temporal_visibility_v1_dev
+ curl --fail --user : -X PUT http://elasticsearch:9200/_cluster/settings -H 'Content-Type: application/json' --data-binary @/etc/temporal/schema/elasticsearch/visibility/cluster_settings_v7.json --write-out '\n'
{"acknowledged":true,"persistent":{"action":{"auto_create_index":"false"}},"transient":{}}
+ curl --fail --user : -X PUT http://elasticsearch:9200/_template/temporal_visibility_v1_template -H 'Content-Type: application/json' --data-binary @/etc/temporal/schema/elasticsearch/visibility/index_template_v7.json --write-out '\n'
{"acknowledged":true}
+ curl --user : -X PUT http://elasticsearch:9200/temporal_visibility_v1_dev --write-out '\n'
{"error":{"root_cause":[{"type":"resource_already_exists_exception","reason":"index [temporal_visibility_v1_dev/sVnU3px8Rki00pxAoku2pw] already exists","index_uuid":"sVnU3px8Rki00pxAoku2pw","index":"temporal_visibility_v1_dev"}],"type":"resource_already_exists_exception","reason":"index [temporal_visibility_v1_dev/sVnU3px8Rki00pxAoku2pw] already exists","index_uuid":"sVnU3px8Rki00pxAoku2pw","index":"temporal_visibility_v1_dev"},"status":400}
+ setup_server
+ echo 'Temporal CLI address: temporal:7233.'
Temporal CLI address: temporal:7233.
bash-5.1$ + tctl cluster health
+ grep -q SERVING
+ echo 'Temporal server started.'
Temporal server started.
+ [[ true != true ]]
+ [[ false != true ]]
+ add_custom_search_attributes
+ echo 'Adding CustomField search attributes.'
Adding CustomField search attributes.
+ tctl --auto_confirm admin cluster add-search-attributes --name CustomKeywordField --type Keyword --name CustomStringField --type Text --name CustomTextField --type Text --name CustomIntField --type Int --name CustomDatetimeField --type Datetime --name CustomDoubleField --type Double --name CustomBoolField --type Bool
Search attributes already exist.

In case it helps someone else: I ended up fixing this. Here are the notes.

I was starting up a number of these complete temporal stacks (three of them), and they all had the same repeatable behavior of workflows not persisting across restarts. Other devs internally didn’t have the problem, so I went over all of the configs by diffing them against the various sample docker-compose YAMLs.

I made the following changes, and one of them fixed the problem (a sketch of the resulting environment section follows the list):

  • Removed ES completely, since 1.20.0 has enhanced visibility on SQL; it does what I need and performs well enough at my small scale
  • Removed the ES flags from the temporal container env vars
  • Added a DB_PORT=3306 env var to the temporal container (unlikely culprit)
  • Removed the SKIP_DEFAULT_NAMESPACE_CREATION flag
  • Changed DB=mysql to DB=mysql8 (I am using MySQL 8)
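Roughly, the environment section of the temporal container ended up like the sketch below (TLS/CLI vars unchanged and elided here; this is a sketch of the changes, not my exact file):

temporal:
  environment:
    - DB=mysql8     # was DB=mysql
    - DB_PORT=3306  # newly added
    - MYSQL_USER=xxxx
    - MYSQL_PWD=xxxx
    - MYSQL_SEEDS=xxxx
    - DYNAMIC_CONFIG_FILE_PATH=config/dynamicconfig/development-sql.yaml
    # ENABLE_ES, ES_SEEDS, ES_VERSION and SKIP_DEFAULT_NAMESPACE_CREATION removed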

After changing the above, I dropped the temporal and temporal_visibility databases to let temporal re-create them, and after restarting the stack I no longer have the persistence problems. This worked on all three stacks.
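The drop step can be done with the plain mysql client, something like this (hostname and user are the placeholders from the trace above):

mysql -h xxxxmysqlhostname -u xxxxmysqluser -p -e 'DROP DATABASE temporal; DROP DATABASE temporal_visibility;'

On the next start, auto-setup re-creates both databases and their schemas (the temporal-sql-tool create / setup-schema steps visible in the trace).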