Temporal-sql-tool no longer in the temporalio/server docker image, should we use temporalio/admin-tools instead?

It used to be in 1.8.1

docker run -it --rm temporalio/server:1.8.1 bash
bash-5.0# temporal-sql-tool
NAME:
   temporal-sql-tool - Command line tool for temporal sql operations

USAGE:
   temporal-sql-tool [global options] command [command options] [arguments...]

VERSION:
   0.0.1

COMMANDS:
   setup-schema, setup      setup initial version of sql schema
   update-schema, update    update sql schema to a specific version
   create-database, create  creates a database
   drop-database, drop      drops a database
   help, h                  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --endpoint value, --ep value            hostname or ip address of sql host to connect to (default: "127.0.0.1") [$SQL_HOST]
   --port value, -p value                  port of sql host to connect to (default: 3306) [$SQL_PORT]
   --user value, -u value                  user name used for authentication when connecting to sql host [$SQL_USER]
   --password value, --pw value            password used for authentication when connecting to sql host [$SQL_PASSWORD]
   --database value, --db value            name of the sql database (default: "temporal") [$SQL_DATABASE]
   --plugin value, --pl value              name of the sql plugin (default: "mysql") [$SQL_PLUGIN]
   --quiet, -q                             Don't set exit status to 1 on error
   --connect-attributes value, --ca value  sql connect attributes [$SQL_CONNECT_ATTRIBUTES]
   --tls                                   enable TLS over sql connection [$SQL_TLS]
   --tls-cert-file value                   sql tls client cert path (tls must be enabled) [$SQL_TLS_CERT_FILE]
   --tls-key-file value                    sql tls client key path (tls must be enabled) [$SQL_TLS_KEY_FILE]
   --tls-ca-file value                     sql tls client ca file (tls must be enabled) [$SQL_TLS_CA_FILE]
   --tls-enable-host-verification          sql tls verify hostname and server cert (tls must be enabled) [$SQL_TLS_ENABLE_HOST_VERIFICATION]
   --help, -h                              show help
   --version, -v                           print the version

But now in 1.10.5

docker run -it --rm temporalio/server:1.10.5 bash
bash-5.0# temporal-sql-tool
bash: temporal-sql-tool: command not found
bash-5.0#

Are we supposed to use temporalio/admin-tools ? I think that is not a good idea. How can we ensure commands executed from temporalio/admin-tools will work with temporalio/server ? We need to make sure they have the same version number. But before we do not need to care about the version number because I’m going onto temporalio/server and of course the temporal-sql-tool result will work with itself.

Thanks

when we upgrade schema, that needs to be done before the new version of temporal-server is deployed. we’d get into a chicken and egg problem with needing the new schema before the new server and not being able to run the new server without the new schema - so there’s a need for these to be decoupled.

additionally, it would be ideal if temporal server containers didn’t have any extra tools / shells / whatever for security purposes.

EDIT: also the answer to your question is - yes, use the proper version of admin tools for your schema needs.

Thank you Derek for the reply but I disagree with some of the points.
I have Temporal running in K8s cluster as deployment and I have init containers for the pod which I use the same temporalio/server docker image, but run temporal-sql-tool. So it is after the schemas got updated then the new server is deployed. I think there is a little down time right after I upgraded the schema and it’s still the old service running.
I will switch to use temporalio/admin-tools but feels a little strange, and it’s one extra image to download for that pod on that node.

Totally understand the convenience of that - you could build your own images that are a combination to maintain that workflow.

We do want to favor a zero downtime upgrade approach though - so doing an online schema upgrade is an important factor.

Developer and operator experience is also very important for sure - would having multiple options of images be interesting or confusing?

Could you explain how to exactly can we achieve zero downtime when upgrade?

Also just realized that due to some strange setup for temporalio/admin-tools , it is not possible to ask K8s to run that docker image but run the temporal-sql-tool command. We might need to overwrite the entry point or something, still looking for solutions

can you be a little more specific about the behavior you are seeing?

and what you would expect instead?

I expected the tool to be in temporalio/server but now it is only in temporalio/admin-tools
I have successfully switched to use temporalio/admin-tools. I can explain the problem I encountered when do the switch. For temporalio/server I can pass in the following args like this in my CDK code

args: [
  'sh',
  '-c',
  'temporal-sql-tool --ca tx_isolation=READ-COMMITTED -ep $MYSQL_HOST -u $MYSQL_USER -pw $MYSQL_PASSWORD create --db temporal ; true'
]

But for temporalio/admin-tools it has to be like this:

command: [
  'sh'
],
args: [
  '-c',
  'temporal-sql-tool --ca tx_isolation=READ-COMMITTED -ep $MYSQL_HOST -u $MYSQL_USER -pw $MYSQL_PASSWORD create --db temporal ; true'
]

I do not know why

The difference is because the admin tools container is currently set up to run and do nothing until something is exec’d on it…

The old docker file had an entrypoint that literally executed whatever you ran as it’s own command after doing some setup:

This is changed in the recent version of temporal server and only allows certain things to be passed as args and executed rather than arbitrary commands.

The admin tools container entrypoint runs tail -f /dev/null - the short story is that the behavior you are observing is expected.

Thank you @derek. I’m more interested in the ‘zero downtime upgrade’ you mentioned above. How exactly should we do it?

The normal process would be, while workloads are running:

  • run schema upgrade tool to upgrade to next schema version
    • running temporal services are guaranteed to be forward compatible
  • once schema upgrade is complete, rollout out new versions of services
    • you mentioned k8s - this can be done easily if you use deployments: simply bump the version of the image and apply.

Depending on the number of pods you have running for each service, you may want to tweak initialDelaySeconds for liveness check to put some delay between restarting pods. Can also set max surge and max unavailabe to tune how many replicas are rolled out at once - rollout will take longer but go smoother if a new version is added, takes a minute or so to stabilize, and then an old pod is killed (rinse,repeat).

following this process should give you a smooth upgrade without downtime.

one more note - upgrades must be to monotonically increasing minor versions. do not skip minor versions even if there isn’t a schema upgrade.

@derek , we jumped from 1.8.1 directly to 1.10.5 . Everything seems working fine. Are you saying this is wrong, that we should deploy 1.8.2, then 1.8.3, then … and go through all the versions until we reach 1.10.5 ?

It is recommended to upgrade through minor versions. so if you were on 1.8.x you would upgrade to the latest 1.9.x and then to the latest 1.10.x

My understanding is that we just don’t guarantee that it will work to skip minor versions - I’m not sure if it will always break something.

I want to say that if things are working for you all is well but @Wenquan_Xing or @samar might be better equipped to answer that question.

Got it. Thank you very much @derek. We will try to upgrade more frequently