Clarification on Upgrade Process with Schema Updates

I’m trying to upgrade from 1.7.x to the latest version, 1.15.x, and wanted to confirm a couple of points based on what I’ve read in the documentation and other posts.

  1. Is it necessary to go through each minor release (i.e. 1.7.x → 1.8.x → 1.9.x, etc.), or is it only necessary to go to the next release that contains a DB schema change (e.g. 1.7.x → 1.9.x → 1.11.x)?

  2. I’m not sure if the steps I’m following to upgrade the db schema are correct. Currently I’m:

What I have found when upgrading between versions with schema changes:

  1. Upgrading from 1.7.x → 1.9.x worked without any problems

  2. Upgrading from 1.9.x → 1.11.x, I was able to run make but was unable to upgrade the schema. It would just hang on the first line checking the connection.

  3. For 1.14.x, make failed with an error:

$ make
Install/update check tools…
Install/update mockgen tool…
Install/update proto plugins…
Delete old binaries…
Build temporal-server with OS: darwin, ARCH: amd64…
fatal: not a git repository (or any of the parent directories): .git
make: *** [temporal-server] Error 128

I get a similar error when running make for 1.15.2.

Can I confirm if this is the correct process for upgrading the schema?

Hi @cg1972, check out the forum posts here, here and here; I think they would be helpful.

Basically, Temporal guarantees forward and backward compatibility only between adjacent minor versions (1.n.x → 1.(n+1).x), so you should upgrade one minor version at a time. You can skip patch versions during this process.
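
Since compatibility is only guaranteed between adjacent minor versions, an upgrade from 1.7.x to 1.15.x would step through each minor release in turn. A minimal Python sketch of that sequence follows. The function name upgrade_path is hypothetical and not part of any Temporal tooling, and it assumes both versions share the same major version.

```python
from typing import List


def upgrade_path(current: str, target: str) -> List[str]:
    """List every minor version to step through, from current to target inclusive.

    Hypothetical helper for illustration only; patch versions are shown as ".x"
    because they can be skipped.
    """
    cur_major, cur_minor = (int(p) for p in current.split(".")[:2])
    tgt_major, tgt_minor = (int(p) for p in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("target must not be older than current")
    return [f"{cur_major}.{m}.x" for m in range(cur_minor, tgt_minor + 1)]


if __name__ == "__main__":
    print(" → ".join(upgrade_path("1.7", "1.15")))
```

Each hop in the list would be a separate server upgrade, with the schema update applied at any hop whose release ships a new schema version.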

Thanks @tihomir

I performed the updates via the admin-tool, which has worked well. When performing the schema upgrade from 1.13.4 → 1.14.6 (i.e. schema version 1.5 → 1.6), I received a timeout when running the script:

2022-03-22T00:42:16.975Z INFO Validating connection to cassandra cluster. {"logging-call-at": "cqlclient.go:112"}
2022-03-22T00:42:17.573Z INFO Connection validation succeeded. {"logging-call-at": "cqlclient.go:118"}
2022-03-22T00:42:17.573Z INFO UpdateSchemeTask started {"config": {"DBName":"","TargetVersion":"","SchemaDir":"./schema/cassandra/temporal/versioned","IsDryRun":false}, "logging-call-at": "updatetask.go:98"}
2022-03-22T00:42:17.590Z DEBUG ---- Executing updates for version 1.6 ---- {"logging-call-at": "updatetask.go:151"}
2022-03-22T00:42:17.590Z DEBUG ALTER TABLE executions ADD tiered_storage_task_data blob; {"logging-call-at": "updatetask.go:153"}
2022-03-22T00:42:18.580Z DEBUG ALTER TABLE executions ADD tiered_storage_task_encoding text; {"logging-call-at": "updatetask.go:153"}
2022-03-22T00:42:19.698Z DEBUG CREATE TABLE cluster_metadata_info (metadata_partition int,cluster_name text,data blob,data_encoding text,version bigint,PRIMARY KEY (metadata_partition, cluster_name)) WITH COMPACTION = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}; {"logging-call-at": "updatetask.go:153"}
2022-03-22T00:42:24.353Z ERROR Unable to update CQL schema. {"error": "error executing statement:java.lang.RuntimeException: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.", "logging-call-at": "handler.go:82"}

If I try and run the script again I get an error:

2022-03-22T00:45:58.965Z ERROR Unable to update CQL schema. {"error": "error executing statement:Invalid column name tiered_storage_task_data because it conflicts with an existing column", "logging-call-at": "handler.go:82"}

I checked the schema_version table and it is still on 1.5, so the script did not complete successfully. Is there a way to revert these changes so I can run the script again?

Note: I did check the schema changes in temporal/schema/cassandra/temporal/versioned/v1.6 on the release/v1.14.x branch of temporalio/temporal on GitHub, and confirmed that all of the changes have been applied. It looks like it was just the final step, updating the schema_version table, that timed out. I could manually update the schema_version table but would prefer to follow a proper process if there is one.

Hi @tihomir,

Could you confirm what the process should be if the schema upgrade only partially completes?

Looking at some of the CQL statements that are part of the upgrade, e.g. in executions.cql:

ALTER TABLE executions ADD tiered_storage_task_data blob;
ALTER TABLE executions ADD tiered_storage_task_encoding text;

there are no checks in place to see if the column already exists, so the CQL cannot be run multiple times.
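
To illustrate the kind of check that is missing, here is a minimal Python sketch that filters out ADD-column statements whose column already exists. It is not part of Temporal's schema tooling: the function name pending_statements and the existing set are hypothetical, and the regular expression only covers the simple single-column ADD form shown above. In practice the set of existing columns could be read from Cassandra's system_schema.columns table.

```python
import re

# Matches statements of the form: ALTER TABLE <table> ADD <column> <type>;
ADD_COLUMN = re.compile(
    r"^\s*ALTER\s+TABLE\s+(\w+)\s+ADD\s+(\w+)\s+\w+\s*;\s*$", re.IGNORECASE
)


def pending_statements(statements, existing_columns):
    """Return the statements that still need to run.

    existing_columns is a set of (table, column) pairs, lower-cased.
    Statements that are not simple ADD-column statements are always kept.
    """
    pending = []
    for stmt in statements:
        match = ADD_COLUMN.match(stmt)
        if match:
            key = (match.group(1).lower(), match.group(2).lower())
            if key in existing_columns:
                continue  # column already present; re-running would fail
        pending.append(stmt)
    return pending


if __name__ == "__main__":
    statements = [
        "ALTER TABLE executions ADD tiered_storage_task_data blob;",
        "ALTER TABLE executions ADD tiered_storage_task_encoding text;",
    ]
    # Hypothetical state after a partial run: only the first column was added.
    existing = {("executions", "tiered_storage_task_data")}
    for stmt in pending_statements(statements, existing):
        print(stmt)
```

With the hypothetical state above, only the second ALTER TABLE statement remains to be run.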

The schema upgrade failed in our production environment so I’d appreciate any advice you can provide on how to resolve this type of issue.

Hi all,

Just wondering if anybody could provide some feedback on what the suggested approach should be to allow me to continue with the upgrade in my production environment. To recap: when running the CQL for the upgrade, Cassandra timed out, so the script did not complete. From what I can see, all of the actual CQL updates have been applied but the schema version has not been updated. Is there a way to either revert the changes and re-run the updates, or, if not, is it OK to just manually update the schema_version table with the new version?