[Advanced visibility upgrade] VisibilityUpsertExecution: Unable to upsert workflow execution: Error 3143: Invalid JSON path expression

Hi everyone,

I’m currently upgrading our self-hosted Temporal service from 1.23.1 to 1.24.2, and trying to enable advanced visibility.
I followed the 1.20 and 1.21 release notes on how to upgrade to advanced visibility.
But after the upgrade I get the following error from the history pods when closing the completed workflows:

{
  "level": "error",
  "ts": "2024-12-17T08:52:15.533Z",
  "msg": "Fail to process task",
  "shard-id": 357,
  "address": "10.0.44.17:7234",
  "component": "visibility-queue-processor",
  "wf-namespace-id": "f5d27cca-acb8-4307-8a7f-f02533bf6a2b",
  "wf-id": "temporal-sys-scheduler:9ec56f60-2695-4386-9897-ffa717674c70",
  "wf-run-id": "e08066f0-1780-432a-bdc7-71e8d2592578",
  "queue-task-key": {
    "FireTime": "1970-01-01T00:00:00Z",
    "TaskID": 73400404
  },
  "queue-task-type": "VisibilityUpsertExecution",
  "queue-task": {
    "NamespaceID": "f5d27cca-acb8-4307-8a7f-f02533bf6a2b",
    "WorkflowID": "temporal-sys-scheduler:9ec56f60-2695-4386-9897-ffa717674c70",
    "RunID": "e08066f0-1780-432a-bdc7-71e8d2592578",
    "VisibilityTimestamp": "2024-12-17T08:50:50.176744019Z",
    "TaskID": 73400404,
    "Version": 0
  },
  "wf-history-event-id": 0,
  "error": "unable to upsert workflow execution: Error 3143: Invalid JSON path expression. The error is around character position 1.",
  "error-type": "mysql.MySQLError",
  "unexpected-error-attempts": 26,
  "lifecycle": "ProcessingFailed",
  "logging-call-at": "lazy_logger.go:68",
  "stacktrace": "go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/log/zap_logger.go:156\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/home/runner/work/docker-builds/docker-builds/temporal/service/history/queues/executable.go:543\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/tasks/fifo_scheduler.go:228\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/backoff/retry.go:117\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/backoff/retry.go:143\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/backoff/retry.go:118\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/tasks/fifo_scheduler.go:237\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/tasks/fifo_scheduler.go:215"
}

We use MySQL 8.0.35 for persistence.

Any idea on what I could do to fix/debug that ?

After a closer investigation, we finally understood the issue.

When we first set up Temporal, we unfortunately started with version 1.21.1, which includes a bug in the v1.3 migration of the visibility database, that was later fixed in Temporal 1.21.2.

As a result the expression of the execution_visibility.BuildIds virtual generated column was incorrect, but the issue would only translate into VisibilityUpsertExecution errors once advanced visibility enabled. This explains why we didn’t catch it before our 1.24.2 upgrade.

To allow the history pods to correctly mark the workflows as completed you can manually modify the expression of the column:

ALTER TABLE temporal_visibility.executions_visibility MODIFY COLUMN BuildIds JSON GENERATED ALWAYS AS (search_attributes->"$.BuildIds")

Note that you’ll also have to deal manually with all the workflows that were not correctly marked as completed prior to the fix of the column expression.