Failed to update current execution. Error

Temporal version: 1.7.0
ES version: 7.10.1
logs:

I have two nodes running Temporal server as a cluster. The error below is only emitted on one node; the other one doesn't have this kind of error.

After changing the log level from "error" to "info" as below, there is still no valuable information in the logs. Any clue?

    es-visibility:
      elasticsearch:
        version: "v7"
        logLevel: "info"
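
Side note: `es-visibility.elasticsearch.logLevel` only controls the Elasticsearch client's logging. The history error below is written by the server's own logger, which is configured by the top-level `log` section of the static config; a minimal sketch, assuming the standard config layout:

    log:
      stdout: true
      level: "info"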

temporal-server[12542]: {"level":"error","ts":"2021-03-16T13:19:14.728+0800","msg":"Operation failed with internal error.","service":"history","err
or":"UpdateWorkflowExecution: failed to update current execution. Error: assertRunIDAndUpdateCurrentExecution failed. Current RunId was ea0c196d-64d3-4744-8acf-9d70631ba42c, expected 5c714c56-0ee7-4806-a5bd
-eb8d1a6059f6","metric-scope":5,"shard-id":264,"logging-call-at":"persistenceMetricClients.go:676","stacktrace":"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error\n\t/temporal/common/log/logge
rimpl/logger.go:138\ngo.temporal.io/server/common/persistence.(*workflowExecutionPersistenceClient).updateErrorMetric\n\t/temporal/common/persistence/persistenceMetricClients.go:676\ngo.temporal.io/server/c
ommon/persistence.(*workflowExecutionPersistenceClient).UpdateWorkflowExecution\n\t/temporal/common/persistence/persistenceMetricClients.go:279\ngo.temporal.io/server/service/history/shard.(*ContextImpl).Up
dateWorkflowExecution\n\t/temporal/service/history/shard/context_impl.go:532\ngo.temporal.io/server/service/history.(*workflowExecutionContextImpl).updateWorkflowExecutionWithRetry.func1\n\t/temporal/servic
e/history/workflowExecutionContext.go:1043\ngo.temporal.io/server/common/backoff.Retry\n\t/temporal/common/backoff/retry.go:103\ngo.temporal.io/server/service/history.(*workflowExecutionContextImpl).updateW
orkflowExecutionWithRetry\n\t/temporal/service/history/workflowExecutionContext.go:1047\ngo.temporal.io/server/service/history.(*workflowExecutionContextImpl).updateWorkflowExecutionWithNew\n\t/temporal/ser
vice/history/workflowExecutionContext.go:754\ngo.temporal.io/server/service/history.(*workflowExecutionContextImpl).updateWorkflowExecutionAsActive\n\t/temporal/service/history/workflowExecutionContext.go:6
05\ngo.temporal.io/server/service/history.(*timerQueueActiveTaskExecutor).updateWorkflowExecution\n\t/temporal/service/history/timerQueueActiveTaskExecutor.go:599\ngo.temporal.io/server/service/history.(*ti
merQueueActiveTaskExecutor).executeWorkflowBackoffTimerTask\n\t/temporal/service/history/timerQueueActiveTaskExecutor.go:370\ngo.temporal.io/server/service/history.(*timerQueueActiveTaskExecutor).execute\n\
t/temporal/service/history/timerQueueActiveTaskExecutor.go:104\ngo.temporal.io/server/service/history.(*timerQueueActiveProcessorImpl).process\n\t/temporal/service/history/timerQueueActiveProcessor.go:303\n
go.temporal.io/server/service/history.(*taskProcessor).processTaskOnce\n\t/temporal/service/history/taskProcessor.go:258\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck.func1\n\t/t
emporal/service/history/taskProcessor.go:211\ngo.temporal.io/server/common/backoff.Retry\n\t/temporal/common/backoff/retry.go:103\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck\n\
t/temporal/service/history/taskProcessor.go:238\ngo.temporal.io/server/service/history.(*taskProcessor).taskWorker\n\t/temporal/service/history/taskProcessor.go:161"}

temporal-server[12542]: {"level":"info","ts":"2021-03-16T13:19:14.731+0800","msg":"Range updated for shardID","service":"history","shard-id":264,"a
ddress":"192.168.44.10:7234","shard-item":"0xc001baef00","shard-range-id":26454,"previous-shard-range-id":26453,"number":27737980929,"next-number":27739029504,"logging-call-at":"context_impl.go:863"}


Which database are you using?
MySQL 5.7, PostgreSQL 9.6, or Cassandra 3.11?

What configuration? A single-cluster setup?

The error message means the contract that there can be at most one running workflow per namespace & workflow ID has been violated.
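
For a given namespace and workflow ID, you can check which run Temporal currently considers the running one with tctl; without `--run_id` it describes the latest run. A sketch (the namespace and workflow ID are placeholders):

    tctl --namespace default workflow describe --workflow_id "your-workflow-id"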

MySQL 5.7.
A single cluster with 2 nodes; the configuration is as below. How can I fix this?
The data can be deleted, as this is a POC cluster for Temporal.

How about I clean up some tables inside MySQL?

tctl admin cluster describe
{
  "supportedClients": {
    "temporal-cli": "\u003c2.0.0",
    "temporal-go": "\u003c2.0.0",
    "temporal-java": "\u003c2.0.0",
    "temporal-server": "\u003c2.0.0"
  },
  "serverVersion": "1.7.0",
  "membershipInfo": {
    "currentHost": {
      "identity": "192.168.44.10:7233"
    },
    "reachableMembers": [
      "192.168.44.10:6935",
      "192.168.44.10:6939",
      "192.168.127.18:6933",
      "192.168.44.10:6934",
      "192.168.127.18:6935",
      "192.168.127.18:6934",
      "192.168.127.18:6939",
      "192.168.44.10:6933"
    ],
    "rings": [
      {
        "role": "frontend",
        "memberCount": 2,
        "members": [
          {
            "identity": "192.168.127.18:7233"
          },
          {
            "identity": "192.168.44.10:7233"
          }
        ]
      },
      {
        "role": "history",
        "memberCount": 2,
        "members": [
          {
            "identity": "192.168.44.10:7234"
          },
          {
            "identity": "192.168.127.18:7234"
          }
        ]
      },
      {
        "role": "matching",
        "memberCount": 2,
        "members": [
          {
            "identity": "192.168.44.10:7235"
          },
          {
            "identity": "192.168.127.18:7235"
          }
        ]
      },
      {
        "role": "worker",
        "memberCount": 2,
        "members": [
          {
            "identity": "192.168.127.18:7239"
          },
          {
            "identity": "192.168.44.10:7239"
          }
        ]
      }
    ]
  }
}

About the recovery, try:

tctl admin workflow delete -h

About your cluster setup, try to run one service per pod / host (a start-command sketch follows the list), e.g.:
hosts 1 & 2 run temporal frontend
hosts 3 & 4 run temporal history
hosts 5 & 6 run temporal matching
hosts 7 & 8 run temporal worker

About the error: it is unexpected. Were there any manual operations done to the database? Is the database MySQL 5.7, or some database which is compatible with MySQL 5.7?

tctl admin workflow delete --db_engine mysql --db_address host \
  --db_port 3306 --username temporal --password pass \
  -r 1dfeae76-d679-4b03-bc18-bc06e1354a0c

Deleting by run ID didn't work, as it needs a workflow ID, which I can't find.
Is there any way to destroy everything inside the DB and then start from the beginning?

Error: Option workflow_id is required
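
To find the workflow ID that owns a run ID, one option is querying the base `executions` table directly. A hypothetical query, assuming the standard v1.7 MySQL schema where `run_id` is stored as BINARY(16) (so the UUID string has to be converted):

    # "temporal" after -p is the database name; the password is prompted for
    mysql -u temporal -p temporal -e "
      SELECT workflow_id
      FROM   executions
      WHERE  run_id = UNHEX(REPLACE('1dfeae76-d679-4b03-bc18-bc06e1354a0c', '-', ''));"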

I'm using systemd to start Temporal: extract the binaries from the Temporal Docker image, then set up systemd services on the nodes. As I don't have a k8s cluster at hand, and will not have one in the near future, this is the only way if we want to use Temporal in production: SaltStack + systemd running on multiple nodes.
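
For reference, a sketch of that extraction step; the image tag and the in-image path of the binary are assumptions, so verify them against your image:

    # create a stopped container from the image and copy the binary out
    id=$(docker create temporalio/server:1.7.0)
    docker cp "$id":/usr/local/bin/temporal-server /usr/local/bin/temporal-server
    docker rm "$id"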

Maybe this is what corrupted the DB data during cluster setup.


mysql> delete from task_queues;
Query OK, 223 rows affected (0.00 sec)

mysql> select * from queue;
Empty set (0.00 sec)

mysql> delete from executions;
Query OK, 0 rows affected (0.00 sec)

mysql> delete from current_executions;
Query OK, 0 rows affected (0.00 sec)

After deleting everything above, the world has turned quiet now.

./temporal-sql-tool -u temporal --pw temporal drop --db temporal -f
./temporal-sql-tool -u temporal --pw temporal drop --db temporal_visibility -f

ref: https://github.com/temporalio/temporal/blob/v1.7.0/Makefile#L324
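
After dropping, the databases have to be recreated and the schema reapplied before restarting the cluster. Roughly, following the install-schema-mysql target in the same Makefile:

    ./temporal-sql-tool -u temporal --pw temporal create --db temporal
    ./temporal-sql-tool -u temporal --pw temporal --db temporal setup-schema -v 0.0
    ./temporal-sql-tool -u temporal --pw temporal --db temporal update-schema -d ./schema/mysql/v57/temporal/versioned
    ./temporal-sql-tool -u temporal --pw temporal create --db temporal_visibility
    ./temporal-sql-tool -u temporal --pw temporal --db temporal_visibility setup-schema -v 0.0
    ./temporal-sql-tool -u temporal --pw temporal --db temporal_visibility update-schema -d ./schema/mysql/v57/visibility/versioned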


I got a similar issue, but for the temporal-sys-tq-scanner workflow. The error message is: assertRunIDAndUpdateCurrentExecution failed. current run ID: <run-id-a>, request run ID: <run-id-b>

I see both run IDs in the executions table in the temporal DB, and both have the same next_event_id and db_record_version.
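
One thing worth checking is which of the two runs the current-execution pointer actually holds. A sketch under the same schema assumption as above (MySQL persistence, `run_id` stored as BINARY(16)); `temporal-sys-tq-scanner` is the workflow ID as named in the error:

    # show the run that current_executions points at for the scanner workflow
    mysql -u temporal -p temporal -e "
      SELECT shard_id, LOWER(HEX(run_id)) AS current_run_id
      FROM   current_executions
      WHERE  workflow_id = 'temporal-sys-tq-scanner';"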

From your AI developer assistant, I got this response:

If you're seeing an error like "assertRunIDAndUpdateCurrentExecution failed. current run ID: <run-id>, request run ID: <different-run-id>", it could be due to a race condition or some inconsistencies in the Temporal system.

How do I fix this?

Btw, the AI Developer’s assistant is great!