My Temporal cluster has two server nodes. I guess this problem only appears with multiple server nodes.
I used samples-go/cron to test the two-node Temporal cluster and found that some workflow executions did not update their status in the executions_visibility table after they completed. In the web UI they were still displayed as Running, although they had actually completed.
In the MySQL database, the status column was 1, while completed workflow executions should be 2.
Is there any solution?
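For anyone who wants to reproduce the check against the visibility store, here is a minimal Go sketch (the DSN is a placeholder, and it assumes the standard MySQL visibility schema, where status 1 = Running and 2 = Completed):

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Placeholder DSN pointing at the visibility database.
	db, err := sql.Open("mysql", "user:password@tcp(mysqlAddr:3306)/temporal_visibility?parseTime=true")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// status 1 = Running, 2 = Completed in the standard visibility schema.
	// Rows still at 1 long after the workflow finished show the symptom.
	rows, err := db.Query(`
		SELECT workflow_id, run_id, start_time
		FROM executions_visibility
		WHERE status = 1
		  AND start_time < NOW() - INTERVAL 1 HOUR
		ORDER BY start_time`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var workflowID, runID string
		var startTime sql.NullTime
		if err := rows.Scan(&workflowID, &runID, &startTime); err != nil {
			log.Fatal(err)
		}
		fmt.Println(workflowID, runID, startTime.Time)
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}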
What's the Temporal server version you are using? Can you give more info on your multi-node setup?
I assume you have configured standard visibility only. If you have server metrics enabled and are scraping them, could you share your visibility latencies graph?
histogram_quantile(0.95, sum(rate(task_latency_bucket{operation=~"VisibilityTask.*", service_name="history"}[1m])) by (operation, le))
Also check the execution status for this workflow via tctl using:
tctl wf desc -w <wfid>
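If it is easier to check programmatically, the same lookup with the Go SDK would look roughly like this (host/port, namespace, and workflow ID are placeholders):

package main

import (
	"context"
	"fmt"
	"log"

	"go.temporal.io/sdk/client"
)

func main() {
	// Placeholder connection options; point HostPort at any frontend node.
	c, err := client.Dial(client.Options{
		HostPort:  "nodeIP:7233",
		Namespace: "default",
	})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// An empty run ID resolves to the latest run of the workflow ID.
	resp, err := c.DescribeWorkflowExecution(context.Background(), "<wfid>", "")
	if err != nil {
		log.Fatal(err)
	}

	// This status comes from the execution (history) store, so it can be
	// compared directly with what the visibility table reports.
	fmt.Println(resp.GetWorkflowExecutionInfo().GetStatus())
}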
The server version is 1.21.0.
Here is the setup. I use variables in place of the real addresses; only ${nodeIP} differs between the two cluster nodes.
log:
  stdout: false
  level: info
  outputFile: "/tmp/temporal-server.log"

persistence:
  defaultStore: mysql-default
  visibilityStore: mysql-visibility
  numHistoryShards: 2048
  datastores:
    mysql-default:
      sql:
        pluginName: "mysql"
        databaseName: "temporal"
        connectAddr: "${mysqlAddr}"
        connectProtocol: "tcp"
        connectAttributes:
          tx_isolation: 'READ-COMMITTED'
        user: "didi_2BYe"
        password: "yHz8aY6Hq"
        maxConns: 20
        maxIdleConns: 20
        maxConnLifetime: "1h"
    mysql-visibility:
      sql:
        pluginName: "mysql"
        databaseName: "temporal_visibility"
        connectAddr: "${mysqlAddr}"
        connectProtocol: "tcp"
        connectAttributes:
          tx_isolation: 'READ-COMMITTED'
        user: "didi_2BYe"
        password: "yHz8aY6Hq"
        maxConns: 2
        maxIdleConns: 2
        maxConnLifetime: "1h"

global:
  membership:
    maxJoinDuration: 30s
    broadcastAddress: "${nodeIP}"
  pprof:
    port: 7936
  metrics:
    prometheus:
      # # specify framework to use new approach for initializing metrics and/or use opentelemetry
      # framework: "opentelemetry"
      framework: "tally"
      timerType: "histogram"
      listenAddress: "127.0.0.1:8000"

services:
  frontend:
    rpc:
      grpcPort: 7233
      membershipPort: 6933
      #bindOnLocalHost: true
      bindOnIP: ${nodeIP}

  matching:
    rpc:
      grpcPort: 7235
      membershipPort: 6935
      #bindOnLocalHost: true
      bindOnIP: ${nodeIP}

  history:
    rpc:
      grpcPort: 7234
      membershipPort: 6934
      #bindOnLocalHost: true
      bindOnIP: ${nodeIP}

  worker:
    rpc:
      grpcPort: 7239
      membershipPort: 6939
      #bindOnLocalHost: true
      bindOnIP: ${nodeIP}

clusterMetadata:
  enableGlobalNamespace: false
  failoverVersionIncrement: 10
  masterClusterName: "active"
  currentClusterName: "active"
  clusterInformation:
    active:
      enabled: true
      initialFailoverVersion: 1
      rpcName: "frontend"
      rpcAddress: "${nodeIP}:7233"

publicClient:
  hostPort: "${nodeIP}:7233"

dcRedirectionPolicy:
  policy: "noop"
  toDC: ""

archival:
  history:
    state: "disabled"
    enableRead: false
  visibility:
    state: "disabled"
    enableRead: false

namespaceDefaults:
  archival:
    history:
      state: "disabled"
      #URI: "file:///tmp/temporal_archival/development"
    visibility:
      state: "disabled"
      #URI: "file:///tmp/temporal_vis_archival/development"

dynamicConfigClient:
  filepath: "config/dynamicconfig/development-sql.yaml"
  pollInterval: "10s"
I haven’t collected any metrics yet. I will do it later and show you.
tihomir:
histogram_quantile(0.95, sum(rate(task_latency_bucket{operation=~"VisibilityTask.*", service_name="history"}[1m])) by (operation, le))
Here are the server metrics you asked for.
The execution status of that run, as reported by tctl wf desc, is Completed.
Is there any progress on this topic? I have the same problem.
I have the same issue as well.
Temporal server version: 1.23.0
Visibility Storage DB: AWS Aurora Postgres