Temporal UI not displaying correct status for a workflow

sdonovan · July 22, 2020, 6:05am

I’ve been running some experimental workflows using 0.27. I saturated it with 5,000 executions, each signalling to 1 of 100 queues, with the queues processing incoming requests, and completing the originating activity.

After processing, the UI shows an inconsistent result. For one workflow, sqlserver91, the list of workflows displays it as “Running”, but the workflow summary shows “Completed”. Refreshing a few times did not change anything.

Any ideas?

Regards,

Sean

samar · July 22, 2020, 5:12pm

Hey @sdonovan,
The list view in the UI is driven by the Temporal visibility API, which provides eventually consistent semantics, as opposed to the summary page, which uses strongly consistent semantics.
So they can go out of sync for a short period of time. But being out of sync for an extended period is definitely not expected. This usually means something on the server is backlogged, or there is a potential bug on the server. In general we do not recommend running any high-scale scenario using visibility backed by Cassandra or MySQL, as they have known scalability problems. For any high-scale production deployment we generally recommend the Elasticsearch backend.
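To make the "short period" versus "extended period" distinction concrete, here is a minimal polling sketch. It assumes two caller-supplied callables, describe_status and listed_as_open, which are hypothetical stand-ins for the strongly consistent describe call and the eventually consistent list call. They are not Temporal SDK functions, and the timeout values are arbitrary.

```python
import time
from typing import Callable


def wait_until_consistent(
    describe_status: Callable[[], str],
    listed_as_open: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the list view agrees with the describe view.

    describe_status and listed_as_open are hypothetical callables supplied
    by the caller; they are assumptions of this sketch, not Temporal APIs.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        is_running = describe_status() == "Running"
        if listed_as_open() == is_running:
            return True  # both views agree
        if time.monotonic() >= deadline:
            return False  # still out of sync: check for backlog or errors
        sleep(interval_s)
```

If this returns False after a generous timeout, the divergence is probably not ordinary eventual-consistency lag, and the server-side questions below become relevant.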

Can you provide some information about your setup:

Which visibility store are you using? Cassandra? MySQL? Elasticsearch?
Can you describe your cluster setup in general?
In the case of Elasticsearch, do you see any backlog on Kafka?
Are you seeing any errors on server logs during your run? Can you share those with us?
Can you describe your scale run?

sergle · December 2, 2020, 12:41pm

Hi @samar,

I observed the same case with Cassandra:

docker exec -ti temporal tctl --ad 192.168.5.7:7233 wf list --open
WORKFLOW TYPE | WORKFLOW ID | RUN ID | TASK QUEUE | START TIME | EXECUTION TIME
SingleSendEvent | event-sender-034416 | 6dcfe3ef-0e1d-49d1-be9e-104fcbf1be5a | EV_SENDER_TASK_QUEUE | 11:33:47 | 11:33:47
…

but:

docker exec -ti temporal tctl --ad 192.168.5.7:7233 wf desc -w event-sender-034416 -r 6dcfe3ef-0e1d-49d1-be9e-104fcbf1be5a
{
  "executionConfig": {
    "taskQueue": {
      "name": "EV_SENDER_TASK_QUEUE",
      "kind": "Normal"
    },
    "workflowExecutionTimeout": "315360000s",
    "workflowRunTimeout": "315360000s",
    "defaultWorkflowTaskTimeout": "60s"
  },
  "workflowExecutionInfo": {
    "execution": {
      "workflowId": "event-sender-034416",
      "runId": "6dcfe3ef-0e1d-49d1-be9e-104fcbf1be5a"
    },
    "type": {
      "name": "SingleSendEvent"
    },
    "startTime": "2020-12-02T11:33:47.696Z",
    "closeTime": "2020-12-02T11:33:47.826450212Z",
    "status": "Completed",
    "historyLength": "6",
    "memo": {},
    "autoResetPoints": {
      "points": [
        {
          "binaryChecksum": "2411b80d1acfa941f4014bed58321b73",
          "runId": "ff871023-8525-4d78-a13b-33866f898534",
          "firstWorkflowTaskCompletedId": "5",
          "createTime": "2020-12-02T11:33:47.693117959Z",
          "expireTime": "2020-12-03T11:33:47.693207392Z",
          "resettable": true
        }
      ]
    }
  }
}

In logs I found only one record:
{"level":"info","ts":"2020-12-02T11:33:47.834Z","msg":"Request for closed workflow is sampled","service":"history","wf-namespace-id":"adfcb219-7bb0-4d6c-8610-063755d5106a","wf-namespace":"default","wf-type":"SingleSendEvent","wf-id":"event-sender-034416","wf-run-id":"6dcfe3ef-0e1d-49d1-be9e-104fcbf1be5a","logging-call-at":"visibilitySamplingClient.go:133"}
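The manual comparison above can be automated roughly as follows. This sketch only parses text that has already been captured from the two tctl commands shown in this post. It assumes the six-column, pipe-separated layout of the list output and the JSON field names of the describe output exactly as they appear above; other tctl versions may print something different. The function names are illustrative.

```python
import json


def parse_open_list(list_output: str) -> set:
    """Extract (workflow_id, run_id) pairs from pipe-separated list output."""
    pairs = set()
    for line in list_output.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Skip the header row and anything that is not a full six-column row.
        if len(cols) != 6 or cols[1] == "WORKFLOW ID":
            continue
        pairs.add((cols[1], cols[2]))
    return pairs


def is_stale_open_entry(list_output: str, describe_json: str) -> bool:
    """True if describe reports a closed run that the open list still shows."""
    info = json.loads(describe_json)["workflowExecutionInfo"]
    key = (info["execution"]["workflowId"], info["execution"]["runId"])
    return info["status"] != "Running" and key in parse_open_list(list_output)
```

For the outputs in this post, is_stale_open_entry returns True: the run is "Completed" according to describe but still appears in the --open list.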

Ruslan · December 2, 2020, 8:04pm

@sergle tctl’s list and describe commands use the same APIs that Web uses for listing workflows and describing a single workflow.
With this in mind, Samar’s post applies to the tctl commands as well, i.e. visibility backed by Cassandra is not recommended at high scale, and it is better to use Elasticsearch in this case.

sergle · December 3, 2020, 10:59am

Hi, @Ruslan
Can Elasticsearch be used as the primary storage for visibility without Cassandra?
Right now, in config_template.yaml from the Docker image, I see that Elasticsearch is used as an additional store:

persistence:
    numHistoryShards: {{ default .Env.NUM_HISTORY_SHARDS "4" }}
    defaultStore: default
    visibilityStore: visibility
    {{- $es := default .Env.ENABLE_ES "false" | lower -}}
    {{- if eq $es "true" }}
    advancedVisibilityStore: es-visibility
    {{- end }}

maxim · December 3, 2020, 8:23pm

When ES is used, it takes over visibility queries, so it is the primary and only storage for them, and Cassandra is not involved. Cassandra (or MySQL or PostgreSQL) is still needed for all other persistence needs of the cluster.
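As a rough illustration of that rule, the sketch below mimics the persistence section of the template quoted earlier and then picks the store that answers visibility queries. It is a simplified model written for this thread, not the server's actual selection code; the function names are hypothetical, and only the keys and environment variables shown in the template are used.

```python
def render_persistence(env: dict) -> dict:
    """Mimic the quoted template: ENABLE_ES=true adds advancedVisibilityStore."""
    cfg = {
        # Fall back to "4" when the variable is missing or empty.
        "numHistoryShards": int(env.get("NUM_HISTORY_SHARDS") or "4"),
        "defaultStore": "default",
        "visibilityStore": "visibility",
    }
    if (env.get("ENABLE_ES") or "false").lower() == "true":
        cfg["advancedVisibilityStore"] = "es-visibility"
    return cfg


def visibility_query_store(persistence: dict) -> str:
    """Name of the store that serves visibility queries in this model."""
    # When an advanced (Elasticsearch) store is configured, it takes over
    # visibility queries; otherwise the standard visibility store serves them.
    return persistence.get("advancedVisibilityStore") or persistence["visibilityStore"]
```

With ENABLE_ES unset, visibility_query_store(render_persistence({})) gives "visibility"; with ENABLE_ES set to "true" it gives "es-visibility", while defaultStore stays in place for everything else.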

Ruslan · December 13, 2020, 2:00am

This topic was automatically closed after 3 days. New replies are no longer allowed.