Temporal UI not displaying correct status for a workflow

I’ve been running some experimental workflows on Temporal 0.27. I saturated it with 5,000 executions, each signalling 1 of 100 queues; the queues process the incoming requests and complete the originating activity.

After processing, the UI shows an inconsistent result. For one workflow, sqlserver91, the workflow list displays it as “Running”, but the workflow summary page shows “Completed”. Refreshing a few times did not change anything.

Any ideas?

Regards,

Sean


Hey @sdonovan,
The API that drives the list view in the UI is powered by the Temporal visibility API, which provides eventually consistent semantics, as opposed to the summary page, which uses strongly consistent semantics.
So the two can go out of sync for a short period of time. However, staying out of sync for an extended period is definitely not expected; this usually means that something on the server is backlogged, or that there is a bug on the server. In general, we do not recommend running any high-scale scenario with visibility backed by Cassandra or MySQL, as they have known scalability problems. For any high-scale production deployment we generally recommend the Elasticsearch backend.
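To make the difference concrete, here is a toy model in Go. It is not Temporal's implementation; it only assumes that the describe path reads a primary record that is updated synchronously, while the list path reads a separate visibility record that is updated later from a queue of pending tasks. Until that queue drains, list and describe disagree, and if the queue backs up they disagree for longer. All type and function names are hypothetical.

```go
package main

import "fmt"

// Status of a workflow execution as seen by a reader.
type Status string

const (
	Running   Status = "Running"
	Completed Status = "Completed"
)

// visibilityTask is a pending, asynchronous update to the visibility store.
type visibilityTask struct {
	workflowID string
	status     Status
}

// toyCluster is a hypothetical model: primary is written synchronously,
// visibility is written only when queued tasks are drained.
type toyCluster struct {
	primary    map[string]Status
	visibility map[string]Status
	queue      []visibilityTask
}

func newToyCluster() *toyCluster {
	return &toyCluster{primary: map[string]Status{}, visibility: map[string]Status{}}
}

// Start records a new execution; in this toy the open record is visible immediately.
func (c *toyCluster) Start(id string) {
	c.primary[id] = Running
	c.visibility[id] = Running
}

// Complete updates the primary record now and enqueues the visibility update for later.
func (c *toyCluster) Complete(id string) {
	c.primary[id] = Completed
	c.queue = append(c.queue, visibilityTask{workflowID: id, status: Completed})
}

// Describe reads the primary record (the summary page path in this model).
func (c *toyCluster) Describe(id string) Status { return c.primary[id] }

// ListOpen reads the visibility records (the list page path in this model).
func (c *toyCluster) ListOpen() []string {
	var open []string
	for id, s := range c.visibility {
		if s == Running {
			open = append(open, id)
		}
	}
	return open
}

// Drain applies up to n queued visibility updates, oldest first.
func (c *toyCluster) Drain(n int) {
	for ; n > 0 && len(c.queue) > 0; n-- {
		t := c.queue[0]
		c.queue = c.queue[1:]
		c.visibility[t.workflowID] = t.status
	}
}

func main() {
	c := newToyCluster()
	c.Start("sqlserver91")
	c.Complete("sqlserver91")
	fmt.Println(c.Describe("sqlserver91"), c.ListOpen()) // Completed [sqlserver91]
	c.Drain(1)
	fmt.Println(c.Describe("sqlserver91"), c.ListOpen()) // Completed []
}
```

In this model a short disagreement is normal; a disagreement that never resolves means the visibility update was delayed indefinitely or never applied.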

Can you provide some information about your setup:

  • Which visibility store are you using? Cassandra? MySQL? Elasticsearch?
  • Can you describe your cluster setup in general?
  • If you are using Elasticsearch, do you see any backlog on Kafka?
  • Are you seeing any errors on server logs during your run? Can you share those with us?
  • Can you describe your scale run?

Hi @samar,

I observed the same behavior with Cassandra:

docker exec -ti temporal tctl --ad 192.168.5.7:7233 wf list --open
WORKFLOW TYPE | WORKFLOW ID | RUN ID | TASK QUEUE | START TIME | EXECUTION TIME
SingleSendEvent | event-sender-034416 | 6dcfe3ef-0e1d-49d1-be9e-104fcbf1be5a | EV_SENDER_TASK_QUEUE | 11:33:47 | 11:33:47

but:

docker exec -ti temporal tctl --ad 192.168.5.7:7233 wf desc -w event-sender-034416 -r 6dcfe3ef-0e1d-49d1-be9e-104fcbf1be5a
{
  "executionConfig": {
    "taskQueue": {
      "name": "EV_SENDER_TASK_QUEUE",
      "kind": "Normal"
    },
    "workflowExecutionTimeout": "315360000s",
    "workflowRunTimeout": "315360000s",
    "defaultWorkflowTaskTimeout": "60s"
  },
  "workflowExecutionInfo": {
    "execution": {
      "workflowId": "event-sender-034416",
      "runId": "6dcfe3ef-0e1d-49d1-be9e-104fcbf1be5a"
    },
    "type": {
      "name": "SingleSendEvent"
    },
    "startTime": "2020-12-02T11:33:47.696Z",
    "closeTime": "2020-12-02T11:33:47.826450212Z",
    "status": "Completed",
    "historyLength": "6",
    "memo": {},
    "autoResetPoints": {
      "points": [
        {
          "binaryChecksum": "2411b80d1acfa941f4014bed58321b73",
          "runId": "ff871023-8525-4d78-a13b-33866f898534",
          "firstWorkflowTaskCompletedId": "5",
          "createTime": "2020-12-02T11:33:47.693117959Z",
          "expireTime": "2020-12-03T11:33:47.693207392Z",
          "resettable": true
        }
      ]
    }
  }
}
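The manual cross-check above (a run listed by `wf list --open` whose `wf desc` reports it as closed) can be scripted when many workflows are involved. Below is a minimal sketch in Go, assuming the describe output has been captured as JSON with the field names shown above and that an open run reports `"status": "Running"`. `staleOpenEntry` is a hypothetical helper, not part of tctl.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// describeOutput mirrors only the fields of the describe JSON used here.
type describeOutput struct {
	WorkflowExecutionInfo struct {
		Execution struct {
			WorkflowID string `json:"workflowId"`
			RunID      string `json:"runId"`
		} `json:"execution"`
		Status string `json:"status"`
	} `json:"workflowExecutionInfo"`
}

// staleOpenEntry parses describe output for a run that the list view shows as
// open and reports whether describe says it is already closed (stale list entry).
func staleOpenEntry(describeJSON []byte) (workflowID, status string, stale bool, err error) {
	var d describeOutput
	if err = json.Unmarshal(describeJSON, &d); err != nil {
		return "", "", false, err
	}
	info := d.WorkflowExecutionInfo
	// A missing status is not treated as closed; only explicit non-Running values are.
	stale = info.Status != "" && info.Status != "Running"
	return info.Execution.WorkflowID, info.Status, stale, nil
}

func main() {
	// Trimmed copy of the describe output above; unknown fields are ignored anyway.
	sample := []byte(`{
  "workflowExecutionInfo": {
    "execution": {
      "workflowId": "event-sender-034416",
      "runId": "6dcfe3ef-0e1d-49d1-be9e-104fcbf1be5a"
    },
    "status": "Completed"
  }
}`)
	id, status, stale, err := staleOpenEntry(sample)
	if err != nil {
		panic(err)
	}
	// Prints: event-sender-034416 listed as open, describe says Completed, stale=true
	fmt.Printf("%s listed as open, describe says %s, stale=%v\n", id, status, stale)
}
```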

In the logs I found only one related record:
{"level":"info","ts":"2020-12-02T11:33:47.834Z","msg":"Request for closed workflow is sampled","service":"history","wf-namespace-id":"adfcb219-7bb0-4d6c-8610-063755d5106a","wf-namespace":"default","wf-type":"SingleSendEvent","wf-id":"event-sender-034416","wf-run-id":"6dcfe3ef-0e1d-49d1-be9e-104fcbf1be5a","logging-call-at":"visibilitySamplingClient.go:133"}

@sergle tctl’s list and describe commands use the same APIs that the Web UI uses for listing workflows and describing a single workflow.
With this in mind, Samar’s post applies to the tctl commands as well, i.e. visibility backed by Cassandra is not recommended at high scale, and it is better to use Elasticsearch in that case.

Hi, @Ruslan
Can Elasticsearch be used as the primary visibility store without Cassandra?
Right now, in config_template.yaml from the Docker image, I see that Elasticsearch is used as an additional store:

persistence:
    numHistoryShards: {{ default .Env.NUM_HISTORY_SHARDS "4" }}
    defaultStore: default
    visibilityStore: visibility
    {{- $es := default .Env.ENABLE_ES "false" | lower -}}
    {{- if eq $es "true" }}
    advancedVisibilityStore: es-visibility
    {{- end }}

When ES is used, it takes over visibility queries, so it is the primary and only store for visibility queries and Cassandra is not involved in them. Cassandra (or MySQL, or PostgreSQL) is still needed for all other persistence needs of the cluster.
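The routing described above can be summarized as a simple rule. This is a toy model of that rule over the config fields quoted earlier, not Temporal's actual code; the type and function names are hypothetical, and it covers only which store answers visibility queries, not how visibility records are written.

```go
package main

import "fmt"

// persistenceConfig holds the store names from the persistence section quoted above.
type persistenceConfig struct {
	DefaultStore            string
	VisibilityStore         string
	AdvancedVisibilityStore string // empty when ENABLE_ES is not "true"
}

// storeForVisibilityQueries returns the store that answers list queries:
// the advanced (Elasticsearch) store when configured, otherwise the standard one.
func storeForVisibilityQueries(c persistenceConfig) string {
	if c.AdvancedVisibilityStore != "" {
		return c.AdvancedVisibilityStore
	}
	return c.VisibilityStore
}

// storeForExecutions returns the store for all other persistence needs, which
// stays on the default Cassandra/MySQL/PostgreSQL store either way.
func storeForExecutions(c persistenceConfig) string { return c.DefaultStore }

func main() {
	withES := persistenceConfig{DefaultStore: "default", VisibilityStore: "visibility", AdvancedVisibilityStore: "es-visibility"}
	withoutES := persistenceConfig{DefaultStore: "default", VisibilityStore: "visibility"}
	fmt.Println(storeForVisibilityQueries(withES), storeForExecutions(withES))       // es-visibility default
	fmt.Println(storeForVisibilityQueries(withoutES), storeForExecutions(withoutES)) // visibility default
}
```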