Web client showing Failed workflow as Running

This is somewhat hard to reproduce. It appears to have happened after I ran a program that submits 1000 workflows as fast as it can. Each execution spends up to 20 seconds hitting a local server that intentionally returns an error ~50% of the time.
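
For anyone trying to reproduce this, a flaky local server like the one described can be sketched with Python's standard library. This is only an illustration, not the program used in the original report: the names `pick_status`, `FlakyHandler`, and `serve`, the port `8099`, and the choice of HTTP 500 as the error are all assumptions.

```python
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

FAILURE_RATE = 0.5  # assumption: "~50% of the time" from the report


def pick_status(rng: random.Random, failure_rate: float = FAILURE_RATE) -> int:
    """Return 500 with probability `failure_rate`, otherwise 200."""
    return 500 if rng.random() < failure_rate else 200


class FlakyHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that fails roughly half of all GET requests."""

    rng = random.Random()

    def do_GET(self):
        status = pick_status(self.rng)
        body = b"error\n" if status == 500 else b"ok\n"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet under heavy load


def serve(port: int = 8099) -> None:
    """Block forever, serving flaky responses on localhost (port is arbitrary)."""
    HTTPServer(("127.0.0.1", port), FlakyHandler).serve_forever()
```

Calling `serve()` starts the server. Each workflow's activity would then call `http://127.0.0.1:8099/` and treat a 500 response as a failure.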

The workflow executions themselves appear to have worked fine. However, there does appear to be an error in the UI: the listing page shows one workflow (starting with “253d…”) in the “Running” state.

However, the summary page for that specific run shows it as Failed.


Hope this is useful. If it makes sense to create a GitHub issue for it, let me know.

This is a known issue. Visibility records can be sampled when the service is overloaded, so an update such as the transition from Running to Failed may be dropped and the listing page can keep showing a stale status. In the latest releases we have disabled (or are planning to disable) sampling completely for non-Cassandra visibility stores. For Cassandra, we recommend keeping sampling enabled, as its secondary indexes don’t work for large data sets. We recommend using Elasticsearch for visibility when the expected load on the system is high.
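
To illustrate the effect, here is a toy model of sampled visibility writes. It is not Temporal's actual implementation: `SampledVisibilityStore`, `simulate`, and the per-tick write budget are hypothetical names and mechanics, chosen only to show how a dropped close event leaves the listing out of sync with the workflow's real state.

```python
from dataclasses import dataclass, field


@dataclass
class SampledVisibilityStore:
    """Toy visibility store that drops writes beyond a per-tick budget."""

    writes_per_tick: int
    records: dict = field(default_factory=dict)
    _used: int = 0

    def next_tick(self) -> None:
        """Start a new time window with a fresh write budget."""
        self._used = 0

    def record(self, workflow_id: str, status: str) -> bool:
        """Store the status, or drop it if the budget is exhausted."""
        if self._used >= self.writes_per_tick:
            return False  # sampled out: the listing keeps the old status
        self._used += 1
        self.records[workflow_id] = status
        return True


def simulate(num_workflows: int = 1000, close_budget: int = 999) -> list:
    """Return the workflow IDs whose listed status differs from the real one."""
    history = {}  # source of truth, like the workflow's own summary page
    visibility = SampledVisibilityStore(writes_per_tick=num_workflows)

    # All start events fit in the budget, so every workflow is listed as Running.
    for i in range(num_workflows):
        workflow_id = f"wf-{i}"
        history[workflow_id] = "Running"
        visibility.record(workflow_id, "Running")

    # Under overload, only `close_budget` close events are recorded.
    visibility.writes_per_tick = close_budget
    visibility.next_tick()
    for workflow_id in history:
        history[workflow_id] = "Failed"
        visibility.record(workflow_id, "Failed")

    return [wf for wf, status in visibility.records.items() if status != history[wf]]
```

In this model, `simulate(1000, 999)` leaves exactly one workflow listed as Running even though its history says Failed, which matches the symptom reported above.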
