ListWorkflowExecutions hangs/timeouts for closed or mixed workflow queries, while ExecutionStatus="Running" is fast and Postgres query executes in <1ms

TonyDev · May 20, 2026, 8:39am

Hi Temporal team,

We are debugging very slow ListWorkflowExecutions calls on a self-hosted Temporal cluster using Postgres advanced visibility.

At first this looked like a Postgres/index issue, but after isolating the behavior, it seems more specific: listing running workflows is fast, but listing closed/mixed workflows hangs until the client deadline is hit, even though the corresponding visibility SQL query executes in under 1ms in Postgres.

Environment

Temporal Server: 1.29.1
Docker image: temporalio/server:1.29.1
Temporal UI: 2.34.0
Visibility store: Postgres 16, advanced visibility
Persistence store: Postgres 16
History shards: now 128
Namespace: production
Retention: 365 days
Custom search attributes:
- WorkflowKind
- EntityType
- EntityId
Workload:
- Poller-style workloads
- workflows started every few seconds
- short-lived workflows, mostly completed quickly

Symptom

Unfiltered workflow list calls fail from the local Temporal host, even when bypassing UI/reverse proxy and using 127.0.0.1:7233.

temporal workflow list \
  --address 127.0.0.1:7233 \
  --namespace production \
  --limit 30

Result:

failed listing workflows: stream terminated by RST_STREAM with error code: CANCEL

Increasing CLI command timeout did not help:

temporal \
  --command-timeout 60s \
  workflow list \
  --address 127.0.0.1:7233 \
  --namespace production \
  --limit 30

Still fails with:

failed listing workflows: stream terminated by RST_STREAM with error code: CANCEL

Sometimes the error is:

context deadline exceeded

Server logs show slow visibility/list requests around the same time:

"List query exceeded threshold","duration":10.204882112,"visibility-query":"","namespace":"production"
"Slow gRPC call","duration":10.204937273,"method":"/temporal.api.workflowservice.v1.WorkflowService/ListWorkflowExecutions"

These slow list warnings repeat. Example log excerpt also shows matching/history noise around the same time, such as Workflow task already started, but those do not appear to directly explain the visibility list stall.

Important finding: running-only query is fast

This query returns immediately:

temporal workflow list \
  --address 127.0.0.1:7233 \
  --namespace production \
  --limit 30 \
  --query 'ExecutionStatus="Running"'

Example output:

Status   WorkflowId                                          Type                       StartTime
Running  inventory-7648                                      inventoryWorkflow          2 seconds ago
Running  broadcast-poller-stubhub-workflow-2026-05-20T...    broadcastPollerWorkflow    8 seconds ago
Running  order-poller-stubhub-workflow-2026-05-20T...        orderPollerWorkflow        1 minute ago

But these fail/hang:

temporal workflow list \
  --address 127.0.0.1:7233 \
  --namespace production \
  --limit 30 \
  --query 'ExecutionStatus!="Running"'

Result:

failed listing workflows: stream terminated by RST_STREAM with error code: CANCEL

And:

temporal workflow list \
  --address 127.0.0.1:7233 \
  --namespace production \
  --limit 30 \
  --query 'StartTime > "2026-05-20T00:00:00Z"'

Result:

failed listing workflows: context deadline exceeded

So the behavior seems to be:

ExecutionStatus="Running"        fast
ExecutionStatus!="Running"       hangs / cancels
StartTime > today                hangs / deadline exceeded
empty/default list               hangs / cancels

This suggests the issue is not generic visibility, not the CLI, and not the UI. It seems specific to closed/mixed workflow visibility listing.

Postgres observations during the hang

While the request was hanging, pg_stat_activity showed the visibility query had already reached Postgres and the backend was idle ClientRead, not actively executing.

Relevant query seen in pg_stat_activity:

SELECT namespace_id, run_id, workflow_type_name, workflow_id, start_time, execution_time,
       status, close_time, history_length, history_size_bytes, execution_duration,
       state_transition_count, memo, encoding, task_queue, search_attributes,
       parent_workflow_id, parent_run_id, root_workflow_id, root_run_id, _version
FROM executions_visibility
WHERE namespace_id = $1 AND TemporalNamespaceDivision is null
ORDER BY coalesce(close_time, '9999-12-31 23:59:59') DESC, start_time DESC, run_id
LIMIT $2

Also observed near the same time:

SELECT status, COUNT(*)
FROM executions_visibility
WHERE (namespace_id = $1) AND TemporalNamespaceDivision is null
GROUP BY status

The pg_stat_activity output showed these as idle/client-read rather than actively running.

Exact Postgres query is fast

We then ran the equivalent query manually with the real namespace id.

Namespace id in visibility table:

52d6e22a-fa3f-48ae-aeeb-85b2714664c4

The table uses character(64), so the actual stored value is padded.

Manual EXPLAIN ANALYZE:

EXPLAIN (ANALYZE, BUFFERS)
SELECT namespace_id, run_id, workflow_type_name, workflow_id, start_time, execution_time,
       status, close_time, history_length, memo, encoding, task_queue, search_attributes
FROM executions_visibility
WHERE namespace_id = '52d6e22a-fa3f-48ae-aeeb-85b2714664c4'
  AND temporalnamespacedivision IS NULL
ORDER BY coalesce(close_time, '9999-12-31 23:59:59'::timestamp) DESC,
         start_time DESC,
         run_id
LIMIT 31;

Result:

Limit  (cost=0.27..17.92 rows=31 width=532) (actual time=0.103..0.415 rows=31 loops=1)
  Buffers: shared hit=72 dirtied=1
  ->  Index Scan using default_idx on executions_visibility
        (cost=0.27..140.33 rows=246 width=532)
        (actual time=0.102..0.402 rows=31 loops=1)
        Index Cond: (namespace_id = '52d6e22a-fa3f-48ae-aeeb-85b2714664c4'::bpchar)
        Filter: (temporalnamespacedivision IS NULL)
        Rows Removed by Filter: 3
        Buffers: shared hit=72 dirtied=1
Planning Time: 1.400 ms
Execution Time: 0.558 ms

So the exact SQL query appears to execute in ~0.5ms.

We also tried adding an expression index matching the default ordering:

CREATE INDEX CONCURRENTLY IF NOT EXISTS executions_visibility_default_order_idx
ON executions_visibility (
  namespace_id,
  temporalnamespacedivision,
  (coalesce(close_time, '9999-12-31 23:59:59'::timestamp)) DESC,
  start_time DESC,
  run_id
);

ANALYZE executions_visibility;

But the planner still uses default_idx, which is already very fast for this dataset.

Current visibility table size is tiny during this test:

production visibility rows: ~233
total executions_visibility rows: ~240

So this does not appear to be caused by a large visibility table.

What we ruled out

Suspect	Evidence	Current verdict
Temporal UI	Local CLI against `127.0.0.1:7233` also fails	Not UI
Reverse proxy / public network	Same failure locally	Not proxy
CLI command timeout	`--command-timeout 60s` still fails	Not enough / wrong timeout
Postgres query execution	Exact SQL is ~0.5ms	Not raw SQL execution
Missing default visibility index	`default_idx` is used and fast	Unlikely
Unfiltered query only	`ExecutionStatus!="Running"` and `StartTime > today` also fail	Not only empty query
General visibility failure	`ExecutionStatus="Running"` returns immediately	Not general visibility

Current hypothesis

The failure seems specific to closed or mixed workflow visibility listing.

Possibilities we are considering:

Temporal server is blocked after SQL execution while decoding/scanning closed workflow visibility rows.
Some closed workflow row has problematic memo or search_attributes.
The SQL visibility path for closed/mixed queries has a bug/regression in Temporal Server 1.29.1.
The server-side visibility manager timing includes work beyond SQL execution, and that part is hanging.
A rate limiter, semaphore, or persistence wrapper is blocking only on closed/mixed query paths.
The frontend gRPC deadline/internal deadline remains around 10s even when CLI command timeout is increased.

Topic		Replies	Views
Sometime worker is not able to find the running workflow by workflowId Community Support general-impl	9	686	May 18, 2022
Cannot get all workflow executions Community Support python-sdk , web-ui	5	344	July 9, 2024
Temporal UI not displaying correct status for a workflow Community Support	5	2142	December 3, 2020
Executions_visibility does not exist Community Support	2	1517	October 10, 2022
What's the performance impact of using ListWorkflows API? Community Support go-sdk	1	67	February 19, 2025

ListWorkflowExecutions hangs/timeouts for closed or mixed workflow queries, while ExecutionStatus="Running" is fast and Postgres query executes in <1ms

Environment

Symptom

Important finding: running-only query is fast

Postgres observations during the hang

Exact Postgres query is fast

What we ruled out

Current hypothesis

Related topics