High delay (8m) on Matching ClientGetTaskQueueUserData in local cluster

Hi everyone,

I’m running a Temporal cluster locally on my PC using K3s, and I’m experiencing a very high delay (~8 minutes) on:

Matching ClientGetTaskQueueUserData

as shown in my Grafana dashboard (screenshot attached).

I’m using the Helm chart version 1.0.0-rc.

I would really appreciate any guidance on what I might be misconfiguring.


Environment

  • Local machine

  • K3s single node

  • 16 CPUs

  • 15 GiB RAM available

  • PostgreSQL 12

  • Helm-based deployment (chart 1.0.0-rc)


Temporal Configuration

Persistence

config:
  persistence:
    numHistoryShards: 1024
    datastores:
      default:
        sql:
          pluginName: postgres12
          driverName: postgres12
          connectProtocol: tcp
          databaseName: temporal

      visibility:
        sql:
          pluginName: postgres12
          driverName: postgres12
          connectProtocol: tcp
          databaseName: temporal_visibility

  namespaces:
    create: true
    namespace:
      - name: default
        retention: 1d


Dynamic Config

dynamicConfig:
  frontend.WorkerHeartbeatsEnabled:
    - value: true
      constraints: {}


Services

frontend:
  replicaCount: 3
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 512Mi

history:
  replicaCount: 8
  resources:
    requests:
      cpu: 800m
      memory: 512Mi
    limits:
      cpu: 2000m
      memory: 1Gi

matching:
  replicaCount: 4
  resources:
    requests:
      cpu: 500m
      memory: 600Mi
    limits:
      cpu: 1000m
      memory: 1Gi

worker:
  replicaCount: 1
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 512Mi


PostgreSQL Configuration

Postgres runs in the same cluster with:

resources:
  limits:
    cpu: 4000m
    memory: 4Gi
  requests:
    cpu: 1000m
    memory: 2Gi

Extended configuration:

max_connections = 500
max_prepared_transactions = 100

shared_buffers = 1536MB
effective_cache_size = 3584MB
work_mem = 32MB
maintenance_work_mem = 768MB

max_locks_per_transaction = 512
max_pred_locks_per_transaction = 512
deadlock_timeout = 5s

wal_buffers = 32MB
min_wal_size = 2GB
max_wal_size = 8GB
wal_compression = on
wal_writer_delay = 200ms
checkpoint_completion_target = 0.9
checkpoint_timeout = 15min

default_statistics_target = 200
random_page_cost = 1.1
effective_io_concurrency = 200

autovacuum = on
autovacuum_max_workers = 4
autovacuum_naptime = 30s
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.05


Observed Issue

From Grafana: the p95 latency on this operation is consistently around 8 minutes (screenshot attached).

GetTaskQueueUserData is used for propagating task queue user data.
It is a long-poll operation, so it is expected to take a while. Typically, though, it sits around 5 minutes, so seeing it at 8 is somewhat elevated. Can you share the query you use in the graph?
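For reference, the duration of this long poll is governed by a dynamic config setting on the matching service. The key name below is my best recollection of the server-side knob and should be verified against your Temporal server version before use:

```yaml
dynamicConfig:
  # Assumed key name -- check it against your server version's dynamic
  # config reference. It bounds how long matching holds the
  # GetTaskQueueUserData long poll open (default is on the order of 5m).
  matching.getUserDataLongPollTimeout:
    - value: 2m
      constraints: {}
```

Shortening this would only reduce the reported latency of the long poll itself; it would not address whatever is pushing the p95 from ~5 to ~8 minutes.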

Thanks for the clarification.

I’m using the following Prometheus query in the graph:

histogram_quantile(
  0.95,
  sum(
    rate(client_latency_bucket{type="frontend", service_role="matching"}[5m])
  ) by (operation, le)
)

In particular, I’m filtering on the operation="GetTaskQueueUserData" label and looking at the p95.
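Since long-poll operations legitimately hold connections open for minutes, it can also help to exclude them when charting overall client latency, so they don't dominate the p95. A sketch of a variant of the query above (the label matcher is an assumption; adjust the operation names to what your metrics actually expose):

```promql
histogram_quantile(
  0.95,
  sum(
    rate(
      client_latency_bucket{
        type="frontend",
        service_role="matching",
        operation!="GetTaskQueueUserData"
      }[5m]
    )
  ) by (operation, le)
)
```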