Hi everyone,
I’m running a Temporal cluster locally on my PC using K3s, and I’m seeing very high latency (~8 minutes) on `Matching ClientGetTaskQueueUserData`, as shown in my Grafana dashboard (screenshot attached).
I’m using the Helm chart version 1.0.0-rc.
I would really appreciate any guidance on what I might be misconfiguring.
Environment
- Local machine
- K3s, single node
- 16 CPUs
- 15 GiB RAM available
- PostgreSQL 12
- Helm-based deployment (chart 1.0.0-rc)
Temporal Configuration
Persistence
```yaml
config:
  persistence:
    numHistoryShards: 1024
    datastores:
      default:
        sql:
          pluginName: postgres12
          driverName: postgres12
          connectProtocol: tcp
          databaseName: temporal
      visibility:
        sql:
          pluginName: postgres12
          driverName: postgres12
          connectProtocol: tcp
          databaseName: temporal_visibility
  namespaces:
    create: true
    namespace:
      - name: default
        retention: 1d
```
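As a sanity check on the persistence setup, both databases can be queried for their schema version (this assumes the stock Temporal SQL schema, which records applied migrations in a `schema_version` table):

```sql
-- Run against each database (temporal and temporal_visibility).
-- Assumes the standard Temporal SQL schema, which tracks
-- migrations in a schema_version table.
SELECT db_name, curr_version
FROM schema_version;
```

Both databases report the versions expected by the chart, so the schema itself looks fine.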
Dynamic Config
```yaml
dynamicConfig:
  frontend.WorkerHeartbeatsEnabled:
    - value: true
      constraints: {}
```
Services
```yaml
frontend:
  replicaCount: 3
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 512Mi
history:
  replicaCount: 8
  resources:
    requests:
      cpu: 800m
      memory: 512Mi
    limits:
      cpu: 2000m
      memory: 1Gi
matching:
  replicaCount: 4
  resources:
    requests:
      cpu: 500m
      memory: 600Mi
    limits:
      cpu: 1000m
      memory: 1Gi
worker:
  replicaCount: 1
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 512Mi
```
PostgreSQL Configuration
Postgres runs in the same cluster with:
```yaml
resources:
  limits:
    cpu: 4000m
    memory: 4Gi
  requests:
    cpu: 1000m
    memory: 2Gi
```
Extended configuration:
```ini
max_connections = 500
max_prepared_transactions = 100
shared_buffers = 1536MB
effective_cache_size = 3584MB
work_mem = 32MB
maintenance_work_mem = 768MB
max_locks_per_transaction = 512
max_pred_locks_per_transaction = 512
deadlock_timeout = 5s
wal_buffers = 32MB
min_wal_size = 2GB
max_wal_size = 8GB
wal_compression = on
wal_writer_delay = 200ms
checkpoint_completion_target = 0.9
checkpoint_timeout = 15min
default_statistics_target = 200
random_page_cost = 1.1
effective_io_concurrency = 200
autovacuum = on
autovacuum_max_workers = 4
autovacuum_naptime = 30s
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.05
```
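In case it helps narrow this down, a standard `pg_stat_activity` query (nothing Temporal-specific) can show whether Postgres has long-running or lock-waiting queries while the latency spike is happening:

```sql
-- Show active queries ordered by how long they have been running,
-- plus whether they are waiting on a lock or other event.
-- pg_stat_activity is a standard PostgreSQL catalog view.
SELECT pid,
       now() - query_start AS runtime,
       wait_event_type,
       state,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;
```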
Observed Issue
From Grafana:
- `Matching ClientGetTaskQueueUserData` latency ≈ 8 minutes
