Hi all,
First of all, thank you for providing such a great tool and awesome community.
I am running Temporal on AWS EKS with the following configuration:
3-node Cassandra cluster
1-node Elasticsearch for logging
2-node Temporal cluster
I am seeing the error below in my Temporal pod logs, and it also causes the Java workers in the service to fail. I am attaching both logs for reference.
Temporal log
java service log
2023-02-25T11:13:48.824Z WARN 1 --- [ce="default": 1] io.temporal.internal.worker.Poller : Failure in poller thread Workflow Poller taskQueue="EventFlowQueue", namespace="default": 1
io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:271)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:252)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:165)
at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.pollWorkflowTaskQueue(WorkflowServiceGrpc.java:3750)
at io.temporal.internal.worker.WorkflowPollTask.doPoll(WorkflowPollTask.java:140)
at io.temporal.internal.worker.WorkflowPollTask.poll(WorkflowPollTask.java:122)
at io.temporal.internal.worker.WorkflowPollTask.poll(WorkflowPollTask.java:43)
at io.temporal.internal.worker.Poller$PollExecutionTask.run(Poller.java:274)
at io.temporal.internal.worker.Poller$PollLoopTask.run(Poller.java:237)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Did you get the chance to have a look at this FAQ?
The FAQ is about running multiple Temporal clusters in one Kubernetes cluster. That setup does not support sharing one DB, though.
Thanks @Bal_Chua for the quick response. I have gone through that wiki, but unfortunately I was not able to find what I was looking for.
What is the recommended way to run HA Temporal on an EKS cluster? Basically, I want to run 2 or more Temporal pods to avoid downtime during upgrades.
Here are my service and deployment configurations
apiVersion: v1
kind: Service
metadata:
  namespace: flash
  name: temporal
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  ports:
    - name: tp
      port: 7233
      targetPort: 7233
  selector:
    app: temporal
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: temporal
  namespace: flash
spec:
  replicas: 2
  selector:
    matchLabels:
      app: temporal
      version: "v1"
  template:
    metadata:
      labels:
        app: temporal
        version: "v1"
    spec:
      containers:
        - env:
            - name: ENABLE_ES
              value: "true"
            - name: ES_SEEDS
              value: elasticsearch
            - name: SKIP_SCHEMA_SETUP
              value: "true"
            - name: ES_USER
              value: kursaha
            - name: ES_PWD
              value: ***************
            - name: ES_VERSION
              value: v7
            - name: CASSANDRA_SEEDS
              value: cassandra
            - name: CASSANDRA_PORT
              value: "9042"
          image: temporalio/auto-setup:1.20.0
          name: temporal
          ports:
            - containerPort: 7233
      restartPolicy: Always
Oh, my bad, I misunderstood the question.
First, I think the auto-setup image is meant for getting started quickly.
I used the Helm chart instead. It has many more Temporal components, each running in its own pods.
Thank you, this helped. I have a few follow-up questions. I can see multiple pods running after the Helm install, as below:
NAME                                   READY   STATUS    RESTARTS        AGE
temporal-admintools-654775d8cb-2xlsm   1/1     Running   0               2m14s
temporal-frontend-65556d9ddb-b7dsj     1/1     Running   0               2m14s
temporal-frontend-65556d9ddb-w9h96     1/1     Running   0               2m14s
temporal-history-58d549d4b9-bgd6n      1/1     Running   0               2m14s
temporal-history-58d549d4b9-knhzm      1/1     Running   0               2m13s
temporal-matching-66795f8d4c-9dkkn     1/1     Running   0               2m14s
temporal-matching-66795f8d4c-ds75m     1/1     Running   0               2m14s
temporal-web-6b4cd5c9fd-2zh6g          1/1     Running   0               2m14s
temporal-worker-5fd56bbd5b-ffpr6       1/1     Running   1 (118s ago)    2m14s
temporal-worker-5fd56bbd5b-vn9xk       1/1     Running   0               2m14s
- What is the difference between temporal-web and temporal-frontend?
- How can we control the replication factor of temporal-frontend independently?
- In a minimal prod setup, do we need to run web, frontend and admin tools?
Also, there are multiple services, as shown below:
➜ engagedatadrive git:(prod) kubectl get svc -n temporal
NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
temporal-admintools          ClusterIP   10.100.104.198   <none>        22/TCP              6m49s
temporal-frontend            ClusterIP   10.100.123.3     <none>        7233/TCP            6m49s
temporal-frontend-headless   ClusterIP   None             <none>        7233/TCP,9090/TCP   6m49s
temporal-history-headless    ClusterIP   None             <none>        7234/TCP,9090/TCP   6m49s
temporal-matching-headless   ClusterIP   None             <none>        7235/TCP,9090/TCP   6m49s
temporal-web                 ClusterIP   10.100.19.226    <none>        8080/TCP            6m49s
temporal-worker-headless     ClusterIP   None             <none>        7239/TCP,9090/TCP   6m49s
- What is the difference between temporal-frontend and temporal-frontend-headless?
ICYMI, the Temporal documentation explains each component in depth: Clusters | Temporal Documentation
The Kubernetes Services with the headless suffix allow the pods of a StatefulSet to be reached by their pod names through DNS.
This article gives a good explanation of what it is for.
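To see the difference between the two frontend services in practice, a small lookup like the one below may help. It is only a sketch: a regular ClusterIP Service name resolves to one virtual IP, while a headless Service (CLUSTER-IP None) resolves to the IPs of the individual pods behind it. The DNS names are assumptions built from the service names and the temporal namespace shown above, plus the default cluster.local domain. The HeadlessLookup class is a hypothetical helper and has to run from a pod inside the cluster to resolve anything.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the addresses a DNS name resolves to.
final class HeadlessLookup {

    /** Returns every IP address the name resolves to, or an empty list if it does not resolve. */
    static List<String> resolveAll(String host) {
        List<String> addresses = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                addresses.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // The name is not resolvable from here, e.g. when run outside the cluster.
        }
        return addresses;
    }

    public static void main(String[] args) {
        // Assumed names: <service>.<namespace>.svc.cluster.local with the default cluster domain.
        String[] names = {
            "temporal-frontend.temporal.svc.cluster.local",
            "temporal-frontend-headless.temporal.svc.cluster.local"
        };
        for (String name : names) {
            System.out.println(name + " -> " + resolveAll(name));
        }
    }
}
```

With two frontend pods running, the headless name would be expected to return two addresses and the regular name a single one.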
@Nishant_Pathak - What is the fix for the below error?
io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
I have seen this happen with TLS. Did you set the SSL context? For example, in Java:
.setSslContext(
    SimpleSslContextBuilder.forPKCS8(CLIENT_CERT, CLIENT_KEY)
        .setUseInsecureTrustManager(true)
        .build())
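Before looking at TLS settings, it may be worth ruling out plain connectivity as the cause of the UNAVAILABLE error. The sketch below only checks whether a TCP connection to the frontend port can be opened; it says nothing about gRPC or TLS health. The FrontendProbe class is a hypothetical helper, and the default host name is an assumption based on the temporal-frontend service and temporal namespace shown earlier; port 7233 comes from the service listing.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP connection to host:port can be opened.
final class FrontendProbe {

    /** Returns true if a TCP connection succeeds within the timeout, false otherwise. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed default target; pass the host and port your workers actually use.
        String host = args.length > 0 ? args[0] : "temporal-frontend.temporal.svc.cluster.local";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7233;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 2000));
    }
}
```

If the probe fails from the worker's pod, the problem is more likely in the service, load balancer, or network path than in the SDK configuration.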