Getting issue while running multiple temporal instance on aws eks

Nishant_Pathak · February 25, 2023, 11:23am

Hi all,
First of all, thank you for providing such a great tool and awesome community.

I am running Temporal on AWS EKS with the following setup:

3-node Cassandra cluster
1-node Elasticsearch for logging
2-node Temporal cluster

I am seeing the error below in my Temporal pod logs, and it also causes the Java workers in my service to fail. Attaching both logs for reference.
Temporal log

Java service log

2023-02-25T11:13:48.824Z  WARN 1 --- [ce="default": 1] io.temporal.internal.worker.Poller       : Failure in poller thread Workflow Poller taskQueue="EventFlowQueue", namespace="default": 1 

io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:271)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:252)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:165)
	at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.pollWorkflowTaskQueue(WorkflowServiceGrpc.java:3750)
	at io.temporal.internal.worker.WorkflowPollTask.doPoll(WorkflowPollTask.java:140)
	at io.temporal.internal.worker.WorkflowPollTask.poll(WorkflowPollTask.java:122)
	at io.temporal.internal.worker.WorkflowPollTask.poll(WorkflowPollTask.java:43)
	at io.temporal.internal.worker.Poller$PollExecutionTask.run(Poller.java:274)
	at io.temporal.internal.worker.Poller$PollLoopTask.run(Poller.java:237)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
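When a worker reports "UNAVAILABLE: Network closed for unknown reason", a reasonable first check is whether the frontend port is reachable from the worker's pod at all. The class below is a minimal JDK-only sketch of that check, not part of the Temporal SDK; the class name and the default host "temporal" and port 7233 (taken from the Service manifest posted in this thread) are assumptions to adjust. A successful TCP connect only shows that something accepts connections on the port; it does not validate gRPC or TLS.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class FrontendReachability {

    /**
     * Returns true if a plain TCP connection to host:port opens within timeoutMs.
     * This only shows that something accepts connections on the port; it does not
     * check gRPC, TLS, or whether the listener is actually the Temporal frontend.
     */
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults: Service "temporal" on port 7233, as in the manifest in this thread.
        String host = args.length > 0 ? args[0] : "temporal";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7233;
        System.out.println(host + ":" + port + " reachable = " + isReachable(host, port, 2000));
    }
}
```

On Java 11+ it can be run from a pod in the workers' namespace with single-file launch, e.g. java FrontendReachability.java temporal 7233.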

Bal_Chua · February 25, 2023, 1:25pm

Did you get the chance to have a look at this FAQ?

The FAQ covers running multiple Temporal clusters in one Kubernetes cluster, although that setup does not support sharing a single database.

Nishant_Pathak · February 25, 2023, 1:52pm

Thanks @Bal_Chua for the quick response. I have gone through that wiki, but unfortunately I could not find what I was looking for.

What is the recommended way to run Temporal in HA mode on an EKS cluster? Basically, I want to run 2 or more Temporal pods to avoid downtime during upgrades.

Here are my Service and Deployment configurations:

apiVersion: v1
kind: Service
metadata:
  namespace: flash
  name: temporal
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  ports:
    - name: tp
      port: 7233
      targetPort: 7233
  selector:
    app: temporal

apiVersion: apps/v1
kind: Deployment
metadata:
  name: temporal
  namespace: flash
spec:
  replicas: 2
  selector:
    matchLabels:
      app: temporal
      version: "v1"
  template:
    metadata:
      labels:
        app: temporal
        version: "v1"
    spec:
      containers:
        - env:
            - name: ENABLE_ES
              value: "true"
            - name: ES_SEEDS
              value: elasticsearch
            - name: SKIP_SCHEMA_SETUP
              value: "true"
            - name: ES_USER
              value: kursaha
            - name: ES_PWD
              value: *************** 
            - name: ES_VERSION
              value: v7
            - name: CASSANDRA_SEEDS
              value: cassandra
            - name: CASSANDRA_PORT
              value: "9042"
          image: temporalio/auto-setup:1.20.0
          name: temporal
          ports:
            - containerPort: 7233
      restartPolicy: Always

Bal_Chua · February 26, 2023, 8:32am

Oh, my bad, I misunderstood the question.
First, I think the auto-setup image is meant for getting started quickly.
I used the Helm chart instead. It splits Temporal into separate components (frontend, history, matching, worker), each running in its own pods.

Nishant_Pathak · February 26, 2023, 11:07am

Thank you, this helped. I have a few follow-up questions. After the Helm install, I can see multiple pods running:

NAME                                   READY   STATUS    RESTARTS       AGE
temporal-admintools-654775d8cb-2xlsm   1/1     Running   0              2m14s
temporal-frontend-65556d9ddb-b7dsj     1/1     Running   0              2m14s
temporal-frontend-65556d9ddb-w9h96     1/1     Running   0              2m14s
temporal-history-58d549d4b9-bgd6n      1/1     Running   0              2m14s
temporal-history-58d549d4b9-knhzm      1/1     Running   0              2m13s
temporal-matching-66795f8d4c-9dkkn     1/1     Running   0              2m14s
temporal-matching-66795f8d4c-ds75m     1/1     Running   0              2m14s
temporal-web-6b4cd5c9fd-2zh6g          1/1     Running   0              2m14s
temporal-worker-5fd56bbd5b-ffpr6       1/1     Running   1 (118s ago)   2m14s
temporal-worker-5fd56bbd5b-vn9xk       1/1     Running   0              2m14s

What is the difference between temporal-web and temporal-frontend?
How can we control the replica count of temporal-frontend independently?
In a minimal production setup, do we need to run web, frontend and admintools?

Also, there are multiple services:

➜  engagedatadrive git:(prod) kubectl get svc -n temporal  
NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
temporal-admintools          ClusterIP   10.100.104.198   <none>        22/TCP              6m49s
temporal-frontend            ClusterIP   10.100.123.3     <none>        7233/TCP            6m49s
temporal-frontend-headless   ClusterIP   None             <none>        7233/TCP,9090/TCP   6m49s
temporal-history-headless    ClusterIP   None             <none>        7234/TCP,9090/TCP   6m49s
temporal-matching-headless   ClusterIP   None             <none>        7235/TCP,9090/TCP   6m49s
temporal-web                 ClusterIP   10.100.19.226    <none>        8080/TCP            6m49s
temporal-worker-headless     ClusterIP   None             <none>        7239/TCP,9090/TCP   6m49s

What is the difference between temporal-frontend and temporal-frontend-headless?

Bal_Chua · February 26, 2023, 10:10pm

ICYMI, the Temporal documentation explains each component in depth: Clusters | Temporal Documentation

The Kubernetes Services with the -headless suffix allow StatefulSet pods to be reached by their pod names through DNS.
This article gives a good explanation of what it is for.
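To see what the headless Service changes in practice, one can compare the DNS answers for the two frontend Services from inside the cluster. With standard Kubernetes DNS, the plain ClusterIP Service name resolves to its single virtual IP (10.100.123.3 in the listing above), while a headless Service (CLUSTER-IP None) resolves to one A record per ready pod behind it; this applies to Deployment pods such as the frontend pods listed earlier as well as to StatefulSet pods. The JDK-only sketch below assumes it runs inside a pod, the "temporal" namespace, and the default "cluster.local" cluster domain; the class name is hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class HeadlessLookup {

    /** Returns every IP address the resolver reports for the given name. */
    public static List<String> resolveAll(String name) throws UnknownHostException {
        List<String> ips = new ArrayList<>();
        for (InetAddress addr : InetAddress.getAllByName(name)) {
            ips.add(addr.getHostAddress());
        }
        return ips;
    }

    public static void main(String[] args) {
        // Assumed names: "temporal" namespace and default "cluster.local" domain.
        String[] names = args.length > 0 ? args : new String[] {
            "temporal-frontend.temporal.svc.cluster.local",          // expect one virtual IP
            "temporal-frontend-headless.temporal.svc.cluster.local"  // expect one IP per ready pod
        };
        for (String name : names) {
            try {
                System.out.println(name + " -> " + resolveAll(name));
            } catch (UnknownHostException e) {
                System.out.println(name + " -> not resolvable from here");
            }
        }
    }
}
```

With the two frontend replicas shown earlier, the headless name would be expected to return two pod IPs, while the ClusterIP name returns just the Service IP.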

Nishant_Pathak · February 27, 2023, 5:24am

This helps. Thank you!

jaffarsadik · March 21, 2024, 8:52am

@Nishant_Pathak, what was the fix for the error below?
io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason

tihomir · March 25, 2024, 7:58pm

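I've seen this happen when an SSL context is set on the client. Did you set one? For example, in Java:
WorkflowServiceStubsOptions options = WorkflowServiceStubsOptions.newBuilder()
    .setSslContext(
        SimpleSslContextBuilder.forPKCS8(CLIENT_CERT, CLIENT_KEY)
            .setUseInsecureTrustManager(true)
            .build())
    .build();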
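One way to check whether the frontend port expects TLS at all is to attempt a raw TLS handshake against it using only the JDK rather than the Temporal SDK. The sketch below is diagnostic only: it trusts any server certificate (the same shortcut setUseInsecureTrustManager(true) takes) and sends no client certificate, so a frontend that enforces mutual TLS may still reject it. The class name and the default target temporal-frontend.temporal:7233 (based on the Helm services listed earlier) are assumptions.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class TlsProbe {

    // Trust-all manager: diagnostics only, never for production traffic.
    private static final TrustManager[] TRUST_ALL = { new X509TrustManager() {
        public void checkClientTrusted(X509Certificate[] chain, String authType) {}
        public void checkServerTrusted(X509Certificate[] chain, String authType) {}
        public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
    } };

    /** Returns true if a TLS handshake with host:port completes within timeoutMs. */
    public static boolean speaksTls(String host, int port, int timeoutMs) {
        try {
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, TRUST_ALL, null);
            try (SSLSocket socket = (SSLSocket) ctx.getSocketFactory().createSocket()) {
                socket.connect(new InetSocketAddress(host, port), timeoutMs);
                socket.setSoTimeout(timeoutMs); // bound the handshake read as well
                socket.startHandshake();
                return true;
            }
        } catch (IOException | GeneralSecurityException e) {
            // Plaintext listeners, refused connections and timeouts all end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed default target: the frontend Service from the Helm install.
        String host = args.length > 0 ? args[0] : "temporal-frontend.temporal";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7233;
        System.out.println(host + ":" + port + " TLS handshake ok = " + speaksTls(host, port, 3000));
    }
}
```

If the port is reachable but this probe returns false, the frontend is probably serving plaintext, and a client configured with setSslContext(...) would be speaking TLS to a non-TLS endpoint.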
