Issue while running multiple Temporal instances on AWS EKS

Hi all,
First of all, thank you for providing such a great tool and awesome community.

I am using Temporal on AWS EKS with the following configuration:

3-node Cassandra cluster
1-node Elasticsearch for logging
2-node Temporal cluster

I am seeing the error below in my Temporal pod logs, and it also causes the Java workers in the service to fail. I am attaching both logs for reference.
Temporal log

Java service log

2023-02-25T11:13:48.824Z  WARN 1 --- [ce="default": 1] io.temporal.internal.worker.Poller       : Failure in poller thread Workflow Poller taskQueue="EventFlowQueue", namespace="default": 1 

io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:271)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:252)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:165)
	at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.pollWorkflowTaskQueue(WorkflowServiceGrpc.java:3750)
	at io.temporal.internal.worker.WorkflowPollTask.doPoll(WorkflowPollTask.java:140)
	at io.temporal.internal.worker.WorkflowPollTask.poll(WorkflowPollTask.java:122)
	at io.temporal.internal.worker.WorkflowPollTask.poll(WorkflowPollTask.java:43)
	at io.temporal.internal.worker.Poller$PollExecutionTask.run(Poller.java:274)
	at io.temporal.internal.worker.Poller$PollLoopTask.run(Poller.java:237)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

Did you get the chance to have a look at this FAQ?

The FAQ is about running multiple Temporal clusters in one Kubernetes cluster. It does not support sharing a single DB between them, though.

Thanks @Bal_Chua for the quick response. I have gone through that wiki, but unfortunately I was not able to find what I was looking for.

What is the recommended way to run Temporal in HA mode on an EKS cluster? Basically, I want to run two or more Temporal pods to avoid downtime during upgrades.

Here are my Service and Deployment configurations:

apiVersion: v1
kind: Service
metadata:
  namespace: flash
  name: temporal
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  ports:
    - name: tp
      port: 7233
      targetPort: 7233
  selector:
    app: temporal
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: temporal
  namespace: flash
spec:
  replicas: 2
  selector:
    matchLabels:
      app: temporal
      version: "v1"
  template:
    metadata:
      labels:
        app: temporal
        version: "v1"
    spec:
      containers:
        - env:
            - name: ENABLE_ES
              value: "true"
            - name: ES_SEEDS
              value: elasticsearch
            - name: SKIP_SCHEMA_SETUP
              value: "true"
            - name: ES_USER
              value: kursaha
            - name: ES_PWD
              value: *************** 
            - name: ES_VERSION
              value: v7
            - name: CASSANDRA_SEEDS
              value: cassandra
            - name: CASSANDRA_PORT
              value: "9042"
          image: temporalio/auto-setup:1.20.0
          name: temporal
          ports:
            - containerPort: 7233
      restartPolicy: Always

Oh, my bad, I misunderstood the question.
First, I think the auto-setup image is meant only for getting started quickly.
I used the Helm chart instead. It deploys the Temporal components separately, each running in its own pods.

Thank you, this helped. I have a few follow-up questions. After the Helm install I can see multiple pods running, as below:

NAME                                   READY   STATUS    RESTARTS       AGE
temporal-admintools-654775d8cb-2xlsm   1/1     Running   0              2m14s
temporal-frontend-65556d9ddb-b7dsj     1/1     Running   0              2m14s
temporal-frontend-65556d9ddb-w9h96     1/1     Running   0              2m14s
temporal-history-58d549d4b9-bgd6n      1/1     Running   0              2m14s
temporal-history-58d549d4b9-knhzm      1/1     Running   0              2m13s
temporal-matching-66795f8d4c-9dkkn     1/1     Running   0              2m14s
temporal-matching-66795f8d4c-ds75m     1/1     Running   0              2m14s
temporal-web-6b4cd5c9fd-2zh6g          1/1     Running   0              2m14s
temporal-worker-5fd56bbd5b-ffpr6       1/1     Running   1 (118s ago)   2m14s
temporal-worker-5fd56bbd5b-vn9xk       1/1     Running   0              2m14s
  1. What is the difference between temporal-web and temporal-frontend?
  2. How can we control the replica count of temporal-frontend independently?
  3. In a minimal prod setup, do we need to run web, frontend and admintools?

Also, there are multiple services, as below:

➜  engagedatadrive git:(prod) kubectl get svc -n temporal  
NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
temporal-admintools          ClusterIP   10.100.104.198   <none>        22/TCP              6m49s
temporal-frontend            ClusterIP   10.100.123.3     <none>        7233/TCP            6m49s
temporal-frontend-headless   ClusterIP   None             <none>        7233/TCP,9090/TCP   6m49s
temporal-history-headless    ClusterIP   None             <none>        7234/TCP,9090/TCP   6m49s
temporal-matching-headless   ClusterIP   None             <none>        7235/TCP,9090/TCP   6m49s
temporal-web                 ClusterIP   10.100.19.226    <none>        8080/TCP            6m49s
temporal-worker-headless     ClusterIP   None             <none>        7239/TCP,9090/TCP   6m49s
  1. What is the difference between temporal-frontend and temporal-frontend-headless?

ICYMI, the Temporal documentation explains each component in depth: Clusters | Temporal Documentation

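A Kubernetes Service with the -headless suffix has no cluster IP, so DNS resolves it directly to the individual pods instead of a single load-balanced address. With a StatefulSet, it also lets the pods be reached by their pod names through DNS.
This article gives a good explanation of what it is for.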
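
To see the difference from inside the cluster, a small lookup like the one below can help. It is only a sketch using the standard Java library: the class name and the default host name (service temporal-frontend-headless in namespace temporal, taken from the kubectl output above) are assumptions, so adjust them to your install. Run from a pod, a headless service would be expected to return one address per backing pod, while the regular temporal-frontend service would return its single cluster IP.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class HeadlessServiceLookup {

    // Returns every IP address that DNS reports for the given host name.
    static List<String> resolveAll(String host) throws UnknownHostException {
        List<String> addresses = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            addresses.add(address.getHostAddress());
        }
        return addresses;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Assumed default: headless frontend service in the "temporal" namespace.
        String host = args.length > 0
                ? args[0]
                : "temporal-frontend-headless.temporal.svc.cluster.local";
        System.out.println(host + " -> " + resolveAll(host));
    }
}
```

Passing temporal-frontend.temporal.svc.cluster.local as the argument instead lets you compare the two results side by side.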

This helps. Thank you!

@Nishant_Pathak - What is the fix for the below error?
io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason

I have seen this happen before. Did you set the SSL context? For example, in Java:
.setSslContext(
    SimpleSslContextBuilder.forPKCS8(CLIENT_CERT, CLIENT_KEY)
        .setUseInsecureTrustManager(true)
        .build())
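
Alongside the TLS question, it may be worth confirming that the worker pod can open a TCP connection to the frontend at all. The check below is a rough sketch using only the standard Java library; the class name, the default host (temporal-frontend in namespace temporal) and the 3 second timeout are assumptions, and port 7233 is the frontend port shown earlier in the thread. A successful connect only shows the port is reachable; it says nothing about whether gRPC or TLS is configured correctly.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class FrontendReachability {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    // Any failure (refused, timed out, unresolvable host) is reported as false.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed default: frontend service in the "temporal" namespace.
        String host = args.length > 0
                ? args[0]
                : "temporal-frontend.temporal.svc.cluster.local";
        int port = 7233;
        System.out.println(host + ":" + port + " reachable: "
                + canConnect(host, port, 3000));
    }
}
```

If this prints false from the worker pod, the problem is more likely in networking (Service, load balancer, security groups) than in the SDK configuration.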