Has anybody tried Temporal multi-cluster replication across different regions?

I am presently running a single-cluster Temporal setup. For DR reasons, I am planning to upgrade it to a multi-cluster setup running in two different AWS regions (us-west-2 and us-east-2).

Has anybody tried this already? If so, please share your learnings.

What are the best practices for XDC replication on AWS?

Do I need to set up VPC peering between the regions?

Is a TLS setup mandatory for a multi-cluster setup (assuming I have done VPC peering)?

Is the replication lag of a VPC peering setup (without TLS) comparable to over-the-internet replication with TLS, or will the lag be many times higher in the latter case?

Hi @madhu

I can take a stab at answering some of your questions:

For DR reasons, I am planning to upgrade my setup to a multi-cluster one running in two different regions on AWS (us-west-2 and us-east-2)

Currently, a Temporal namespace is either local to a DC or global. Once that is set, it is immutable.
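
For reference, that means a namespace has to be created as global up front; it cannot be converted later. With tctl that looks roughly like the sketch below (namespace and cluster names are placeholders, and the flags are from memory, so please verify with tctl namespace register --help):

# Register a global namespace that can be replicated/failed over between
# the "primary" and "secondary" clusters (names must match clusterMetadata).
tctl --namespace my-global-namespace namespace register \
  --global_namespace true \
  --active_cluster primary \
  --clusters primary secondary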

Is a TLS setup mandatory for a multi-cluster setup (assuming I have done VPC peering)?

No, but since your data might be transmitted through public network(s), TLS should probably be required in most cases.
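
If you do enable it, the TLS settings live under the global.tls section of the server configuration. A minimal sketch is below; the certificate paths are placeholders, and the full set of options (serverName, host verification, etc.) is described in the server TLS configuration docs:

global:
  tls:
    internode:
      # mutual TLS for server-to-server (history/matching/worker) traffic
      server:
        certFile: /certs/internode.pem
        keyFile: /certs/internode.key
        requireClientAuth: true
        clientCaFiles:
          - /certs/ca.pem
      client:
        rootCaFiles:
          - /certs/ca.pem
    frontend:
      # TLS for connections to the frontend (SDKs and remote clusters)
      server:
        certFile: /certs/frontend.pem
        keyFile: /certs/frontend.key
        requireClientAuth: true
        clientCaFiles:
          - /certs/ca.pem
      client:
        rootCaFiles:
          - /certs/ca.pem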


Thanks @tihomir, can you elaborate on global vs. local namespaces?
For a DR/replica scenario, should I set my namespace as global?
My namespace is presently local.

Basically, when should I configure my namespace as global and when should it be local?

So, a follow-up on this question: I am unable to get multi-region connectivity working.

I have two clusters set up in two different AWS regions. Each cluster works fine and my local Java SDK app can connect to and use either cluster. I have ALBs fronting the frontend services. I’ve configured the clusterInformation section as defined in the Temporal Server Configuration docs, using the dns:/// option outlined there and pointing at my frontend ALBs. I’ve set up VPC peering between the two regions and have confirmed connectivity from the pods in one cluster to the frontend ALB in the other cluster. I even deployed a gRPC CLI image to the clusters to verify that I could invoke the gRPC frontend service on the other cluster.

However, I’m seeing i/o timeout errors in the history logs of one cluster when connecting to the frontend of the other cluster:

{
    "level": "error",
    "ts": "2021-08-25T18:51:27.133Z",
    "msg": "Failed to get replication tasks",
    "service": "history",
    "error": "last connection error: connection error: desc = \"transport: Error while dialing dial tcp x.x.46.188:433: i/o timeout\"",
    "logging-call-at": "replicationTaskFetcher.go: 395",
    "stacktrace": "go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go: 143\ngo.temporal.io/server/service/history.(*replicationTaskFetcherWorker).getMessages\n\t/temporal/service/history/replicationTaskFetcher.go: 395\ngo.temporal.io/server/service/history.(*replicationTaskFetcherWorker).fetchTasks\n\t/temporal/service/history/replicationTaskFetcher.go: 338"
}

I’m wondering if it has something to do with the ALB configuration, as the ALB health checks for the frontend ingress are being reported as unhealthy. I am unable to get the health checks to pass.

However, I’m also wondering why the dial is using an IP address to connect when I’m using dns:///hostname:port in the cluster information (rpcAddress: dns:///alb-ingress.example.com:443). When I try the same gRPC service call by IP using a gRPC CLI deployed to a pod, it fails.

I was able to set up the same thing in a single Docker Compose deployment using this file.

As far as the VPC setup goes, can you check if you have the routing table entries correctly configured?

Are you using Kubernetes? If so, you may want to create an internal load balancer for your Temporal services' XDC communication.

Also check your security groups (timeouts are generally due to security group issues or missing routing table entries in VPC peering scenarios).

I would agree that network timeout issues are frequently security-group related. However, I’ve confirmed that the frontend service ALB for each cluster correctly routes to the frontend service pod. I verified this from a purpose-built pod with grpcurl included, deployed to one cluster for the sole purpose of checking that gRPC services in the other cluster are available (i.e. so I can kubectl exec into the pod and run: grpcurl --insecure frontend.othercluster.example.com:443 list). Since that grpcurl command to the other cluster works from within the pod, I’m fairly certain I can eliminate security groups, VPC peering, or routing table issues as the cause.

Can someone confirm whether Temporal uses hostnames or IPs when connecting to other clusters in a multi-cluster setup? It appears to be using IPs instead of hostnames even though I am providing a hostname in the clusterInformation. I can confirm that IP-based routing does not work, as my ALB has a routing rule that specifically routes to target groups based on hostname (i.e. grpcurl --insecure x.x.23.123:443 list does not work).

OK, as far as DNS is concerned, it works for me when connecting two clusters, though of course I have only tried it through Docker Compose (within the same network).

clusterMetadata:
  enableGlobalNamespace: true
  failoverVersionIncrement: 10
  masterClusterName: "primary"
  currentClusterName: "primary"
  clusterInformation:
    primary:
      enabled: true
      initialFailoverVersion: 1
      rpcName: "frontend"
      rpcAddress: "dns:///temporal-primary:7233"
    secondary:
      enabled: true
      initialFailoverVersion: 2
      rpcName: "frontend"
      rpcAddress: "dns:///temporal-secondary:7233"
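
For completeness, the secondary cluster's clusterMetadata would mirror this, with only currentClusterName flipped (both clusters list the same clusterInformation), along these lines:

clusterMetadata:
  enableGlobalNamespace: true
  failoverVersionIncrement: 10
  masterClusterName: "primary"
  currentClusterName: "secondary"
  clusterInformation:
    primary:
      enabled: true
      initialFailoverVersion: 1
      rpcName: "frontend"
      rpcAddress: "dns:///temporal-primary:7233"
    secondary:
      enabled: true
      initialFailoverVersion: 2
      rpcName: "frontend"
      rpcAddress: "dns:///temporal-secondary:7233"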

So, I think I have finally figured out the correct EKS ingress annotations to configure the ALB ingress’s default rule to forward all requests to the target group instead of sending a 404 to all requests that don’t match the host/path. This change finally allows IP-based gRPC requests to actually make it to the ingress/kube layer.

Now that I’m sure all requests are making it to the kube layer, I am pretty sure that Temporal is using IPs (without an authority header) when connecting to the rpcAddress configured in the clusterInformation, even when a DNS name is specified. Using IPs instead of hostnames works fine for cluster-local resolution, as the rpcAddress is usually the service name, which ultimately resolves to the cluster-local service IP, so traffic routes correctly without relying on the hostname at all.

However, when a kubernetes ingress is involved (i.e. cluster in east1 connecting to cluster in east2), using an IP to connect to the other cluster (without providing an authority header) causes the ingress rules in the kube layer to not route the request to the service as it will not be able to match the request to any of the hosts specified in the ingress rule (and an ingress rule host record cannot have an IP address).

I believe I have replicated what the pods are seeing (i/o timeout) using a separate pod I deployed with the grpcurl CLI tool, which I use to test gRPC connectivity to the other cluster’s ingress. What I’ve found using grpcurl in this pod is that when I try to connect to the ingress using just the IP address:

grpcurl --insecure x.x.23.144:443 list

I get a connection timeout

Failed to dial target host "x.x.23.144:433": context deadline exceeded

However, if I provide an authority header when connecting via the IP, I get the same successful response as I would get when I use the hostname:

## From an east1 cluster pod
bash-5.1$ grpcurl --insecure -authority myfrontend.temporal-eks-east2.example.com x.x.23.144:443 list
grpc.health.v1.Health
grpc.reflection.v1alpha.ServerReflection
temporal.api.workflowservice.v1.WorkflowService
temporal.server.api.adminservice.v1.AdminService

Still not sure what the resolution is though.

I haven’t tried it on Kubernetes yet, but I was thinking of using a non-internet-facing NLB with the following annotations to make the two clusters talk to each other:

    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: '60' 
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: 'true'
    service.beta.kubernetes.io/aws-load-balancer-type: nlb

I have already verified that hostname-based gRPC calls work fine from the east1 cluster to the east2 cluster, so I’m fairly certain that the ALB in east2 is routing all traffic from east1 fine. I think my problem is that when IP-only gRPC requests are sent, the ingress controller in the Kubernetes layer inspects the hostname on the request to try to match it to a routing rule defined in the ingress YAML; since no hostname is included on the request, it doesn’t find a match, the request is unroutable and is therefore dropped, causing the i/o timeout. I can’t find another logical reason for the issue I’m seeing. If Temporal were using a hostname, I’m fairly certain it would be working.

Also, I find it weird that the Temporal Helm charts do not include a frontend ingress definition. I had to add one manually to my local fork. For anyone looking for a (mostly) working AWS frontend ingress definition, mine looks something like the following (note: the health check is still failing, not sure why; also, this config isn’t using SSL from the ALB to the frontend pod).

---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/actions.default-rule: |
      {"type":"forward","forwardConfig":{"targetGroups":[{"serviceName":"temporal-frontend","servicePort":"7233"}]}}
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig":
      { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
    alb.ingress.kubernetes.io/backend-protocol: HTTP
    alb.ingress.kubernetes.io/backend-protocol-version: GRPC
    alb.ingress.kubernetes.io/certificate-arn: my-arn
    alb.ingress.kubernetes.io/healthcheck-path: /grpc.health.v1.Health/Check
    alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/load-balancer-attributes: routing.http2.enabled=true
    alb.ingress.kubernetes.io/scheme: internal
    alb.ingress.kubernetes.io/security-groups: my-sec-group
    alb.ingress.kubernetes.io/tags: env=dev
    alb.ingress.kubernetes.io/target-type: ip
    kubernetes.io/ingress.class: alb
  labels:
    app.kubernetes.io/component: frontend
    app.kubernetes.io/instance: temporal
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: temporal
    app.kubernetes.io/part-of: temporal
    app.kubernetes.io/version: 1.11.3
    helm.sh/chart: temporal-0.11.3
  name: temporal-frontend-ingress
spec:
  backend:
    serviceName: default-rule
    servicePort: use-annotation
  rules:
  - host: frontend-east2.temporal.example.com
    http:
      paths:
      - backend:
          serviceName: temporal-frontend
          servicePort: 7233
        path: /*

Update: I was never able to get replication working using an ALB. I had to revert to using an L4 NLB.

Thanks @arnesenfamily

Any idea how to use an NGINX ingress instead of an ALB / L4 NLB?

Not sure, I have not played around with fronting Temporal server components with NGINX. However, given that NGINX is an L7 routing mechanism, I’m not sure it would work for XDC replication, as I was not able to get an L7 load balancer (ALB) to work as the rpcAddress for XDC replication. That said, I was not able to confirm that it was in fact L4 vs. L7 causing my problems. I only know that when I swap the rpcAddress value in the Temporal clusterMetadata config (leaving everything else about the config the same), an L4 load balancer works and the L7 one does not, even though both load balancers support HTTP/2 and gRPC, and other gRPC tools (i.e. grpcurl and even the Temporal SDKs) work fine with an L7 load balancer.

SDK access seems to work fine with an L7 load balancer though. So, you should be able to put nginx in front of the frontend service instances that SDKs use.

As to the technical implementation of an NGINX solution to front Temporal services, I’m not sure. Have you looked at this blog post by the NGINX folks that seems to discuss exactly what you’re trying to do?
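
For the SDK-facing frontend at least, a gRPC-enabled ingress-nginx resource would look roughly like the sketch below. The host, TLS secret, and service names are placeholders, ingress-nginx generally needs TLS on the listener to serve gRPC, and (as discussed above) an L7 proxy may still not work as the XDC rpcAddress:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: temporal-frontend-grpc
  annotations:
    kubernetes.io/ingress.class: nginx
    # tell ingress-nginx to speak gRPC/HTTP2 to the backend pods
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
spec:
  tls:
  - hosts:
    - frontend.temporal.example.com
    secretName: temporal-frontend-tls
  rules:
  - host: frontend.temporal.example.com
    http:
      paths:
      - backend:
          serviceName: temporal-frontend
          servicePort: 7233
        path: /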

Thanks @arnesenfamily for the pointer, let me check this out. One of my colleagues tried gRPC - NGINX Ingress Controller (kubernetes.github.io) and we were seeing that NGINX does not forward the request upstream and throws a 400.

Will check the configs you have suggested.

By the way, since XDC replication (RPC) happens only through the frontend, if the SDK can use an L7 LB, XDC replication/sync should also be able to use it…

Can you share the ingress chart you are using for the NLB?

@tihomir @Vitaly @Wenquan_Xing, can you take a look at this? What is Temporal’s recommended way of setting up XDC on AWS using k8s/EKS?

OK, I got the replication setup working. My key learnings were:

a) It is the DR (replication) site that pulls the data from the main site, so if you are setting up VPC peering, it is important for your secondary region to be able to reach the primary region, and not vice versa.

b) As @arnesenfamily mentioned, L7 load balancers do not work (ALB, NGINX, etc.), so an L4 NLB is your friend.

c) @arnesenfamily’s observation about Helm is correct: you may have to modify your Helm charts to expose your frontend as an ingress/service so that it can be consumed externally.

I used this YAML to expose my frontend through an internal NLB:

apiVersion: v1
kind: Service
metadata:
  name: temporal-dr
  namespace: temporal
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "60"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  ports:
  - name: grpc-rpc
    port: 7233
    protocol: TCP
    targetPort: rpc
  selector:
    app.kubernetes.io/component: frontend
    app.kubernetes.io/instance: temporal
    app.kubernetes.io/name: temporal
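
With that Service in place, the other region's clusterMetadata just points the entry for the remote cluster at that cluster's NLB DNS name. For example, in the secondary region's config the primary entry would look something like this (the NLB hostname below is a placeholder):

clusterMetadata:
  # ...same enableGlobalNamespace / failoverVersionIncrement / cluster names as in the earlier example...
  clusterInformation:
    primary:
      enabled: true
      initialFailoverVersion: 1
      rpcName: "frontend"
      rpcAddress: "dns:///temporal-dr-1234567890.elb.us-west-2.amazonaws.com:7233"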

Both clusters poll from each other, as you want to be able to fail over and back. So make sure that they can reach each other.


My NLB config is very similar to yours. The differences I have are that I’ve got S3 bucket logging enabled, external DNS integration via annotations, a few target group attributes specified, some resource tags specified and I’m using externalTrafficPolicy: Cluster. Other than that, my NLB config seems very similar to what you ended up with.

Also, I ended up getting rid of the ALB to the frontend and just using the NLB for all communications to the frontend (sdks and xdc).

Finally, I have found that the publicClient.host value cannot be a DNS name. When I set it to an external DNS name pointing to the temporal-frontend pod, the worker pod crashes with a failure to start the history scanner. When I switch it back to the cluster-local temporal-frontend:7233, the worker starts the history scanner as expected.
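
For reference, in the raw server config that section is just the frontend host/port, and keeping it pointed at the cluster-local service is what worked for me. A minimal sketch (the key is named hostPort in the server config samples; your Helm values may nest or name it differently):

publicClient:
  # cluster-local frontend service; an external DNS name here broke the worker for me
  hostPort: "temporal-frontend:7233"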
