Is there any guidance on how to configure networking?
I assume there is some variable I can set so that the services can reach each other via DNS/HTTP (RPC should work over HTTP/2?). Or do they expect some kind of “local networking”?
Basically, what’s unclear is how the services expect to reach each other, and what the minimum required config is to get this running.
Is the frontend the “main” API that everything is supposed to talk to? Or do they all talk via the database?
Apologies if I’m missing some docs here - I’ve been trying to RTFM but not getting very far!
To ask this another way: is there a matrix that shows which services expect to connect to which, and how?
E.g. what ports/DNS/etc. do they all need in order to talk to each other correctly? It looks like on AWS we’ll need to use a network load balancer to expose some of the RPC ports, but it’s unclear how the other services should be configured to discover/connect to these.
Hey, I can at least answer a few things here until @tihomir can chime in with a better answer.
Components of a Temporal cluster (Services, Database, Elasticsearch) do not have special networking requirements. The only rule is that all Services must be able to address each other directly within the cluster.
While Services communicate directly, they need a way to know what other Services are in the cluster. To solve this, we use a membership dependency within Temporal called Ringpop. Ringpop keeps track of the Service IPs within the cluster and broadcasts them so the Services can route to each other.
Everything so far has been regarding intra-cluster communication. In terms of external requests, the only Service which should talk to the outside world is the Frontend service. This is important information when it comes to exposing your Temporal applications on the internet.
1. Addressable how? Via IP? Can you give them a DNS name behind a load balancer in the case where you have multiple instances of the same service? If so, how do you provide that DNS name so they all know how to reach each other?
2. “Broadcast” means something specific in networking terms. I’m not clear whether this needs true local network broadcasting or whether it’s some other definition.
3. That makes sense.
To expand on #1 a bit, we are looking into using AWS ECS to run the Temporal servers.
They start up fine and can connect fine to the DB, but I think they are struggling to connect to the other services, and I’m not sure how to make them aware of each other.
A DNS name makes the most sense to me, since I can point the name to a network load balancer, which will let us run N copies of each service behind it in order to scale. If they need to use IPs, then I think I need to do $something else in AWS/ECS to make local networking work with these services.
{"level":"error","ts":"2023-01-07T00:47:42.050Z","msg":"unable to bootstrap ringpop. retrying","service":"frontend","error":"join duration of 53.504449726s exceeded max 30s
Feels like it’s something to do with the internal IPs used by ECS, but not sure how to proceed / what the issue really is here.
Exact same error if I set BIND_ON_IP to 127.0.0.1, too.
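One way to narrow down an error like the one above is to check plain TCP reachability from one task to another before looking at Temporal itself. Below is a minimal sketch using only the Python standard library. The helper name `can_connect`, the peer IP, and the port in the usage comment are illustrative assumptions, not values taken from this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False


# Hypothetical usage from inside one task, against another task's private IP
# and the port its membership listener is configured on (both assumed here):
#   can_connect("10.0.1.23", 6933)
```

If this returns False between two tasks, the problem is likely at the network level (security groups, task networking mode) rather than in the Temporal config.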
Temporal uses a gossip-based membership library, Ringpop.
When each service starts up, it writes its host address and port to the cluster_membership table.
You can get this info from tctl, for example:
tctl adm membership list_db
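To make the discovery step concrete, here is a toy model of that idea: every service writes the address it advertises into a shared table, and the others read that table to find peers to join. This is only a sketch of the concept. The class, its fields, the task names, and the addresses are all made up for illustration and are not Temporal's schema or Ringpop's API.

```python
from dataclasses import dataclass, field


@dataclass
class MembershipTable:
    """Toy stand-in for a shared membership table (not Temporal's schema)."""
    rows: dict = field(default_factory=dict)  # (service, host_id) -> "ip:port"

    def register(self, service: str, host_id: str, address: str) -> None:
        # Each service records the address it wants peers to dial.
        self.rows[(service, host_id)] = address

    def seeds_for(self, host_id: str) -> list:
        # A starting service joins the cluster via every address but its own.
        return sorted(a for (_, hid), a in self.rows.items() if hid != host_id)


if __name__ == "__main__":
    good = MembershipTable()
    good.register("frontend", "task-a", "10.0.1.23:6933")
    good.register("history", "task-b", "10.0.2.47:6934")
    print(good.seeds_for("task-a"))  # ['10.0.2.47:6934']

    # If every task advertises loopback, the "peer" address read by task-a
    # points back at task-a's own network namespace, so the join cannot work.
    bad = MembershipTable()
    bad.register("frontend", "task-a", "127.0.0.1:6933")
    bad.register("history", "task-b", "127.0.0.1:6934")
    print(bad.seeds_for("task-a"))  # ['127.0.0.1:6934']
```

The point of the second case is that the table only helps if the addresses written into it are reachable from the other hosts.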
bindOnIP is used to bind a service to a specific IP (for example, 0.0.0.0). broadcastAddress can be used when bindOnIP is the same across several nodes (again, for example, 0.0.0.0).
unable to bootstrap ringpop
I’m not an ECS expert, but my guess is that you might have to set broadcastAddress in your services’ configs to IPs that the other services can actually discover and connect to.
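As a rough sanity check on whichever value ends up in broadcastAddress, something like the sketch below can rule out addresses that other hosts can never dial. It uses only Python's standard `ipaddress` module. The function names and the candidate list are assumptions for illustration, and this is not how Temporal itself validates the setting.

```python
import ipaddress
from typing import Iterable, Optional


def is_advertisable(address: str) -> bool:
    """Return False for addresses that other hosts cannot connect to."""
    ip = ipaddress.ip_address(address)
    # 0.0.0.0 (unspecified) is only meaningful for binding, 127.0.0.1
    # (loopback) only reaches the local host, and 169.254.x.x (link-local)
    # is not routable between hosts.
    return not (ip.is_unspecified or ip.is_loopback or ip.is_link_local)


def pick_broadcast_address(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that peers could plausibly reach, if any."""
    for candidate in candidates:
        if is_advertisable(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical addresses seen inside a container.
    print(pick_broadcast_address(["127.0.0.1", "0.0.0.0", "10.0.1.23"]))  # 10.0.1.23
```

This is consistent with the report above that setting BIND_ON_IP to 127.0.0.1 gave the same error: a loopback address is never something a peer on another host can connect to.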