Temporal Web gRPC Connectivity Issue - AWS ECS Fargate

Hey all! I have a test cluster set up on AWS ECS Fargate running the elasticsearch, temporal auto-setup, and temporal web containers. Since these containers run on Fargate, they use the awsvpc networking mode, and because they are all run from the same task definition, they should be able to connect to one another using the localhost address and their respective ports.

I was able to connect the temporal service to the elasticsearch instance and verified that the service is sending requests to elasticsearch. The main temporal service also seems to be working properly: I tested the gRPC connection and was able to connect to it with the SDK. However, I cannot seem to connect the temporal web container to the gRPC endpoint. When I load temporal web in the browser, I get the following error message: 14 UNAVAILABLE: failed to connect to all addresses. method: getVersionInfo, req: undefined.

I have the TEMPORAL_GRPC_ENDPOINT var set to 127.0.0.1:7233, which should allow it to connect to the temporal service, given the network interface that is shared between the containers and the fact that I’m able to connect to the elasticsearch container at 127.0.0.1:9200. Unfortunately, I haven’t been able to get this working. Has anyone run into this issue, or does anyone have any insights on where to look next?


Here is a post with a similar issue that ended up being a configuration problem. It would help if you could provide more info on how you configure the server and web.


Hi @tihomir, thanks for the quick response. We’re using the default out-of-the-box config from development_es.yaml.

As for the container configuration, we are using the ECS Fargate equivalent of what’s in the docker-compose-mysql-es.yml file, which looks something like this (including only the web and server container definitions below):

[{
		"name":"temporal-server",
		"image":"temporalio/auto-setup:1.14.0",
		"essential": true,
		"portMappings": [{
			"containerPort": 7233,
			"hostPort": 7233
		}],
		"environment": [{
				"name": "DB",
				"value": "mysql"
			},
			{
				"name": "DB_PORT",
				"value": "3306"
			},
			{
				"name": "MYSQL_USER",
				"value": "<db_username>"
			},
			{
				"name": "MYSQL_PWD",
				"value": "<db_password>"
			},
			{
				"name": "MYSQL_SEEDS",
				"value": "<db_host>"
			},
			{
				"name": "DYNAMIC_CONFIG_FILE_PATH",
				"value": "/etc/temporal/config/dynamicconfig/development_es.yaml"
			},
			{
				"name": "ENABLE_ES",
				"value": "true"
			},
			{
				"name": "ES_SEEDS",
				"value":"127.0.0.1"
			},
			{
				"name": "ES_VERSION",
				"value": "v7"
			}
		],
		"memory": 512,
		"cpu": 256,
		"dependsOn": [{
			"containerName": "elasticsearch",
			"condition": "START"
		}]
	},
{
		"name":"temporal-web",
		"image":"temporalio/web:1.13.0",
		"essential": true,
		"portMappings": [{
			"containerPort": 8088,
			"hostPort": 8088
		}],
		"environment": [{
				"name": "TEMPORAL_GRPC_ENDPOINT",
				"value": "127.0.0.1:7233"
			},
			{
				"name": "TEMPORAL_PERMIT_WRITE_API",
				"value": "true"
			}
		],
		"memory": 512,
		"cpu": 256,
		"dependsOn": [{
			"containerName": "temporal-server",
			"condition": "START"
		}]
	}]

  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"    # Using awsvpc as our network mode as this is required for Fargate
  memory                   = 4096        # Specifying the memory our container requires
  cpu                      = 2048        # Specifying the CPU our container requires
  execution_role_arn       = <ecs_task_execution_role>

For additional context, the only other container we have defined as part of this task is the elasticsearch container. We opted out of the admin-tools container for the time being, which is the only major difference between the docker-compose-mysql-es.yml file and our ECS task definition.

I hope this helps paint the picture a bit better. Please let me know if there are any additional details I can provide.

Thank you for your help on this!

After some additional digging and a bit of support from the AWS team, we’ve found the issue.

Inside entrypoint.sh, BIND_ON_IP falls back to an address derived from the container’s hostname when BIND_ON_IP isn’t already defined.

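This forces the application to serve on the machine’s IP address rather than the loopback address, and the loopback address is what containers under the same task definition use to communicate with one another on AWS ECS Fargate (or anywhere else the awsvpc networking mode is used). That is why temporal web could not reach the server at 127.0.0.1:7233.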

To resolve this issue, we added an additional environment variable, BIND_ON_IP, to the temporal-server container in our task definition and set it to 127.0.0.1. It is working beautifully now.
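
For anyone who wants to copy the fix, the extra entry would look roughly like this. It is a sketch based on the container definition posted earlier in this thread, and it goes inside the "environment" array of the temporal-server container alongside the existing entries (DB, ES_SEEDS, and so on), with everything else left unchanged:

```json
{
	"name": "BIND_ON_IP",
	"value": "127.0.0.1"
}
```

With this in place, the TEMPORAL_GRPC_ENDPOINT value of 127.0.0.1:7233 on the temporal-web container did not need to change.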

I did also notice an option to set bindOnLocalHost in the deployment config, which looks to essentially do the same thing as setting BIND_ON_IP=127.0.0.1.

Using the bindOnLocalHost param is probably the better route to go here, since it was designed for this purpose. However, we wanted to continue using the Temporal-provided containers for now, since this is just a development cluster and we’re still in the early stages of experimentation. Setting BIND_ON_IP=127.0.0.1 was just a quick and easy hackaround.

I hope this helps anyone else that runs into this issue!
