Hi there - we are trying to deploy self-hosted Temporal on Amazon EKS and are really struggling.
We keep getting hit with error after error. Some examples:
When we set the password through AWS Secrets Manager, it says it is unable to read the secret name.
We are unable to deploy the Temporal schema job.
We are unable to deploy the Temporal frontend pod (and the logs aren’t showing anything).
We keep getting this error in the developer console of the Temporal web frontend:
last connection error: connection error: desc = "transport: Error while dialing: dial tcp 172.20.41.63:7233: connect: connection refused"
I know this is an incredibly broad issue, but does anyone in the community have any advice? Are there any best practice guides, common bugs, or “gotchas” we should be aware of? Alternatively, would someone on the Temporal team be open to getting on a debugging call?
I think you need to provide more context to get useful advice. How are you trying to install Temporal? Through its Helm charts or some custom method?
The errors seem to indicate connection problems. Either services are not up or not allowing access. The Temporal UI needs to access the Temporal frontend. If the latter is not running, you won’t get much useful information, if any at all.
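To tell "service not up" apart from "access blocked", a quick TCP probe can help. The following is only a minimal sketch in Python, not anything Temporal ships: the function name `probe_tcp` is made up for illustration, and the address and port are simply the ones from the error message above. As a rough rule, "refused" usually means the address was reached but nothing accepted the connection on that port, while "timeout" more often points to something dropping the traffic on the way.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a plain TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <details>".
    This is a hypothetical helper for illustration only.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered with a reset: nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: traffic is probably being dropped.
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar problems.
        return f"error: {exc}"
```

For example, running `print(probe_tcp("172.20.41.63", 7233))` from a pod inside the cluster (the address is presumably cluster-internal, so it will not be reachable from a laptop) should show whether anything is listening on the frontend port at all.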
If the schema job has not run yet, the frontend would not be able to come up, so no surprise there.
I’d take a step back and only move on once the previous step has succeeded. So to start with, I’d look at the DB and the schema job. Can you access the DB via the CLI or some other way? What is the error when running the schema job? I am not sure whether or how AWS Secrets Manager is supported, but assuming that’s the problem, you could try running with explicit username/password secrets to narrow it down.
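The "one step at a time" approach can be sketched as a small script that runs checks in order and reports the first one that fails. This is just an illustration in Python: the helper names (`tcp_ok`, `first_failing_stage`, `example_stages`) are hypothetical, and the host names and ports passed in are placeholders to be replaced with the real DB endpoint and frontend service. The schema job itself cannot be verified with a TCP check, so its status still has to be inspected separately between the two stages.

```python
import socket
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[], bool]]


def tcp_ok(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def first_failing_stage(stages: List[Stage]) -> Optional[str]:
    """Run the checks in order and return the name of the first failure.

    Later stages are not run once one fails, since they depend on it.
    Returns None if every stage passes.
    """
    for name, check in stages:
        if not check():
            return name
    return None


def example_stages(db_host: str, db_port: int,
                   frontend_host: str, frontend_port: int) -> List[Stage]:
    """Build the ordered checks; all arguments are placeholders."""
    return [
        ("database reachable", lambda: tcp_ok(db_host, db_port)),
        ("frontend reachable", lambda: tcp_ok(frontend_host, frontend_port)),
    ]
```

Calling `first_failing_stage(example_stages(db_host, 5432, frontend_host, 7233))` from inside the cluster, with your own host names and DB port filled in, would point at the earliest broken step, which is the one worth debugging first.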