Deploying in AWS - healthchecks

Shawn_Lim · June 21, 2021, 9:45am

Does temporal server have any healthcheck endpoint? How can I set up the target groups to hit temporal’s gRPC endpoints?

From the AWS docs (Health checks for your target groups - Elastic Load Balancing):

The destination for health checks on the targets.

If the protocol version is HTTP/1.1 or HTTP/2, specify a valid URI (/path?query). The default is /.

If the protocol version is gRPC, specify the path of a custom health check method with the format `/Package.Class/method`. The default is `/AWS.ALB/healthcheck`.

derek · June 22, 2021, 5:30pm

Probably the easiest way to go here is to pass the health check by opening a TCP connection to the gRPC service endpoint: if you can connect, it’s “healthy”.
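For anyone who wants to try the TCP approach by hand, here is a minimal sketch of such a check. It only tests whether the port accepts a connection and says nothing about whether the server can actually serve requests. The function name `tcp_healthy` and the default timeout are illustrative assumptions, not part of Temporal or AWS.

```python
import socket


def tcp_healthy(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it mirrors what a TCP health check does,
    i.e. "can connect" is treated as "healthy".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

For example, `tcp_healthy("127.0.0.1", 7233)` would check a frontend listening locally on port 7233 (adjust the address to fit your setup).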

Shawn_Lim · June 22, 2021, 5:52pm

I’m not so familiar with gRPC. Coming from mostly HTTP RESTful endpoints. What’s the “grpc service endpoint” you are referring to?

I was expecting something equivalent to /healthcheck for http endpoints.

derek · June 22, 2021, 6:13pm

We don’t have a real gRPC health check right now, so the best recommendation is just to try to open a connection. The gRPC service endpoint I’m talking about is just the Temporal endpoint you’d point your client SDK to.

derek · June 23, 2021, 5:07pm

Doh, so I was wrong here. Temporal itself does have gRPC health checks, but we’re not using them in our Helm charts and instead do a TCP check.

For health checks using target groups you can set the path to: /temporal.api.workflowservice.v1.WorkflowService/Check

My apologies for steering you wrong initially. I’ve added an issue to the helm chart repo to track this also - [Feature Request] Add gRPC heath check via grpc_health_probe · Issue #203 · temporalio/helm-charts · GitHub

Shawn_Lim · June 24, 2021, 1:51am

Hey @derek , thanks for the follow up. I tried the path you mentioned, but it appears that the target group is failing. Not sure if that path exists.

I tried to look up api/service.proto at master · temporalio/api · GitHub to see if such a healthcheck endpoint exists but found nothing.

Here are the Terraform options I passed to https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lb_target_group :

path = "/temporal.api.workflowservice.v1.WorkflowService/Check"
matcher = "0"

alex · June 24, 2021, 5:04am

After some investigation I think the right path would be:

/grpc.health.v1.Health/Check

If you set the service field in the request message to temporal.api.workflowservice.v1.WorkflowService, it should reply with:

{
  "status": "SERVING"
}

But even if you don’t, it looks like the health check still responds with 200 when the response message is

{
  "status": "SERVICE_UNKNOWN"
}
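As a rough sketch of how the path above might map onto target group settings, the helper below builds the keyword arguments one could pass to boto3's `elbv2` `create_target_group` call. The function name, the `"ip"` target type, and the `"HTTP"` protocol choice are assumptions for illustration; the path and the gRPC success code `0` come from this thread.

```python
def grpc_target_group_kwargs(name: str, vpc_id: str, port: int = 7233) -> dict:
    """Build kwargs for elbv2 create_target_group with a gRPC health check.

    Hypothetical helper; name and vpc_id are supplied by the caller.
    """
    return {
        "Name": name,
        "Protocol": "HTTP",          # assumption: plaintext between ALB and target
        "ProtocolVersion": "GRPC",   # makes the health check a gRPC call
        "Port": port,
        "VpcId": vpc_id,
        "TargetType": "ip",          # assumption: e.g. tasks with their own IPs
        "HealthCheckEnabled": True,
        "HealthCheckPath": "/grpc.health.v1.Health/Check",
        "Matcher": {"GrpcCode": "0"},  # gRPC status 0 (OK) counts as healthy
    }
```

With boto3 installed and credentials configured, this could be used as `boto3.client("elbv2").create_target_group(**grpc_target_group_kwargs("temporal-frontend", "vpc-..."))`, where the name and VPC ID are placeholders.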

Shawn_Lim · June 24, 2021, 7:47am

thanks this works now!

abelokry · September 10, 2021, 5:40pm

Did you get it to work just over HTTP/2?
How did you get it to work? Can you provide a curl call?

RichRM · January 4, 2023, 7:21pm

Just to confirm, with these health checks, should I be able to hit any of the services on “service.name.com:7233/grpc.health.v1.Health/Check” ?

Struggling to find the correct combination to make AWS Application Load Balancers happy. Looks like they expect an HTTP response though… hmm

curl 172.17.0.3:7233/grpc.health.v1.Health/Check -v
*   Trying 172.17.0.3:7233...
* Connected to 172.17.0.3 (172.17.0.3) port 7233 (#0)
> GET /grpc.health.v1.Health/Check HTTP/1.1
> Host: 172.17.0.3:7233
> User-Agent: curl/7.83.1
> Accept: */*
> 
* Received HTTP/0.9 when not allowed
* Closing connection 0
curl: (1) Received HTTP/0.9 when not allowed

I’m not sure if it’s going to be possible to make this play nice with an application load balancer. A network one may work…
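A note on why the plain curl call above fails: it sends an HTTP/1.1 GET, whereas a gRPC call is an HTTP/2 POST with content type `application/grpc` whose body is a length-prefixed protobuf message. The server most likely answers with HTTP/2 frames, which curl cannot parse as HTTP/1.1. The sketch below builds the request body for `grpc.health.v1.Health/Check` by hand, just to show what goes on the wire; the function names are illustrative, and in practice a gRPC client library does this for you.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def health_check_request(service: str = "") -> bytes:
    """Return the gRPC-framed body of a HealthCheckRequest.

    HealthCheckRequest has a single field: string service = 1.
    """
    name = service.encode("utf-8")
    # Field 1 with wire type 2 (length-delimited) has tag byte 0x0a.
    # An empty string is the proto3 default and is omitted entirely.
    message = b"\x0a" + encode_varint(len(name)) + name if name else b""
    # gRPC frame: 1-byte compression flag (0), 4-byte big-endian length.
    return b"\x00" + struct.pack(">I", len(message)) + message
```

Calling `health_check_request("temporal.api.workflowservice.v1.WorkflowService")` yields the bytes a client would send as the POST body; an empty service name produces a five-byte frame with a zero-length message.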

tihomir · January 4, 2023, 8:35pm

RichRM wrote: “Just to confirm, with these health checks, should I be able to hit any of the services on ‘service.name.com:7233/grpc.health.v1.Health/Check’?”

Each Temporal server service type has a default gRPC port (see here), which you can change if needed. Note that the worker service does not currently expose a health check.

The health check request has to be a gRPC request, so I’m not sure curl would work. For the frontend service you could use grpcurl, for example:

grpcurl -plaintext -d '{"service": "temporal.api.workflowservice.v1.WorkflowService"}' 127.0.0.1:7233 grpc.health.v1.Health/Check

I don’t think grpcurl works with the matching and history services, as they don’t currently expose the reflection API. You can, however, use grpc-health-probe for frontend, matching, and history:

./grpc-health-probe -addr=127.0.0.1:7233 -service=temporal.api.workflowservice.v1.WorkflowService

./grpc-health-probe -addr=127.0.0.1:7235 -service=temporal.api.workflowservice.v1.MatchingService

./grpc-health-probe -addr=127.0.0.1:7234 -service=temporal.api.workflowservice.v1.HistoryService

(change the address part to fit your setup)
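If you want to script the three probes above, a small wrapper along these lines might help. It is only a sketch: the ports and service names are copied from the commands in this post, while the table name, function names, and the default binary path are assumptions. It also assumes grpc-health-probe exits with status 0 when the service reports SERVING.

```python
import subprocess

# Port and health service name per Temporal service, as used in this thread.
HEALTH_TARGETS = {
    "frontend": (7233, "temporal.api.workflowservice.v1.WorkflowService"),
    "history": (7234, "temporal.api.workflowservice.v1.HistoryService"),
    "matching": (7235, "temporal.api.workflowservice.v1.MatchingService"),
}


def probe_command(role: str, host: str = "127.0.0.1",
                  binary: str = "./grpc-health-probe") -> list:
    """Build the grpc-health-probe argument list for one service role."""
    port, service = HEALTH_TARGETS[role]
    return [binary, f"-addr={host}:{port}", f"-service={service}"]


def is_healthy(role: str, host: str = "127.0.0.1") -> bool:
    """Run the probe; treat exit status 0 as healthy (assumed convention)."""
    result = subprocess.run(probe_command(role, host), check=False)
    return result.returncode == 0
```

For instance, `is_healthy("matching", "10.0.0.5")` would run the probe against port 7235 on that host, assuming the binary is in the current directory.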

RichRM · January 5, 2023, 8:48pm

OK, so if I make the gRPC ports available via a network load balancer, that part will work. And I can easily give them a consistent name via DNS.

What about the membership ports? I’m unclear on what the local networking requirements are / how the broadcast address comes into play when running these as separate containers on ECS.

I’m assuming if I gave the membership ports their own network load balancers, that may work, but then how does each service know how to find one another? Since I’m not sure how local broadcasting will work on ECS.

Can I pass an env var to each service that will tell it how to find the other services based on their hostname (and associated ports exposed via the load balancer?)

Would setting TEMPORAL_BROADCAST_ADDRESS to the DNS name work?

kuilan · September 4, 2024, 6:23pm

Did you use a Network Load Balancer for all the temporal components?

Or is the frontend component the only one behind a load balancer?
