Temporal history and matching service health check

Hi All,

I am able to create a health URL for the Temporal frontend, but when I try the same for the matching and history services it does not work as expected. Please find the code below.

How do I create a historyservice.HistoryServiceClient in Go?

const (
	frontendService = "temporal.api.workflowservice.v1.WorkflowService"
	matchingService = "temporal.api.workflowservice.v1.MatchingService"
	historyService  = "temporal.api.workflowservice.v1.HistoryService"
)

func temporalMatchingHealth(c *gin.Context) {
	hostPort := "localhost:7233"
	dialOpts := []grpc.DialOption{
		grpc.WithInsecure(),
	}
	connection, err := grpc.Dial(hostPort, dialOpts...)
	if err != nil {
		fmt.Printf("failed to dial %s: %v\n", hostPort, err)
		return
	}
	defer connection.Close()

	healthClient := healthpb.NewHealthClient(connection)
	req := &healthpb.HealthCheckRequest{
		Service: matchingService,
	}
	resp, err := healthClient.Check(c, req)
	if err != nil {
		fmt.Printf("health check error: %v\n", err)
		return
	}

	fmt.Printf("%s: %s\n", req.GetService(), resp.GetStatus())
	if resp.GetStatus() != healthpb.HealthCheckResponse_SERVING {
		fmt.Println("not serving:", resp.GetStatus())
		//osExit(1)
	}
}
func temporalHistoryHealth(c *gin.Context) {
	hostPort := "localhost:7233"
	dialOpts := []grpc.DialOption{
		grpc.WithInsecure(),
	}
	connection, err := grpc.Dial(hostPort, dialOpts...)
	if err != nil {
		fmt.Printf("failed to dial %s: %v\n", hostPort, err)
		return
	}
	defer connection.Close()

	// historyservice is the generated client package (go.temporal.io/server/api/historyservice/v1).
	client := historyservice.NewHistoryServiceClient(connection)
	resp, err := client.DescribeHistoryHost(c, &historyservice.DescribeHistoryHostRequest{})
	if err != nil {
		fmt.Printf("DescribeHistoryHost error: %v\n", err)
		return
	}
	fmt.Println("history host:", resp)
}

func temporalFrontendHealth(c *gin.Context) {
	hostPort := "localhost:7233"
	dialOpts := []grpc.DialOption{
		grpc.WithInsecure(),
	}
	connection, err := grpc.Dial(hostPort, dialOpts...)
	if err != nil {
		fmt.Printf("failed to dial %s: %v\n", hostPort, err)
		return
	}
	defer connection.Close()

	healthClient := healthpb.NewHealthClient(connection)
	req := &healthpb.HealthCheckRequest{
		Service: frontendService,
	}
	resp, err := healthClient.Check(c, req)
	if err != nil {
		fmt.Printf("health check error: %v\n", err)
		return
	}

	fmt.Printf("%s: %s\n", req.GetService(), resp.GetStatus())
	if resp.GetStatus() != healthpb.HealthCheckResponse_SERVING {
		fmt.Println("not serving:", resp.GetStatus())
		//osExit(1)
	}

	//c.SecureJSON(http.StatusOK, strings.Join(defaultDataConverter.ToStrings(resp), ", "))
}

Here is the code that tctl uses for the frontend health check:
tctl cluster health

You should be able to use the same code and change fullWorkflowServiceName for the other services, namely:

temporal.api.workflowservice.v1.WorkflowService
temporal.api.workflowservice.v1.HistoryService
temporal.api.workflowservice.v1.MatchingService

For a comprehensive picture of your cluster health, however, we recommend setting up server metrics, dashboards, and alerting in addition to health checks.
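A rough sketch of what that could look like in Go, with the service name pulled out as a parameter (checkService is just an illustrative name, not something from tctl; hostPort has to point at an endpoint where the target service is actually listening):

package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

// checkService dials hostPort and asks the standard gRPC health endpoint
// about one named service.
func checkService(hostPort, service string) (healthpb.HealthCheckResponse_ServingStatus, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	conn, err := grpc.Dial(hostPort, grpc.WithInsecure())
	if err != nil {
		return healthpb.HealthCheckResponse_UNKNOWN, err
	}
	defer conn.Close()

	resp, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{Service: service})
	if err != nil {
		return healthpb.HealthCheckResponse_UNKNOWN, err
	}
	return resp.GetStatus(), nil
}

func main() {
	// Same helper, different fully qualified service name per Temporal service.
	// Note: a given endpoint only reports SERVING for the service registered
	// on it, so hostPort may need to differ per service.
	for _, svc := range []string{
		"temporal.api.workflowservice.v1.WorkflowService",
		"temporal.api.workflowservice.v1.HistoryService",
		"temporal.api.workflowservice.v1.MatchingService",
	} {
		status, err := checkService("localhost:7233", svc)
		fmt.Println(svc, status, err)
	}
}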

Hi Tihomir,

Yes, I did the same. Please find the code below. I am getting the error below when running it; the same code works as expected for the frontend.

temporal.api.workflowservice.v1.MatchingService: SERVICE_UNKNOWN
not serving: SERVICE_UNKNOWN

const (
	frontendService = "temporal.api.workflowservice.v1.WorkflowService"
	matchingService = "temporal.api.workflowservice.v1.MatchingService"
	historyService  = "temporal.api.workflowservice.v1.HistoryService"
)

func temporalMatchingHealth(c *gin.Context) {
	//hostPort := "10.0.117.75:7233"
	hostPort := "localhost:7233"
	dialOpts := []grpc.DialOption{
		grpc.WithInsecure(),
	}
	connection, err := grpc.Dial(hostPort, dialOpts...)
	if err != nil {
		fmt.Printf("failed to dial %s: %v\n", hostPort, err)
		return
	}
	defer connection.Close()

	healthClient := healthpb.NewHealthClient(connection)
	req := &healthpb.HealthCheckRequest{
		Service: matchingService,
	}
	resp, err := healthClient.Check(c, req)
	if err != nil {
		fmt.Printf("health check error: %v\n", err)
		return
	}

	fmt.Printf("%s: %s\n", req.GetService(), resp.GetStatus())
	if resp.GetStatus() != healthpb.HealthCheckResponse_SERVING {
		fmt.Println("not serving:", resp.GetStatus())
		//osExit(1)
	}
}

Any help on this? I am still not able to figure out how to create a health client for matching and history.

Are you deploying via docker-compose? Each service exposes a gRPC port, see here.

If you are deploying via docker-compose you would also need to expose those ports; for example, for the matching and history services in your docker-compose file:

temporal:
  ...
  ports:
    - 7233:7233
    - 7235:7235
    - 7234:7234

and then you should be able to run for example:

./grpc-health-probe -addr=localhost:7235 -service=temporal.api.workflowservice.v1.MatchingService

./grpc-health-probe -addr=localhost:7234 -service=temporal.api.workflowservice.v1.HistoryService

since each Temporal service exposes a /grpc.health.v1.Health/Check endpoint.

You should be able to do the same in your code by exposing the gRPC ports and updating hostPort as well as the service name.
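For example, a small sketch that mirrors the two probe commands above in Go (the addresses assume the ports are exposed as in the docker-compose snippet; adjust them to your setup):

package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// Each Temporal service is checked on its own exposed port, with its own
	// fully qualified service name.
	targets := []struct{ hostPort, service string }{
		{"localhost:7235", "temporal.api.workflowservice.v1.MatchingService"},
		{"localhost:7234", "temporal.api.workflowservice.v1.HistoryService"},
	}
	for _, t := range targets {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		conn, err := grpc.Dial(t.hostPort, grpc.WithInsecure())
		if err != nil {
			fmt.Println(t.hostPort, "dial error:", err)
			cancel()
			continue
		}
		resp, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{Service: t.service})
		if err != nil {
			fmt.Println(t.hostPort, t.service, "check error:", err)
		} else {
			fmt.Println(t.hostPort, t.service, resp.GetStatus())
		}
		conn.Close()
		cancel()
	}
}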

Hi Tihomir,

Thank you for the response. But when we deploy Temporal in a Kubernetes cluster we only get a cluster IP for the frontend server; communication with the history and matching servers has to go via the frontend.

How do we test the health of matching and history in this case? I am thinking there should be a way to create history and matching clients.

Thanks
phani

Kubernetes has its own gRPC health probe that you should be able to use, and it seems to be the recommended approach. Have you tried it and run into any issues?

Hi Tihomir,

The problem is that the matching and history services won't have a cluster IP in Kubernetes; all communication has to go through the frontend. But when trying to connect to the history service with the service name temporal.api.workflowservice.v1.HistoryService, we are getting SERVICE_UNKNOWN.

Thanks
phani

Are you using the Temporal Helm charts?
You should be able to set a cluster IP on all services; see for example here.

Hi Tihomir,

We installed using the Temporal Helm charts, but we only see a cluster IP for the Temporal frontend.

Thanks
phani

Hi,

I can see that the matching handler has a Check function, shown below. Do we need to create a handler to call Check for matching?

const (
	serviceName = "temporal.api.workflowservice.v1.MatchingService"
)

func (h *Handler) Check(_ context.Context, request *healthpb.HealthCheckRequest) (*healthpb.HealthCheckResponse, error) {
	h.logger.Debug("Matching service health check endpoint (gRPC) reached.")

	h.startWG.Wait()

	if request.Service != serviceName {
		return &healthpb.HealthCheckResponse{
			Status: healthpb.HealthCheckResponse_SERVICE_UNKNOWN,
		}, nil
	}

	hs := &healthpb.HealthCheckResponse{
		Status: healthpb.HealthCheckResponse_SERVING,
	}
	return hs, nil
}

I don't think you need to create a handler. Were you able to get a cluster IP by uncommenting
# type: ClusterIP in the config?
If so, can you run the health check via grpc-health-probe or the Kubernetes probe previously mentioned?
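If the ClusterIP services do come up, one way to wire this into the health URL you mentioned is a small gin handler around the same gRPC health check. This is only a sketch; temporal-matching:7235 and temporal-history:7234 are placeholder addresses, so use whatever kubectl get svc shows for your release:

package main

import (
	"context"
	"net/http"
	"time"

	"github.com/gin-gonic/gin"
	"google.golang.org/grpc"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

// healthHandler reports one Temporal service's gRPC health status over HTTP.
// hostPort should be whatever address is reachable from this pod
// (an exposed port or a ClusterIP service).
func healthHandler(hostPort, service string) gin.HandlerFunc {
	return func(c *gin.Context) {
		ctx, cancel := context.WithTimeout(c, 5*time.Second)
		defer cancel()

		conn, err := grpc.Dial(hostPort, grpc.WithInsecure())
		if err != nil {
			c.JSON(http.StatusServiceUnavailable, gin.H{"service": service, "error": err.Error()})
			return
		}
		defer conn.Close()

		resp, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{Service: service})
		if err != nil || resp.GetStatus() != healthpb.HealthCheckResponse_SERVING {
			c.JSON(http.StatusServiceUnavailable, gin.H{"service": service, "status": resp.GetStatus().String()})
			return
		}
		c.JSON(http.StatusOK, gin.H{"service": service, "status": resp.GetStatus().String()})
	}
}

func main() {
	r := gin.Default()
	// Placeholder addresses; replace with the service names/ports from your release.
	r.GET("/health/matching", healthHandler("temporal-matching:7235", "temporal.api.workflowservice.v1.MatchingService"))
	r.GET("/health/history", healthHandler("temporal-history:7234", "temporal.api.workflowservice.v1.HistoryService"))
	r.Run(":8080")
}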

If we do that, will the frontend communicate with history, matching, and worker via the cluster IP?

Temporal services talk to each other using IPs rather than the DNS names provided by the Kubernetes Service constructs. It's configurable via the server configmap.