visibilityMetricsClient exceptions in temporal-frontend pod

Hi, I brought up the default k8s setup with slightly altered ES settings:
elasticsearch:
  enabled: true
  replicas: 1
  minimumMasterNodes:

In my frontend logs I see these messages:

{"level":"info","ts":"2020-08-08T10:33:50.120Z","msg":"Get dynamic config","name":"frontend.historyMaxPageSize","value":"1000","default-value":"1000","logging-call-at":"config.go:78"}
{"level":"warn","ts":"2020-08-08T11:45:46.205Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"service_authorization_latency\", help: \"service_authorization_latency histogram\", constLabels: {}, variableLabels: [type operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
{"level":"info","ts":"2020-08-08T11:45:46.481Z","msg":"Get dynamic config","name":"system.enableReadVisibilityFromES","value":"true","default-value":"true","logging-call-at":"config.go:78"}
{"level":"info","ts":"2020-08-08T11:45:46.481Z","msg":"Get dynamic config","name":"frontend.esIndexMaxResultWindow","value":"10000","default-value":"10000","logging-call-at":"config.go:78"}
{"level":"error","ts":"2020-08-08T11:45:46.562Z","msg":"Operation failed with internal error.","service":"frontend","metric-scope":236,"error":"ListOpenWorkflowExecutions failed. Error: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]","logging-call-at":"esVisibilityMetricClients.go:276","stacktrace":"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error\n\t/temporal/common/log/loggerimpl/logger.go:138\ngo.temporal.io/server/common/persistence/elasticsearch.(*visibilityMetricsClient).updateErrorMetric\n\t/temporal/common/persistence/elasticsearch/esVisibilityMetricClients.go:276\ngo.temporal.io/server/common/persistence/elasticsearch.(*visibilityMetricsClient).ListOpenWorkflowExecutions\n\t/temporal/common/persistence/elasticsearch/esVisibilityMetricClients.go:107\ngo.temporal.io/server/common/persistence.(*visibilityManagerWrapper).ListOpenWorkflowExecutions\n\t/temporal/common/persistence/visibilityWrapper.go:114\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/workflowHandler.go:2407\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).ListOpenWorkflowExecutions.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:367\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:116\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/dcRedirectionHandler.go:363\ngo.temporal.io/server/service/frontend.(*AccessControlledWorkflowHandler).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/accessControlledHandler.go:329\ngo.temporal.io/server/service/frontend.(*WorkflowNilCheckHandler).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/workflowNilCheckHandler.go:344\ngo.temporal.io/api/workflowservice/v1._WorkflowService_ListOpenWorkflowExecutions_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v0.28.0/workflowservice/v1/service.pb.go:1394\ngo.temporal.io/server/service/frontend.interceptor\n\t/temporal/service/frontend/service.go:320\ngo.temporal.io/api/workflowservice/v1._WorkflowService_ListOpenWorkflowExecutions_Handler\n\t/go/pkg/mod/go.temporal.io/api@v0.28.0/workflowservice/v1/service.pb.go:1396\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.31.0/server.go:1180\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.31.0/server.go:1503\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.31.0/server.go:843"}
{"level":"error","ts":"2020-08-08T11:45:46.563Z","msg":"Internal service error","service":"frontend","error":"ListOpenWorkflowExecutions failed. Error: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]","logging-call-at":"workflowHandler.go:3379","stacktrace":"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error\n\t/temporal/common/log/loggerimpl/logger.go:138\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).error\n\t/temporal/service/frontend/workflowHandler.go:3379\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/workflowHandler.go:2411\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).ListOpenWorkflowExecutions.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:367\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:116\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/dcRedirectionHandler.go:363\ngo.temporal.io/server/service/frontend.(*AccessControlledWorkflowHandler).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/accessControlledHandler.go:329\ngo.temporal.io/server/service/frontend.(*WorkflowNilCheckHandler).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/workflowNilCheckHandler.go:344\ngo.temporal.io/api/workflowservice/v1._WorkflowService_ListOpenWorkflowExecutions_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v0.28.0/workflowservice/v1/service.pb.go:1394\ngo.temporal.io/server/service/frontend.interceptor\n\t/temporal/service/frontend/service.go:320\ngo.temporal.io/api/workflowservice/v1._WorkflowService_ListOpenWorkflowExecutions_Handler\n\t/go/pkg/mod/go.temporal.io/api@v0.28.0/workflowservice/v1/service.pb.go:1396\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.31.0/server.go:1180\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.31.0/server.go:1503\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.31.0/server.go:843"}

Any idea what's going wrong?
Should I also set elasticsearch.persistence.enabled to true? (It is false by default, as ES is configured only for advanced visibility.)
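
Since the frontend logs are one JSON object per line, the error entries can be pulled out from the surrounding info/warn noise with a few lines of Python. This is only a rough sketch: the function name `error_entries` is made up for illustration, and the sample lines are shortened copies of the log lines above.

```python
import json


def error_entries(lines):
    """Yield (ts, msg, error) for every error-level JSON log line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Truncated or wrapped lines are not valid JSON; skip them.
            continue
        if isinstance(entry, dict) and entry.get("level") == "error":
            yield entry.get("ts"), entry.get("msg"), entry.get("error")


# Shortened copies of the log lines above, plus one deliberately broken line.
sample = [
    '{"level":"info","ts":"2020-08-08T11:45:46.481Z","msg":"Get dynamic config"}',
    '{"level":"error","ts":"2020-08-08T11:45:46.563Z","msg":"Internal service error",'
    '"error":"ListOpenWorkflowExecutions failed."}',
    '"level":"info" this line is not valid JSON',
]

for ts, msg, err in error_entries(sample):
    print(ts, msg, err)
```

In practice the input would be the output of `kubectl logs` for the frontend pod, piped in or saved to a file.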


Hi Madhu, thank you for the report!

Let me see if I can repro the problem. What is the helm install command line that you used to deploy? Also, what is the diff between what’s in https://github.com/temporalio/helm-charts and the configuration you tried? (helm-charts $ git diff > diff.diff)

Thank you,
Mark.

Hi Mark, thanks for taking a look. There is no difference apart from the ES settings. The ES Helm chart (the zip file which comes as part of helm update) has a minimum replica count of 2; I just set it to one, as my minikube does not have much memory/CPU.

so the diff is

enabled: true
replicas: 1
minimumMasterNodes:

Hey Madhu,

Thank you for the follow up!

You are absolutely correct: Elasticsearch’s Helm chart does not allow for one-replica configurations. You will need a bigger cluster, or you can install a small-footprint “Tiny Batteries Included” version of Temporal, without Elasticsearch. I updated the readme file to include the command line that you can use for this:

Thank you,
Mark.


Hi Mark,

I am also getting the same error. We have our own Elasticsearch running with 3 replicas. Could you please help?

INTERNAL: ListClosedWorkflowExecutions failed. Error: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]

This might be ES version mismatch. What is the version of ES?
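
If it helps, the version can be read from the JSON that Elasticsearch serves at its root URL, under `version.number`. Here is a minimal sketch; the helper names and the `base_url` value are assumptions for illustration, and the example only parses a sample body, so it runs without a cluster.

```python
import json
import urllib.request


def parse_es_version(body):
    """Return (major, minor) from the JSON served at the Elasticsearch root URL."""
    number = json.loads(body)["version"]["number"]
    major, minor = number.split(".")[:2]
    return int(major), int(minor)


def fetch_es_version(base_url):
    """Query a live cluster. base_url is an assumption, e.g. http://localhost:9200."""
    with urllib.request.urlopen(base_url, timeout=5) as resp:
        return parse_es_version(resp.read().decode("utf-8"))


# Offline example: a trimmed root response containing only the field we read.
sample_body = '{"version": {"number": "7.10.1"}}'
print(parse_es_version(sample_body))
```

Against a real cluster you would call `fetch_es_version` with your own endpoint, adding authentication if your setup requires it.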

Thanks Alex. I am using version 7.10.1. Should I use Elasticsearch v6.8 or v7.7, as mentioned here?

Below is the full error log:

    { "logging-call-at": "workflowHandler.go:3362", "stacktrace": "go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error
	/temporal/common/log/loggerimpl/logger.go:138
go.temporal.io/server/service/frontend.(*WorkflowHandler).error
	/temporal/service/frontend/workflowHandler.go:3362
go.temporal.io/server/service/frontend.(*WorkflowHandler).ListClosedWorkflowExecutions
	/temporal/service/frontend/workflowHandler.go:2504
go.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).ListClosedWorkflowExecutions.func2
	/temporal/service/frontend/dcRedirectionHandler.go:337
go.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect
	/temporal/service/frontend/dcRedirectionPolicy.go:116
go.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).ListClosedWorkflowExecutions
	/temporal/service/frontend/dcRedirectionHandler.go:333
go.temporal.io/api/workflowservice/v1._WorkflowService_ListClosedWorkflowExecutions_Handler.func1
	/go/pkg/mod/go.temporal.io/api@v1.2.0/workflowservice/v1/service.pb.go:1407
go.temporal.io/server/common/authorization.(*interceptor).Interceptor
	/temporal/common/authorization/interceptor.go:87
google.golang.org/grpc.getChainUnaryHandler.func1
	/go/pkg/mod/google.golang.org/grpc@v1.33.2/server.go:1051
go.temporal.io/server/common/rpc.ServiceErrorInterceptor
	/temporal/common/rpc/grpc.go:100
google.golang.org/grpc.chainUnaryServerInterceptors.func1
	/go/pkg/mod/google.golang.org/grpc@v1.33.2/server.go:1037
go.temporal.io/api/workflowservice/v1._WorkflowService_ListClosedWorkflowExecutions_Handler
	/go/pkg/mod/go.temporal.io/api@v1.2.0/workflowservice/v1/service.pb.go:1409
google.golang.org/grpc.(*Server).processUnaryRPC
	/go/pkg/mod/google.golang.org/grpc@v1.33.2/server.go:1210
google.golang.org/grpc.(*Server).handleStream
	/go/pkg/mod/google.golang.org/grpc@v1.33.2/server.go:1533
google.golang.org/grpc.(*Server).serveStreams.func1.2
	/go/pkg/mod/google.golang.org/grpc@v1.33.2/server.go:871", "ts": "2021-02-05T05:19:05.745Z", "level": "error", "error": "ListClosedWorkflowExecutions failed. Error: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]", "msg": "Internal service error", "service": "frontend" }

Sorry to revive this topic. I’m using ES 7.10 and am also receiving this error from Temporal. The index is searchable in Kibana and through the API though.
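
The Temporal log only carries the summary "all shards failed". The Elasticsearch error response itself normally has more detail under `error.root_cause` and `error.failed_shards`, so capturing that body (from the Elasticsearch side, for instance) may narrow things down. Below is a rough sketch for pulling the reasons out of such a body; the function name is made up, and the sample values are placeholders, not real output from this cluster.

```python
import json


def shard_failure_reasons(body):
    """Return unique (type, reason) pairs from an Elasticsearch error response body."""
    err = json.loads(body).get("error")
    if not isinstance(err, dict):
        # Some responses carry the error as a plain string; nothing to unpack.
        return []
    reasons = []
    for cause in err.get("root_cause", []):
        pair = (cause.get("type"), cause.get("reason"))
        if pair not in reasons:
            reasons.append(pair)
    for shard in err.get("failed_shards", []):
        reason = shard.get("reason") or {}
        pair = (reason.get("type"), reason.get("reason"))
        if pair not in reasons:
            reasons.append(pair)
    return reasons


# Placeholder body with the usual shape of a search_phase_execution_exception.
sample_body = json.dumps({
    "error": {
        "root_cause": [{"type": "example_exception", "reason": "placeholder reason"}],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "failed_shards": [
            {"shard": 0, "index": "example-index",
             "reason": {"type": "example_exception", "reason": "placeholder reason"}}
        ],
    },
    "status": 400,
})

for kind, reason in shard_failure_reasons(sample_body):
    print(kind, "-", reason)
```

The per-shard reason is usually what distinguishes a mapping or query problem from a version incompatibility.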