visibilityMetricsClient exceptions in temporal-frontend pod

Hi, I brought up the default k8s setup with a slightly altered Elasticsearch setting:
elasticsearch:
  enabled: true
  replicas: 1
  minimumMasterNodes:

In my frontend logs I see these messages:

"level":"info","ts":"2020-08-08T10:33:50.120Z","msg":"Get dynamic config","name":"frontend.historyMaxPageSize","value":"1000","default-value":"1000","logging-call-at":"config.go:78"}
{"level":"warn","ts":"2020-08-08T11:45:46.205Z","msg":"error in prometheus reporter","error":"a previously registered descriptor with the same fully-qualified name as Desc{fqName: \"service_authorization_latency\", help: \"service_authorization_latency histogram\", constLabels: {}, variableLabels: [type operation]} has different label names or a different help string","logging-call-at":"metrics.go:135"}
{"level":"info","ts":"2020-08-08T11:45:46.481Z","msg":"Get dynamic config","name":"system.enableReadVisibilityFromES","value":"true","default-value":"true","logging-call-at":"config.go:78"}
{"level":"info","ts":"2020-08-08T11:45:46.481Z","msg":"Get dynamic config","name":"frontend.esIndexMaxResultWindow","value":"10000","default-value":"10000","logging-call-at":"config.go:78"}
{"level":"error","ts":"2020-08-08T11:45:46.562Z","msg":"Operation failed with internal error.","service":"frontend","metric-scope":236,"error":"ListOpenWorkflowExecutions failed. Error: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]","logging-call-at":"esVisibilityMetricClients.go:276","stacktrace":"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error\n\t/temporal/common/log/loggerimpl/logger.go:138\ngo.temporal.io/server/common/persistence/elasticsearch.(*visibilityMetricsClient).updateErrorMetric\n\t/temporal/common/persistence/elasticsearch/esVisibilityMetricClients.go:276\ngo.temporal.io/server/common/persistence/elasticsearch.(*visibilityMetricsClient).ListOpenWorkflowExecutions\n\t/temporal/common/persistence/elasticsearch/esVisibilityMetricClients.go:107\ngo.temporal.io/server/common/persistence.(*visibilityManagerWrapper).ListOpenWorkflowExecutions\n\t/temporal/common/persistence/visibilityWrapper.go:114\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/workflowHandler.go:2407\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).ListOpenWorkflowExecutions.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:367\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:116\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/dcRedirectionHandler.go:363\ngo.temporal.io/server/service/frontend.(*AccessControlledWorkflowHandler).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/accessControlledHandler.go:329\ngo.temporal.io/server/service/frontend.(*WorkflowNilCheckHandler).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/workflowNilCheckHandler.go:344\ngo.temporal.io/api/workflowservice/v1._WorkflowService_ListOpenWorkflowExecutions_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v0.28.0/workflowservice/v1/service.pb.go:1394\ngo.temporal.io/server/service/frontend.interceptor\n\t/temporal/service/frontend/service.go:320\ngo.temporal.io/api/workflowservice/v1._WorkflowService_ListOpenWorkflowExecutions_Handler\n\t/go/pkg/mod/go.temporal.io/api@v0.28.0/workflowservice/v1/service.pb.go:1396\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.31.0/server.go:1180\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.31.0/server.go:1503\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.31.0/server.go:843"}
{"level":"error","ts":"2020-08-08T11:45:46.563Z","msg":"Internal service error","service":"frontend","error":"ListOpenWorkflowExecutions failed. Error: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]","logging-call-at":"workflowHandler.go:3379","stacktrace":"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Error\n\t/temporal/common/log/loggerimpl/logger.go:138\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).error\n\t/temporal/service/frontend/workflowHandler.go:3379\ngo.temporal.io/server/service/frontend.(*WorkflowHandler).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/workflowHandler.go:2411\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).ListOpenWorkflowExecutions.func2\n\t/temporal/service/frontend/dcRedirectionHandler.go:367\ngo.temporal.io/server/service/frontend.(*NoopRedirectionPolicy).WithNamespaceRedirect\n\t/temporal/service/frontend/dcRedirectionPolicy.go:116\ngo.temporal.io/server/service/frontend.(*DCRedirectionHandlerImpl).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/dcRedirectionHandler.go:363\ngo.temporal.io/server/service/frontend.(*AccessControlledWorkflowHandler).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/accessControlledHandler.go:329\ngo.temporal.io/server/service/frontend.(*WorkflowNilCheckHandler).ListOpenWorkflowExecutions\n\t/temporal/service/frontend/workflowNilCheckHandler.go:344\ngo.temporal.io/api/workflowservice/v1._WorkflowService_ListOpenWorkflowExecutions_Handler.func1\n\t/go/pkg/mod/go.temporal.io/api@v0.28.0/workflowservice/v1/service.pb.go:1394\ngo.temporal.io/server/service/frontend.interceptor\n\t/temporal/service/frontend/service.go:320\ngo.temporal.io/api/workflowservice/v1._WorkflowService_ListOpenWorkflowExecutions_Handler\n\t/go/pkg/mod/go.temporal.io/api@v0.28.0/workflowservice/v1/service.pb.go:1396\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.31.0/server.go:1180\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.31.0/server.go:1503\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.31.0/server.go:843"}

Any idea what's going wrong? Should I also set elasticsearch.persistence.enabled to true? (It's false by default, as ES is configured only for advanced visibility.)

Hi Madhu, thank you for the report!

Let me see if I can repro the problem. What is the helm install command line that you used to deploy? Also, what is the diff between what’s in https://github.com/temporalio/helm-charts and the configuration you tried? (helm-charts $ git diff > diff.diff)

Thank you,
Mark.

Hi Mark, thanks for taking a look. There is no other difference. The ES Helm chart (the zip file that comes as part of helm update) has a minimum replica count of 2; I just made it 1, as my minikube does not have much memory/CPU :)

So the diff, under elasticsearch, is:

enabled: true
replicas: 1
minimumMasterNodes:

Hey Madhu,

Thank you for the follow up!

You are absolutely correct, Elasticsearch's Helm chart does not allow for one-replica configurations. You will need a bigger cluster, or you can install a small-footprint version of “Tiny Batteries Included” Temporal, without Elasticsearch. I updated the README file to include the command line that you can use for this:

Thank you,
Mark.
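
For readers who land here with the same error: the suggestion above amounts to turning off the bundled Elasticsearch so that the frontend stops querying it for visibility. A minimal sketch of a values override follows, assuming the chart gates Elasticsearch on the same `elasticsearch.enabled` key shown in the original post. The file name `values.no-es.yaml` is hypothetical, and this is not the command line from the README.

```yaml
# values.no-es.yaml (hypothetical file name)
# Assumption: the Temporal chart reads `elasticsearch.enabled`, the same key
# that appears in the original post. Setting it to false is expected to skip
# the Elasticsearch dependency and the advanced-visibility setup that uses it.
elasticsearch:
  enabled: false
```

A file like this would normally be passed to helm install with the -f flag. Other components may still need their own size reductions to fit on a small minikube.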
