UNAVAILABLE: upstream connect error or disconnect/reset before headers. reset reason: overflow

Hi,

Temporal-Server: v1.18.3
Temporal ui-server: v2.8.0

When I click the Query tab for a workflow, the custom search attributes do not show up in the dropdown (the worker is running and the custom search attributes already exist).

Here is the error I am getting:

{
  "message": "UNAVAILABLE: upstream connect error or disconnect/reset before headers. reset reason: overflow",
  "source": "JavaSDK",
  "stackTrace": "io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:271)\nio.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:252)\nio.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:165)\nio.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.getWorkflowExecutionHistory(WorkflowServiceGrpc.java:3726)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.createStatefulHandler(ReplayWorkflowTaskHandler.java:366)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.lambda$getOrCreateWorkflowExecutor$0(ReplayWorkflowTaskHandler.java:341)\nio.temporal.internal.worker.WorkflowExecutorCache.getOrCreate(WorkflowExecutorCache.java:109)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.getOrCreateWorkflowExecutor(ReplayWorkflowTaskHandler.java:336)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithQuery(ReplayWorkflowTaskHandler.java:109)\nio.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:96)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handleTask(WorkflowWorker.java:407)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:317)\nio.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:259)\nio.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105)\njava.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\njava.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\njava.base/java.lang.Thread.run(Thread.java:834)\n",
  "cause": null,
  "applicationFailureInfo": {
    "type": "io.grpc.StatusRuntimeException",
    "nonRetryable": false,
    "details": null
  }
}
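
For reference, the Query tab exercises the same query path that the Java SDK can hit directly. Here is a minimal sketch of issuing a query from the SDK outside the UI; the workflow ID and query name are placeholders, and TLS setup is omitted (it would be configured the same way as the worker's):

import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowClientOptions;
import io.temporal.client.WorkflowStub;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;

public class QueryProbe {
  public static void main(String[] args) {
    // Same target the worker uses; TLS options omitted here for brevity.
    WorkflowServiceStubs service = WorkflowServiceStubs.newServiceStubs(
        WorkflowServiceStubsOptions.newBuilder()
            .setTarget("server-asyncworkflow-uat.jpmchase.net:443")
            .build());
    WorkflowClient client = WorkflowClient.newInstance(service,
        WorkflowClientOptions.newBuilder().setNamespace("109727_noha_jpm").build());

    // "my-workflow-id" and "getState" are illustrative placeholders.
    WorkflowStub stub = client.newUntypedWorkflowStub("my-workflow-id");
    String result = stub.query("getState", String.class);
    System.out.println("Query result: " + result);
  }
}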

Trying to get the search attributes from tctl gives the same result:

$ tctl --ad server-asyncworkflow-uat.jpmchase.net:443 --tls_ca_path /h/cert.pem admin cluster gsa
Error: Unable to get search attributes.
Error Details: rpc error: code = Unavailable desc = upstream connect error or disconnect/reset before headers. reset reason: overflow
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)
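
For completeness, the same check can be made from the Java SDK using the service stubs the worker already has; a rough sketch (target shown without TLS details, which would match the worker's configuration) to list whatever search attributes the frontend reports:

import io.temporal.api.workflowservice.v1.GetSearchAttributesRequest;
import io.temporal.api.workflowservice.v1.GetSearchAttributesResponse;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;

public class ListSearchAttributes {
  public static void main(String[] args) {
    WorkflowServiceStubs service = WorkflowServiceStubs.newServiceStubs(
        WorkflowServiceStubsOptions.newBuilder()
            .setTarget("server-asyncworkflow-uat.jpmchase.net:443")
            .build());

    // Ask the frontend for the registered search attributes.
    GetSearchAttributesResponse response = service.blockingStub()
        .getSearchAttributes(GetSearchAttributesRequest.newBuilder().build());

    // Print each search attribute name and its indexed value type.
    response.getKeysMap().forEach((name, type) ->
        System.out.println(name + " -> " + type));
  }
}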

What could be causing the issue?

I have 8 pods in each service.
If the frontend is scaled to more than 2 pods I get the above error; with 2 or fewer pods there is no issue.

Do you have load balancing in front of your frontend hosts?

You could health check each frontend individually with grpc-health-probe:
./grpc-health-probe -addr=frontendHost:frontendGrpcPort -service=temporal.api.workflowservice.v1.WorkflowService
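
If Java is handier than the standalone probe, the same check can be done against each frontend with the standard gRPC health API. A rough sketch follows; the pod addresses are placeholders, and plaintext is assumed for direct in-cluster access (adjust for your TLS setup):

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.health.v1.HealthCheckRequest;
import io.grpc.health.v1.HealthCheckResponse;
import io.grpc.health.v1.HealthGrpc;

public class FrontendHealthCheck {
  public static void main(String[] args) {
    // Placeholder frontend pod addresses; replace with your own pod IPs/ports.
    String[] frontends = {"frontend-0:7233", "frontend-1:7233"};
    for (String target : frontends) {
      ManagedChannel channel = ManagedChannelBuilder
          .forTarget(target)
          .usePlaintext() // assumes direct, non-TLS access to the pods
          .build();
      try {
        // Same service name grpc-health-probe checks.
        HealthCheckResponse response = HealthGrpc.newBlockingStub(channel)
            .check(HealthCheckRequest.newBuilder()
                .setService("temporal.api.workflowservice.v1.WorkflowService")
                .build());
        System.out.println(target + " -> " + response.getStatus());
      } catch (Exception e) {
        System.out.println(target + " -> " + e.getMessage());
      } finally {
        channel.shutdownNow();
      }
    }
  }
}

Checking each pod this way (rather than through the load balancer) shows whether individual frontends are healthy even when the load-balanced endpoint reports the overflow error.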

Also take a look at your ring info with:
tctl adm cl d

and see how many frontend hosts have joined.

Yes, we use a load-balancing URL.


The health check shows SERVING only rarely.

The majority of the time it returns the overflow error.

$ ./tctl --ad server-asyncworkflow-uat.jpmchase.net:443 --ns 109727_noha_jpm --tls_ca_path /h/jpmc_certs/JPMCROOTCA.pem admin cluster d
Error: Operation DescribeCluster failed.
Error Details: rpc error: code = Unavailable desc = upstream connect error or disconnect/reset before headers. reset reason: overflow
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)

N585911@chcdh41c13n1 /i/Code/asyncworkflow/tctl/async-workflow-temporal-tctl_develop/bin/windows-amd64 (develop)
$ ./tctl --ad server-asyncworkflow-uat.jpmchase.net:443 --ns 109727_noha_jpm --tls_ca_path /h/jpmc_certs/JPMCROOTCA.pem admin cluster d
Error: Operation DescribeCluster failed.
Error Details: rpc error: code = Unavailable desc = upstream connect error or disconnect/reset before headers. reset reason: overflow
('export TEMPORAL_CLI_SHOW_STACKS=1' to see stack traces)

N585911@chcdh41c13n1 /i/Code/asyncworkflow/tctl/async-workflow-temporal-tctl_develop/bin/windows-amd64 (develop)
$ ./tctl --ad server-asyncworkflow-uat.jpmchase.net:443 --ns 109727_noha_jpm --tls_ca_path /h/jpmc_certs/JPMCROOTCA.pem admin cluster d
{
  "supportedClients": {
    "temporal-cli": "\u003c2.0.0",
    "temporal-go": "\u003c2.0.0",
    "temporal-java": "\u003c2.0.0",
    "temporal-php": "\u003c2.0.0",
    "temporal-server": "\u003c2.0.0",
    "temporal-typescript": "\u003c2.0.0",
    "temporal-ui": "\u003c3.0.0"
  },
  "serverVersion": "1.18.3",
  "membershipInfo": {
    "currentHost": { "identity": "100.127.28.209:7233" },
    "reachableMembers": [
      "100.127.61.156:6933", "100.127.60.53:6935", "100.127.62.50:6933", "100.127.28.210:6935",
      "100.127.77.55:6934", "100.127.22.71:6934", "100.127.105.220:6933", "100.127.62.51:6939",
      "100.127.22.77:6939", "100.127.105.222:6939", "100.127.46.87:6935", "100.127.50.241:6939",
      "100.127.62.49:6935", "100.127.28.209:6933", "100.127.61.157:6934", "100.127.46.88:6939",
      "100.127.19.176:6935", "100.127.73.184:6935", "100.127.22.69:6935", "100.127.60.52:6933",
      "100.127.19.177:6934", "100.127.73.183:6934", "100.127.28.211:6939", "100.127.57.93:6934",
      "100.127.77.52:6933", "100.127.57.96:6935", "100.127.1.228:6934", "100.127.19.187:6933",
      "100.127.28.208:6934", "100.127.73.182:6933", "100.127.1.229:6939", "100.127.77.54:6939",
      "100.127.62.48:6934", "100.127.50.240:6933", "100.127.61.159:6935", "100.127.50.239:6935",
      "100.127.1.233:6933", "100.127.57.91:6939", "100.127.105.221:6934", "100.127.60.54:6939"
    ],
    "rings": [
      {
        "role": "frontend",
        "memberCount": 10,
        "members": [
          { "identity": "100.127.1.233:7233" },
          { "identity": "100.127.50.240:7233" },
          { "identity": "100.127.19.187:7233" },
          { "identity": "100.127.77.52:7233" },
          { "identity": "100.127.28.209:7233" },
          { "identity": "100.127.105.220:7233" },
          { "identity": "100.127.73.182:7233" },
          { "identity": "100.127.61.156:7233" },
          { "identity": "100.127.60.52:7233" },
          { "identity": "100.127.62.50:7233" }
        ]
      },
      {
        "role": "history",
        "memberCount": 10,
        "members": [
          { "identity": "100.127.62.48:7234" },
          { "identity": "100.127.57.93:7234" },
          { "identity": "100.127.73.183:7234" },
          { "identity": "100.127.61.157:7234" },
          { "identity": "100.127.1.228:7234" },
          { "identity": "100.127.19.177:7234" },
          { "identity": "100.127.28.208:7234" },
          { "identity": "100.127.77.55:7234" },
          { "identity": "100.127.105.221:7234" },
          { "identity": "100.127.22.71:7234" }
        ]
      },
      {
        "role": "matching",
        "memberCount": 10,
        "members": [
          { "identity": "100.127.61.159:7235" },
          { "identity": "100.127.62.49:7235" },
          { "identity": "100.127.57.96:7235" },
          { "identity": "100.127.22.69:7235" },
          { "identity": "100.127.28.210:7235" },
          { "identity": "100.127.19.176:7235" },
          { "identity": "100.127.60.53:7235" },
          { "identity": "100.127.73.184:7235" },
          { "identity": "100.127.46.87:7235" },
          { "identity": "100.127.50.239:7235" }
        ]
      },
      {
        "role": "worker",
        "memberCount": 10,
        "members": [
          { "identity": "100.127.60.54:7239" },
          { "identity": "100.127.1.229:7239" },
          { "identity": "100.127.22.77:7239" },
          { "identity": "100.127.105.222:7239" },
          { "identity": "100.127.28.211:7239" },
          { "identity": "100.127.50.241:7239" },
          { "identity": "100.127.62.51:7239" },
          { "identity": "100.127.57.91:7239" },
          { "identity": "100.127.77.54:7239" },
          { "identity": "100.127.46.88:7239" }
        ]
      }
    ]
  },
  "clusterId": "acdbae8b-91b0-43d6-b2cd-804271efcd62",
  "clusterName": "active",
  "historyShardCount": 512,
  "persistenceStore": "cassandra",
  "visibilityStore": "cassandra",
  "failoverVersionIncrement": "10",
  "initialFailoverVersion": "1",
  "isGlobalNamespaceEnabled": true
}