Context deadline exceeded issue

Hi Team,
I am seeing a context deadline exceeded error while writing the message to the queue (while scheduling the workflow). Can someone suggest a solution to resolve this issue?
Any help is appreciated.

Thanks,
Mohit

Hi @Mohit_Sharma, can you provide more info please?
Full error log and a detailed description of what client code you are running would help.

```go
func initTemporalStarter() (client.Client, error) {
	logger.BootstrapLogger.Debug("Entering initTemporalStarter...")
	// Create the client object just once per process
	opts := client.Options{HostPort: temporalHostPort, Namespace: temporalNameSpace}
	c, err := client.NewClient(opts)
	if err != nil {
		return nil, fmt.Errorf("error while creating the temporal client: %v", err)
	}
	return c, nil
}
```


The above is the client connection code.
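Since the comment says the client should be created just once per process, one common way to enforce that in Go is `sync.Once`. This is a general sketch only; `newClient` here is a hypothetical stand-in for `client.NewClient(opts)`, not the poster's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// newClient is a hypothetical stand-in for client.NewClient(opts).
func newClient() (string, error) {
	return "temporal-client", nil
}

var (
	once      sync.Once
	shared    string
	sharedErr error
)

// getClient returns the same client instance to every caller in the process;
// newClient runs at most once no matter how many goroutines call this.
func getClient() (string, error) {
	once.Do(func() {
		shared, sharedErr = newClient()
	})
	return shared, sharedErr
}

func main() {
	c1, _ := getClient()
	c2, _ := getClient()
	fmt.Println(c1 == c2) // both callers see the same instance
}
```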



```go
options := client.StartWorkflowOptions{
	ID:        schedule.ID,
	TaskQueue: worker.LIVETaskQueue,
}
_, err := s.Client.ExecuteWorkflow(context.Background(), options, "CreateWorkflow", schedule)
if err != nil {
	return fmt.Errorf("unable to complete Workflow: %v", err)
}
```

This is the workflow scheduling code.

```
time="2022-07-11 16:43:30" level=debug msg="Entering handler.buildGetSchemaFailureRespBody() ...map[X-Tracking-Id:fc0c5b69-8324-40be-960d-ee2094070499]"
```

This is the app log, and the framework log is:
```Workflow:context deadline exceeded```

Thanks for the info. Can you check if your Temporal frontend service is up?

```
tctl --ad <temporal-frontend-address:port> cl h
```

Does this return "Serving"? Use the same value for <temporal-frontend-address:port> as you set in code (re: HostPort: temporalHostPort).

```
bash-5.1# tctl --ad 10.119.240.10:7233 cl h
temporal.api.workflowservice.v1.WorkflowService: SERVING
```

Yes, it is returning SERVING.

Thanks. Would you be able to provide the whole error log?
Are you able to start your workflow execution via tctl, for example:

```
tctl --ad <temporal-frontend-address:port> --namespace <namespace> wf start --tq <taskqueue> --wt CreateWorkflow
```

What is the "schedule" argument you are passing as the input arg?

Can you check your Temporal service logs to see if anything stands out? These context deadline exceeded errors do need some debugging and are not always easy to figure out.

```
bash-5.1# tctl --ad 10.119.240.10:7233 --namespace live-ctrl-svc wf start --tq LIVE_TASK_QUEUE --wt CreateWorkflow
Error: Failed to create workflow.
Error Details: context deadline exceeded
Stack trace:
goroutine 1 [running]:
runtime/debug.Stack()
        /usr/local/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:16 +0x19
go.temporal.io/server/tools/cli.printError({0x1dbb50b, 0x1a}, {0x20dbf20, 0xc0000ac120})
```

@tihomir Can you please have a look?

Context deadline exceeded errors can happen for a number of reasons, for example network issues, config issues, services being down, db issues, etc., so we would need more info please:

How are you deploying temporal server (docker compose, helm charts, some other way)?
Do you see any logs reported by either temporal frontend, history, matching services. Do you see any logs reported by your db?
Do you have a load balancer in front of your temporal frontend service?
Can you access the web ui? If you can does it show any errors?

Any other info that you could share that you think would be important? Were you able to start workflows before or is this a fresh cluster install?

We are still getting the context deadline exceeded error while triggering a new workflow.

We are deploying the Temporal server with Helm.

We do have working setups in other clusters with the following versions.

Azure (old, not upgraded, but working without any issues):
Temporal version :: 1.8.1
Kubernetes version :: v1.20.13

GCP (Working)
Temporal version :: 1.14
Kubernetes version :: v1.22.10-gke.600

GCP (Not Working) Recently created the cluster
Temporal version :: 1.16.2
Kubernetes version :: v1.22.9-gke.1500

Our Go version was 1.12; we updated the code to 1.16 and the Temporal SDK to 1.16.0. Even after that we are having the same issue.

I have also attached the history/frontend pod logs for your reference in Google Drive:
Temporal - Google Drive

Please go through the logs and let us know how to mitigate the issue.

That's a lot of logs to go through; typically you would go through your logs yourself and point out the errors :slight_smile:
I looked at your history logs briefly, and errors like:

```
error.","error":"GetVisibilityTasks operation failed. Select failed.
error.","error":"UpdateShard failed. Failed to start transaction.
error.","error":"GetOrCreateShard: failed to get ShardID 177
```

seem to indicate possible db issues (they can also be network issues, which might be worth checking on your end). Can you look at your db logs? What persistence store are you using? What numHistoryShards did you set in your config? Under what load do you get these errors, or is it only during pod restarts?

Do you have server metrics enabled? Couple things worth checking:

Persistence latencies:

```
histogram_quantile(0.95, sum(rate(persistence_latency_bucket{}[1m])) by (operation, le))
```

Visibility latencies:

```
histogram_quantile(0.95, sum(rate(task_latency_bucket{operation=~"VisibilityTask.*", service_name="history"}[1m])) by (operation, le))
```

Resources exhausted:

```
sum(rate(service_errors_resource_exhausted{}[1m])) by (resource_exhausted_cause)
```

Hope this helps.

While debugging the issue locally against the cluster's Temporal, I am getting the context deadline exceeded at c.cc.Invoke().

Any idea why it is failing at this level?

```go
func (c *workflowServiceClient) StartWorkflowExecution(ctx context.Context, in *StartWorkflowExecutionRequest, opts ...grpc.CallOption) (*StartWorkflowExecutionResponse, error) {
	out := new(StartWorkflowExecutionResponse)
	err := c.cc.Invoke(ctx, "/temporal.api.workflowservice.v1.WorkflowService/StartWorkflowExecution", in, out, opts...)
	if err != nil {
		return nil, err
	}
	return out, nil
}
```

@maxim @tihomir @alex

Can you please help on this?

@tihomir , The issue is resolved now.

In our case, the SQL instance was in the ASIA region while Temporal was running in the USW2 region. We had VPC connection and peering enabled, and the SQL IP was reachable from the Temporal pod.

Even so, the workflow was failing consistently with context deadline exceeded.

Later, we deployed a new SQL instance in the same USW2 region and restarted it. Workflows were then created and executed without any issues.

Also facing this issue:

```
error.","error":"GetOrCreateShard: failed to get ShardID 177
```

And numHistoryShards is 512 in my config.