Cannot terminate workflows and all workflows are stuck

{"level":"error","ts":"2022-12-01T12:13:59.787Z","msg":"Operation failed with internal error.","error":"AppendHistoryNodes: mssql: Could not allocate space for object 'Temporal.history_node'.'PK__history___DE8D8FB47C38DD1B' in database 'xxxx' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.","metric-scope":7,"logging-call-at":"persistenceMetricClients.go:1424","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/tmp/go/src/workspace/icg-msst-salespipeline-175611/icg-msst-salespipeline-175611-temporal-service.master/common/log/zap_logger.go

We cannot terminate workflows and all workflows are stuck. The client-side Workflow Poller thread reports DEADLINE_EXCEEDED.

Hello @keira

It looks like an error related to the DB. I have found some links that seem related to the problem you are facing.
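
Per the error message itself, the PRIMARY filegroup of the Temporal database is full, so the server cannot append workflow history until the filegroup can grow again. As a rough sketch of what that fix could look like on the SQL Server side (assuming there is disk space available; the host, logical file names, path, and sizes below are placeholders for your environment, and 'xxxx' is the database name from your log):

# Hypothetical example: inspect the database files, then enable autogrowth
# on the existing data file and/or add another file to the PRIMARY filegroup.
# Replace the placeholder host, file names, path, and sizes with your own values.
sqlcmd -S <sql-server-host> -d xxxx -Q "SELECT name, size, growth, max_size FROM sys.database_files;"

sqlcmd -S <sql-server-host> -d master -Q "
  ALTER DATABASE [xxxx] MODIFY FILE (NAME = N'<logical_data_file>', FILEGROWTH = 256MB);
  ALTER DATABASE [xxxx] ADD FILE (
    NAME = N'<logical_data_file>_2',
    FILENAME = N'/var/opt/mssql/data/xxxx_data2.ndf',
    SIZE = 10GB, FILEGROWTH = 256MB
  ) TO FILEGROUP [PRIMARY];
"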

Let me know if it helps,

Thanks. Workflow requests are returning 504 and all workflows are stuck.

Hello @keira

Is this happening after you fixed the database issue?

Hi, I would like to piggyback on this issue.
I'm running into DEADLINE_EXCEEDED when trying to terminate workflows after accidentally spinning up a flood of them. The UI doesn't load the details for those workflows, and the CLI throws a DEADLINE_EXCEEDED error like the following:

tctl --namespace core wf term -w <wf_id>
Error: Terminate workflow failed.
Error Details: context deadline exceeded
Stack trace:
goroutine 1 [running]:
runtime/debug.Stack(0xd, 0x0, 0x0)
/usr/local/go/src/runtime/debug/stack.go:24 +0x9f
runtime/debug.PrintStack()
/usr/local/go/src/runtime/debug/stack.go:16 +0x25
go.temporal.io/server/tools/cli.printError(0x1fb2ec0, 0x1a, 0x22857e0, 0xc00000c258)
/temporal/tools/cli/util.go:394 +0x2be
go.temporal.io/server/tools/cli.ErrorAndExit(0x1fb2ec0, 0x1a, 0x22857e0, 0xc00000c258)
/temporal/tools/cli/util.go:405 +0x49
go.temporal.io/server/tools/cli.TerminateWorkflow(0xc0007c6580)
/temporal/tools/cli/workflowCommands.go:477 +0x278
go.temporal.io/server/tools/cli.newWorkflowCommands.func7(0xc0007c6580)
/temporal/tools/cli/workflow.go:126 +0x2b
github.com/urfave/cli.HandleAction(0x1bdacc0, 0x203a6b0, 0xc0007c6580, 0xc0007c6580, 0x0)
/go/pkg/mod/github.com/urfave/cli@v1.22.5/app.go:526 +0x59
github.com/urfave/cli.Command.Run(0x1f90d21, 0x9, 0x0, 0x0, 0xc00072c9a0, 0x1, 0x1, 0x1fc8bb3, 0x22, 0x0, …)
/go/pkg/mod/github.com/urfave/cli@v1.22.5/command.go:173 +0x579
github.com/urfave/cli.(*App).RunAsSubcommand(0xc000413500, 0xc0007c62c0, 0x0, 0x0)
/go/pkg/mod/github.com/urfave/cli@v1.22.5/app.go:405 +0x914
github.com/urfave/cli.Command.startApp(0x1f8f0e7, 0x8, 0x0, 0x0, 0xc00072cd40, 0x1, 0x1, 0x1fb0813, 0x19, 0x0, …)
/go/pkg/mod/github.com/urfave/cli@v1.22.5/command.go:372 +0x7ff
github.com/urfave/cli.Command.Run(0x1f8f0e7, 0x8, 0x0, 0x0, 0xc00072cd40, 0x1, 0x1, 0x1fb0813, 0x19, 0x0, …)
/go/pkg/mod/github.com/urfave/cli@v1.22.5/command.go:102 +0x9d4
github.com/urfave/cli.(*App).Run(0xc000413180, 0xc00003a070, 0x7, 0x7, 0x0, 0x0)
/go/pkg/mod/github.com/urfave/cli@v1.22.5/app.go:277 +0x808
main.main()
/temporal/cmd/tools/cli/main.go:37 +0x4e

I have reviewed the "Troubleshooting Issues with the TypeScript SDK" page in the legacy Temporal SDK documentation.
Are there any emergency ways to terminate workflows?
Is there anything I can do to prevent this in the future?

Can you see this execution in primary persistence?

tctl wf desc -w <wfid>

context deadline exceeded
Check cluster stability, maybe starting off with health checks:

grpc-health-probe -addr=localhost:7233 -service=temporal.api.workflowservice.v1.WorkflowService
grpc-health-probe -addr=localhost:7235 -service=temporal.api.workflowservice.v1.MatchingService
grpc-health-probe -addr=localhost:7234 -service=temporal.api.workflowservice.v1.HistoryService

(change localhost to the appropriate IPs/hostnames)
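
If you want to hit all three services in one pass, a small wrapper like this works (localhost and the 7233/7234/7235 ports are assumptions based on the defaults above; substitute your own):

#!/bin/sh
# Probe the frontend, matching, and history health endpoints in turn.
for target in \
  "localhost:7233 temporal.api.workflowservice.v1.WorkflowService" \
  "localhost:7235 temporal.api.workflowservice.v1.MatchingService" \
  "localhost:7234 temporal.api.workflowservice.v1.HistoryService"
do
  set -- $target
  grpc-health-probe -addr="$1" -service="$2" || echo "unhealthy: $2 at $1"
done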

Thanks. I checked health only with the tctl cluster health command and reviewed metrics in Grafana; nothing looked unhealthy. I restarted all Temporal pods to see if it would make any difference, but it did not. We ended up dropping the workflows at the DB level, which did the trick, but obviously that shouldn't be the way to do it.