Gocql: no hosts available in the pool

All of a sudden, our Cadence server stopped connecting to the Cassandra DB. We checked the Cadence logs and found the following message:

gocql: no hosts available in the pool

Everything went back to normal once we restarted the Cadence server. The Cassandra DB seemed to be working normally during the event. Right now we have no clear idea what happened or how to prevent it in the future. We suspect it had to do with a connectivity issue, but we are not sure.

Does this sound familiar to some of you? Is there something we can do to keep this from happening in the future?

This has something to do with gocql & Cassandra

NOTE:

  1. Cadence 0.15.0 uses a really old version of gocql:
    https://github.com/uber/cadence/blob/v0.15.0/go.mod#L25

  2. The later 0.15.1 release uses a newer version, so you may want to give it a try:
    https://github.com/uber/cadence/blob/v0.15.1/go.mod#L24

My cluster was working fine for a month. I just returned from vacation and now I am not able to connect to the cluster due to the same error. Below is the stack trace.

tctl --namespace poc namespace desc

Error: Operation DescribeNamespace failed.

Error Details: rpc error: code = Unavailable desc = GetNamespace operation failed. Error gocql: no hosts available in the pool

Stack trace:

goroutine 1 [running]:
runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
	/usr/local/go/src/runtime/debug/stack.go:16 +0x19
go.temporal.io/server/tools/cli.printError({0x1dc7e50, 0x23}, {0x20b84c0, 0xc000146298})
	/temporal/tools/cli/util.go:392 +0x22a
go.temporal.io/server/tools/cli.ErrorAndExit({0x1dc7e50, 0x20e9958}, {0x20b84c0, 0xc000146298})
	/temporal/tools/cli/util.go:403 +0x28
go.temporal.io/server/tools/cli.(*namespaceCLIImpl).DescribeNamespace(0x0, 0xc000487600)
	/temporal/tools/cli/namespaceCommands.go:313 +0x205
go.temporal.io/server/tools/cli.newNamespaceCommands.func3(0xc000487600)
	/temporal/tools/cli/namespace.go:95 +0x2f
github.com/urfave/cli.HandleAction({0x19bbea0, 0x1e3f3e8}, 0x8)
	/go/pkg/mod/github.com/urfave/cli@v1.22.5/app.go:526 +0x50
github.com/urfave/cli.Command.Run({{0x1d8a187, 0x8}, {0x0, 0x0}, {0xc000515390, 0x1, 0x1}, {0x1dca9ee, 0x24}, {0x0, …}, …}, …)
	/go/pkg/mod/github.com/urfave/cli@v1.22.5/command.go:173 +0x652
github.com/urfave/cli.(*App).RunAsSubcommand(0xc000552380, 0xc000487340)
	/go/pkg/mod/github.com/urfave/cli@v1.22.5/app.go:405 +0x9ec
github.com/urfave/cli.Command.startApp({{0x1d8c3d4, 0x9}, {0x0, 0x0}, {0xc000515790, 0x1, 0x1}, {0x1dafdd8, 0x1a}, {0x0, …}, …}, …)
	/go/pkg/mod/github.com/urfave/cli@v1.22.5/command.go:372 +0x6e9
github.com/urfave/cli.Command.Run({{0x1d8c3d4, 0x9}, {0x0, 0x0}, {0xc000515790, 0x1, 0x1}, {0x1dafdd8, 0x1a}, {0x0, …}, …}, …)
	/go/pkg/mod/github.com/urfave/cli@v1.22.5/command.go:102 +0x808
github.com/urfave/cli.(*App).Run(0xc000552000, {0xc00003a0a0, 0x5, 0x5})
	/go/pkg/mod/github.com/urfave/cli@v1.22.5/app.go:277 +0x705
main.main()
	/temporal/cmd/tools/cli/main.go:37 +0x33

Is there any solution for the above issue? I am facing the error below when the frontend pod is port-forwarded.

{"statusCode":503,"statusText":"Service Unavailable","response":{},"message":"operation GetClusterMetadata encountered gocql: no hosts available in the pool"}

I would check whether this is a transient error (one that goes away and does not persist). This error can be transient, and the gocql wrapper creates a new session after it happens.
Also, what versions of Temporal server and Cassandra are you using?

I’m running Temporal 1.18.5 and have run into exactly this issue:

operation UpsertClusterMembership encountered gocql: no hosts available in the pool

I also posted a comment on a GitHub issue which feels related.