Postgres connection churn

datastores:
default:
sql: &postgresql_config
pluginName: postgres
databaseName: temporal
connectAddr: postgres.example.com:5432
connectProtocol: tcp
maxConns: 20
maxIdleConns: 20
maxConnLifetime: “30m”

We are using the following properties on our 8 history clusters
and when we are running above 50 workflows per second we are noticing

A lot of connections are getting closed very frequently some less than 2 seconds.

and we believe in making those new connections, it’s taking some time and not able to run at it’s max capacity

is there any property called “keepAlive” we can use ?
or any other better suggestion from the community ?

@maxim @tihomir We’re seeing this as well causing problems with cloud-sql=proxy IAM integration when we run more than 3 history nodes. How do we configure Temporal to keep connections alive longer and start new connections less often?

I’ve added a screenshot where you can see that Temporal is opening a whopping 250 new connections per second which I assume is unintentional.

@Graham_Pymm could you maybe get us in touch with somebody who could look at this and help us fix this problem?

We are observing similar pattern as well, one of the solution is managing connection pool outside of temporal using proxy or pgbouncer ( we haven’t taken a decision thought).

is there a better/efficient solution to this ?, Also any insights on why temporal would spawn so many connections.

thanks in advance.

We’re doing exactly this, but under high strain things still fall apart for us with connections breaking at the temporal side. I am quite confident it has something todo with the custom connection pool and reference counting on the connections that is implemented in temporal fpr postgres.

Recently we experienced an outage we were not able to fix besides the, let’s prevent more activities to be created and let’s drain the existing work queues.

is there any property called “keepAlive” we can use ?

Temporal static config allows you to set maxConnLifetime, which you set to 30m (default is 1hr) which sets the connection lifetime duration.
Setting this to 0 would mean on postgres side you would want to reuse the conn forever.

Would also look in your postgres config to see if you are setting idle_in_transaction_session_timeout and if so try to set it to 0 (to disable it) or a larger duration