Production HA setup

Hi, is there a recommended guide on how to do a production setup of Temporal with MySQL for high availability?

My initial thought was that, since all state is preserved in the DB, I could have multiple Temporal servers pointing to the same MySQL instance and things should work.
But the documents also mention broadcastAddress, membership, etc. Are these applicable only to cross-DC replication?

"High availability setup" is an overloaded term. Do you want the Temporal service to keep operating during deployments or when some of its nodes fail? Do you want the ability to survive a database host outage? Do you want to stay available in the case of a whole region outage?

Temporal supports all of the above, but each requirement calls for a different setup.

To start with, I want Temporal to survive node failures.
I was under the impression that for this, all nodes could just connect to the same MySQL instance and nothing else is required (I mean the broadcastAddress and membership settings, etc.).

Is that understanding correct?

Also, can cross-DC replication be done on MySQL, or does it work only with Cassandra out of the box?

For node failures, the important requirement is that all nodes can talk to each other. We have had cases where network policies precluded such connectivity, and it caused issues. The ports also need to be specified. Membership is configured automatically through the DB. You can look into the helm chart for the needed configuration options.

There are companies running a cross-DC setup on top of MySQL.
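
The connectivity requirement can be spot-checked from each node with a plain TCP probe. The following is a minimal Python sketch, not an official Temporal tool. The function names `can_connect` and `unreachable` are hypothetical, and the hosts and ports you pass in are assumptions that must come from your own server configuration or helm chart values.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable(targets, timeout=2.0):
    """Return the (host, port) pairs from targets that could not be reached."""
    return [(host, port) for host, port in targets
            if not can_connect(host, port, timeout)]
```

Running `unreachable([(peer, port), ...])` on every node, with the addresses of the other nodes and the ports configured for each service, should return an empty list if the network policies allow the traffic. A successful TCP connection only shows that the port is open; it does not prove that the process behind it is a healthy Temporal service.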

@maxim Is there a way to reliably verify that all the nodes can talk to each other (for example, a status in the DB or in the logs)? We need this to confirm that we have set up the network policies properly.

Make sure that the list of hosts from these two commands matches:

tctl admin membership list_gossip

tctl admin membership list_db
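
Once the host identities have been extracted from each command's output, the comparison itself is easy to automate. Here is a minimal Python sketch, assuming both outputs have already been reduced to plain lists of host addresses. The parsing step is left out because the output format is not shown in this thread, and the function names are hypothetical.

```python
def membership_diff(gossip_hosts, db_hosts):
    """Report hosts that appear in only one of the two membership lists."""
    gossip, db = set(gossip_hosts), set(db_hosts)
    return {
        "only_in_gossip": sorted(gossip - db),
        "only_in_db": sorted(db - gossip),
    }


def memberships_match(gossip_hosts, db_hosts):
    """Return True when both lists contain exactly the same set of hosts."""
    diff = membership_diff(gossip_hosts, db_hosts)
    return not diff["only_in_gossip"] and not diff["only_in_db"]
```

A host that shows up in only one of the two lists is the first place to look for a connectivity problem between nodes.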

How do the nodes talk to each other? Which protocol and which ports do they use? Is there any document I can refer to?

@madhu - Please refer to the comment from samar on the different roles and the ports they expose, in the thread "Communication between multiple instances of temporal server".

Thanks @ganta, let me take a look. Thanks for pointing to the other thread.

When I run list_gossip, it gives me the three hosts in the cluster, but when I run tctl admin membership list_db, I get an error.

It looks like it tries to connect to localhost instead of my CASSANDRA_SEEDS:

etc/temporal # tctl  admin membership list_db
2020/11/23 15:37:23 gocql: unable to dial control conn EOF
Error: Failed to initialize cluster metadata manager
Error Details: create cassandra session from cluster: gocql: unable to create session: control: unable to connect to initial hosts: EOF
Stack trace:
goroutine 1 [running]:
runtime/debug.Stack(0xd, 0x0, 0x0)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x9f
        /usr/local/go/src/runtime/debug/stack.go:16 +0x25, 0x2d, 0x2025500, 0xc00000c080)
        /temporal/tools/cli/util.go:541 +0x2be, 0x2d, 0x2025500, 0xc00000c080)
        /temporal/tools/cli/util.go:552 +0x49
        /temporal/tools/cli/adminCommands.go:627 +0x325
        /temporal/tools/cli/admin.go:291 +0x2b, 0x1e2db70, 0xc00012f8c0, 0xc00012f8c0, 0x0)
        /go/pkg/mod/ +0x58, 0x7, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1daf9d3, 0x1d, 0x0, ...)
        /go/pkg/mod/ +0x58e*App).RunAsSubcommand(0xc0004ee380, 0xc00012f600, 0x0, 0x0)
        /go/pkg/mod/ +0x954, 0xa, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1db9f6b, 0x21, 0x0, ...)
        /go/pkg/mod/ +0x87f, 0xa, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1db9f6b, 0x21, 0x0, ...)
        /go/pkg/mod/ +0x9f4*App).RunAsSubcommand(0xc0003bbdc0, 0xc00012f4a0, 0x0, 0x0)
        /go/pkg/mod/ +0x954, 0x5, 0x0, 0x0, 0xc0004a2b50, 0x1, 0x1, 0x1d9a32b, 0x13, 0x0, ...)
        /go/pkg/mod/ +0x87f, 0x5, 0x0, 0x0, 0xc0004a2b50, 0x1, 0x1, 0x1d9a32b, 0x13, 0x0, ...)
        /go/pkg/mod/ +0x9f4*App).Run(0xc0003bba40, 0xc0000ba040, 0x4, 0x4, 0x0, 0x0)
        /go/pkg/mod/ +0x7e8
        /temporal/cmd/tools/cli/main.go:37 +0x4e