Temporal with MySQL - how to survive master/slave failovers?

Hi there,

We’ve been testing Temporal with Aurora DB and ran into problems when failing over to a new master. The Temporal services stayed connected to the former master, and we had to restart all of them to get the cluster working again.

Is there a way to make failovers work or is this feature not supported yet?

Temporal absolutely does not support asynchronous replication as it breaks the strong consistency of the DB.

I’m not an Aurora expert. Is the master/slave setup fully consistent?

I’m not sure, but I think it’s the same model as with MySQL. So the preferred way to use Temporal with MySQL is just with one master? Or what are the recommendations?

The recommendation is to use fully consistent synchronous replication if such a thing exists for MySQL.

OK, so this is what I can read in the Aurora docs:

When data is written to the primary DB instance, Aurora synchronously replicates the data across Availability Zones to six storage nodes associated with your cluster volume. Doing so provides data redundancy, eliminates I/O freezes, and minimizes latency spikes during system backups.

Which seems to be what you are suggesting.

But the problem is that when a failover occurs, the former master becomes a read-only node. In this state, the Temporal history service (and the other services) cannot write anything to that node, yet they stay connected to it …
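One way to confirm that a connection is stuck on a demoted node is to ask the node itself whether it accepts writes. The snippet below is only a minimal sketch, not anything from Temporal: the helper name `is_read_only` is hypothetical, and it assumes a Python DB-API 2.0 connection (for example from PyMySQL) plus the `innodb_read_only` system variable, which Aurora MySQL reports as ON for reader instances.

```python
def is_read_only(conn):
    """Return True if the node behind this connection rejects writes.

    `conn` is assumed to be a DB-API 2.0 connection to MySQL / Aurora MySQL.
    """
    cur = conn.cursor()
    try:
        # Aurora reader instances report innodb_read_only = 1.
        cur.execute("SELECT @@innodb_read_only")
        row = cur.fetchone()
    finally:
        cur.close()
    # MySQL returns boolean system variables as 1 (ON) or 0 (OFF).
    return row is not None and int(row[0]) == 1
```

Running this on a connection held by a service after the failover should return True if that connection still points at the former master.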

Are you suggesting that the only way to work with MySQL is to have some kind of synchronous master-master setup? Am I understanding that correctly?

No. If such a failover doesn’t corrupt data, Temporal should be fixed to reconnect to the new master.
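To make the idea of reconnecting concrete, here is a generic sketch of the pattern. It is an illustration under assumptions, not Temporal's persistence code: `connect`, `run_write` and `ReconnectingWriter` are hypothetical names, and the sketch assumes the driver raises an exception whose first argument is the MySQL error number (as drivers such as PyMySQL do). Error 1290 (ER_OPTION_PREVENTS_STATEMENT) and 1836 (ER_READ_ONLY_MODE) are MySQL error codes for statements rejected by a read-only server.

```python
# MySQL error numbers raised when a write hits a read-only server.
READ_ONLY_ERRNOS = {1290, 1836}


def is_read_only_error(exc):
    """True if the exception carries a read-only MySQL error number."""
    code = exc.args[0] if exc.args else None
    return isinstance(code, int) and code in READ_ONLY_ERRNOS


class ReconnectingWriter:
    """Holds one connection and replaces it when the node turns read-only."""

    def __init__(self, connect, max_attempts=3):
        self._connect = connect          # hypothetical factory: () -> connection
        self._max_attempts = max_attempts
        self._conn = connect()

    def write(self, run_write):
        """Call run_write(conn); on a read-only error, reconnect and retry."""
        for attempt in range(1, self._max_attempts + 1):
            try:
                return run_write(self._conn)
            except Exception as exc:
                if not is_read_only_error(exc) or attempt == self._max_attempts:
                    raise
                # The connection points at a demoted node: drop it and
                # open a fresh one, which should reach the new master.
                try:
                    self._conn.close()
                except Exception:
                    pass
                self._conn = self._connect()
```

For this to help, `connect` has to resolve the writer again on every call, for example by connecting through the Aurora cluster endpoint, which is a DNS name that follows the current writer. A real implementation would probably also add a short backoff between attempts, since the DNS record can lag behind the failover.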

Would you file an issue to get this tracked?

Sure, thanks for the feedback.

@rastusik, could you provide some details?