We are currently developing a service that exposes a REST API and, on certain requests, needs to trigger long-running external services. For example, when a user registers, we need to initialise several other services, which can take a couple of minutes. So we are looking into Temporal to execute those tasks as workflows.
For the deployment we currently have three large servers in our own datacenter (dom1, dom2, dom3) that run multiple services in Linux containers. Each dom runs a MariaDB instance (mdb1, mdb2, mdb3); the three instances are synchronised as a Galera cluster. Each dom also runs an api node (api1, api2, api3) to serve requests. To make sure we can survive both a dom failure and a single container failure, a load balancer distributes requests among the healthy api nodes, and each api node is connected to all three database nodes.
We are not expecting high throughput on the workflows, and executing them is not very time critical, but it is important to us that workflows are started reliably. So what we would like is a cluster of Temporal servers that uses our Galera cluster as its backend. On each dom we would deploy one frontend, matching, history and worker service for starters (and eventually scale individual services up later if needed). The Temporal services should be clustered with the Temporal services on the other doms, each Temporal service should be connected to all three database nodes, and each api node should then use all three Temporal frontends to start workflows.
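Concretely, per dom we were imagining something like the following compose fragment (networking, config mounts and cluster membership settings omitted; the `SERVICES` variable is our assumption about how the Docker image selects which internal service a container runs):

```yaml
# Sketch of how we'd split the four Temporal services into containers on
# dom1 (dom2 and dom3 would mirror it). Assumes the temporalio/server image
# picks its role from the SERVICES environment variable.
services:
  temporal-frontend:
    image: temporalio/server
    environment:
      SERVICES: frontend
  temporal-history:
    image: temporalio/server
    environment:
      SERVICES: history
  temporal-matching:
    image: temporalio/server
    environment:
      SERVICES: matching
  temporal-worker:
    image: temporalio/server
    environment:
      SERVICES: worker
```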
Does this deployment make sense (or would it even work like this)?
Can Temporal utilize a MariaDB Galera cluster as datastore (can we just deploy the servers with multiple persistent sql connections)?
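As far as we can tell from the self-hosted docs, the SQL persistence config takes a single `connectAddr` per datastore, so each server process would talk to one SQL endpoint rather than to all three Galera nodes directly (unless we put a proxy in front). A sketch of what we would try on dom1, in `development.yaml` style, is below; the `pluginName`, the mdb1 address and the credentials are our assumptions, and whether MariaDB/Galera works behind the MySQL plugin at all is exactly what we are asking:

```yaml
# Persistence section we'd try on dom1; mdb2/mdb3 on the other doms.
# pluginName "mysql8" and the addresses are assumptions, credentials
# are placeholders.
persistence:
  defaultStore: default
  visibilityStore: visibility
  numHistoryShards: 512
  datastores:
    default:
      sql:
        pluginName: "mysql8"
        databaseName: "temporal"
        connectAddr: "mdb1:3306"
        connectProtocol: "tcp"
        user: "temporal"
        password: "CHANGEME"
    visibility:
      sql:
        pluginName: "mysql8"
        databaseName: "temporal_visibility"
        connectAddr: "mdb1:3306"
        connectProtocol: "tcp"
        user: "temporal"
        password: "CHANGEME"
```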
And is there a way for an SDK client to connect to multiple Temporal frontends for starting workflows? Or would we need to handle that ourselves and retry against a different Temporal frontend on failure?
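To make the question concrete: our understanding is that an SDK client connects to a single frontend address, so the fallback we would otherwise write ourselves is a small failover wrapper along these lines (a minimal sketch; `connect_fn` stands in for whatever the SDK's connect/start call would be and is purely hypothetical):

```python
# Minimal failover sketch: try each frontend address in order and return
# the first successful result. `connect_fn` stands in for a hypothetical
# SDK connect/start-workflow call.
def with_failover(hosts, connect_fn):
    last_error = None
    for host in hosts:
        try:
            return connect_fn(host)
        except ConnectionError as exc:
            last_error = exc  # remember the failure, try the next frontend
    raise last_error  # all frontends failed


# Demo with a fake connect function: frontend1 is "down", frontend2 answers.
def fake_connect(host):
    if host == "frontend1:7233":
        raise ConnectionError(f"{host} unreachable")
    return f"connected to {host}"

result = with_failover(
    ["frontend1:7233", "frontend2:7233", "frontend3:7233"], fake_connect
)
print(result)  # connected to frontend2:7233
```

This is obviously cruder than proper client-side load balancing, which is why we are asking whether the SDK (or a gRPC-level mechanism such as DNS round-robin) already covers this.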