Evaluating Temporal - various questions

Hello Temporal Team,
My team and I are evaluating Temporal for our company as an orchestrator for our microservices, and I have some points to clarify.
We are also investigating other engines (Zeebe) to make sure we make the right decision.

First, our requirements are:

  • an average of 100k workflows per hour
  • bursts of 1k workflows created within 1 second
  • ~10 activities per workflow
  • support for long-running workflows (an asynchronous activity may require manual interaction); I don’t know how this will impact overall performance

We would use PostgreSQL as the database.
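To make the sizing discussion concrete, the numbers above translate into rough request rates as follows. This is only a back-of-envelope sketch; the events-per-activity and per-workflow overhead figures are assumptions for illustration, not measured Temporal internals:

```python
# Rough load estimate derived from the stated requirements.
# EVENTS_PER_ACTIVITY and WORKFLOW_OVERHEAD_EVENTS are assumed values.

AVG_WORKFLOWS_PER_HOUR = 100_000
BURST_WORKFLOWS_PER_SEC = 1_000
ACTIVITIES_PER_WORKFLOW = 10
EVENTS_PER_ACTIVITY = 5        # assumed: scheduled/started/completed plus overhead
WORKFLOW_OVERHEAD_EVENTS = 2   # assumed: workflow started/completed

avg_workflows_per_sec = AVG_WORKFLOWS_PER_HOUR / 3600
events_per_workflow = (ACTIVITIES_PER_WORKFLOW * EVENTS_PER_ACTIVITY
                       + WORKFLOW_OVERHEAD_EVENTS)
avg_events_per_sec = avg_workflows_per_sec * events_per_workflow
burst_events_per_sec = BURST_WORKFLOWS_PER_SEC * events_per_workflow

print(f"average: {avg_workflows_per_sec:.1f} workflows/s, "
      f"~{avg_events_per_sec:.0f} history events/s")
print(f"burst:   {BURST_WORKFLOWS_PER_SEC} workflows/s, "
      f"~{burst_events_per_sec} history events/s")
```

Under these assumptions the sustained load is modest (tens of workflow starts per second), but the 1k/s burst multiplies the history-event write rate by more than an order of magnitude, which is what matters most for the persistence layer.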

Here are my questions:

  • Is commercial support available (for security fixes, help with cluster setup, and investigation of Temporal production issues)? If yes, where can I find more information?
  • NUM_HISTORY_SHARDS can’t be changed after cluster creation; is it planned to make it changeable in the future? I understand it is not a trivial change, but I’m a bit afraid of being stuck with a value that does not match our future requirements.
  • Based on my requirements, how should I configure my cluster? Is there any guide that maps an estimated workload to values for NUM_HISTORY_SHARDS, numTaskqueueReadPartitions, and numTaskqueueWritePartitions? And how many history, matching, and worker services do we have to deploy?
  • The java-sdk documentation is poor compared to the Go SDK (for example, Signals, ContinueAsNew, etc. are missing from the java-sdk docs); is there a plan to improve it in the near future? Documentation is part of our evaluation criteria.

Thank you for taking the time to answer all of these questions.

Welcome!

  • Is commercial support available (for security fixes, help with cluster setup, and investigation of Temporal production issues)? If yes, where can I find more information?

We don’t provide commercial support for on-premises deployments. Our cloud offering does come with full support and an SLA.

  • NUM_HISTORY_SHARDS can’t be changed after cluster creation; is it planned to make it changeable in the future? I understand it is not a trivial change, but I’m a bit afraid of being stuck with a value that does not match our future requirements.

It is not really a problem, as you can choose a pretty large number, like 10k+ shards. In your case, I would be more concerned with the choice of persistence. Scaling out PostgreSQL deployments is not possible at this point. For high scalability, Cassandra is recommended.
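For reference, the shard count lives in the server’s static persistence configuration and is fixed at cluster creation. A sketch of the relevant fragment, with illustrative values only (names and layout follow the shape of Temporal’s development.yaml; verify against your server version):

```yaml
# Fragment of the Temporal server static config (illustrative values).
# numHistoryShards cannot be changed after the cluster is created,
# so size it generously up front.
persistence:
  numHistoryShards: 4096
  defaultStore: default
  datastores:
    default:
      sql:
        pluginName: "postgres"
        databaseName: "temporal"
```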

  • Based on my requirements, how should I configure my cluster? Is there any guide that maps an estimated workload to values for NUM_HISTORY_SHARDS, numTaskqueueReadPartitions, and numTaskqueueWritePartitions? And how many history, matching, and worker services do we have to deploy?

We don’t have a guide for this yet. This forum answer has some relevant data.

  • The java-sdk documentation is poor compared to the Go SDK (for example, Signals, ContinueAsNew, etc. are missing from the java-sdk docs); is there a plan to improve it in the near future? Documentation is part of our evaluation criteria.

We don’t share our roadmap. We do plan to completely rewrite our SDK documentation, but I cannot promise that it will happen in the near future.

Hi Maxim, thanks for your answers, they are very useful.
Could you elaborate a bit on why it would be difficult to scale Temporal using PostgreSQL? Is it related to read/write operations, or to a limitation in the Temporal implementation? Indeed, when I was testing with PostgreSQL I ran into issues (“too many clients already” errors).

I have worked with Cassandra in the past, so I know how it works and what load we can support with it. But operating a Cassandra cluster requires operational effort, and we already operate PostgreSQL clusters; that’s why we would prefer to use PostgreSQL.

Could you elaborate a bit on why it would be difficult to scale Temporal using PostgreSQL? Is it related to read/write operations, or to a limitation in the Temporal implementation? Indeed, when I was testing with PostgreSQL I ran into issues (“too many clients” errors).

You can scale out the Temporal cluster to saturate your PostgreSQL DB instance, and then scale up the DB itself. But at some point you are going to hit the hardware limit of a single host. With Cassandra, you can scale out the DB by adding more hosts; we have run Temporal with Cassandra clusters of up to 100 hosts.

Also, I’m not sure whether it is possible to configure PostgreSQL for synchronous replication that doesn’t corrupt data in the presence of failovers.


@maxim We are currently using CockroachDB in our pilot project with Temporal; it is Postgres wire-compatible. I know that CockroachDB is not officially supported by Temporal; nevertheless, please correct me if I’m wrong: CockroachDB’s scale-out and replication mechanisms do solve the above-mentioned issues with Postgres, right?

I’m not a CockroachDB expert, but I believe it does solve the scalability and replication problems of PostgreSQL.