Load balance downstream services


I wanted to discuss a scenario and get some advice on the best possible approach.

We have an integration service whose sole job is to integrate with external third-party providers. Internal services can make requests to external providers only through the integration service, which acts as something like an API gateway for outbound requests: it offers a unified API and provides the associated functionality such as authentication, metering, and analytics.

Now, for some service types (for example, Payments or KYC), there can be more than one external service provider, and the integration service needs to choose the best provider and additionally load balance (say, round robin or failover) across those providers. The best provider is chosen at runtime by calculating a score from internal metrics such as cost, availability, and latency. The result is a list of providers in decreasing order of preference [SP1, SP2, SP3…]. Because this list is calculated at runtime from live metrics, it may differ even between requests that are only a short time apart.
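To make the ranking step concrete, here is a minimal sketch of how such a preference list might be computed. The `ProviderMetrics` type, the `score` formula, and the weights are all hypothetical; the real scoring would use whatever internal metrics and weighting we settle on.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ProviderMetrics:
    """Hypothetical snapshot of live metrics for one provider."""
    name: str
    cost: float          # cost per request; lower is better
    availability: float  # fraction in [0, 1]; higher is better
    latency_ms: float    # recent latency in milliseconds; lower is better


def score(m: ProviderMetrics,
          w_cost: float = 1.0,
          w_avail: float = 1.0,
          w_latency: float = 1.0) -> float:
    # Assumed formula: reward availability, penalize cost and latency.
    return (w_avail * m.availability
            - w_cost * m.cost
            - w_latency * (m.latency_ms / 1000.0))


def rank_providers(metrics: Iterable[ProviderMetrics]) -> List[str]:
    """Return provider names in decreasing order of preference."""
    return [m.name for m in sorted(metrics, key=score, reverse=True)]


snapshot = [
    ProviderMetrics("SP2", cost=0.01, availability=0.95, latency_ms=300),
    ProviderMetrics("SP3", cost=0.05, availability=0.90, latency_ms=500),
    ProviderMetrics("SP1", cost=0.02, availability=0.999, latency_ms=120),
]
ranked = rank_providers(snapshot)
```

Since the snapshot changes with live metrics, two calls to `rank_providers` a moment apart can return different orderings, which is the behavior described above.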

We are thinking of adopting Temporal for this specific use case, and the following options come to mind, which I would like to validate with the experts in this forum.

Option 1: One TaskQueue per service type

In this option, there is one task queue per service type, and it is the workers' responsibility to choose the best service provider at runtime and to do the heavy lifting of load balancing and failure handling across service providers. However, in this case we cannot leverage some of the features Temporal provides, such as load balancing and rate limiting. An advantage of this approach is that we can keep a finite number of workers polling a single task queue, so it is resource efficient.
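As a rough illustration of the failure handling a worker would own under Option 1, the sketch below walks a ranked list and falls through to the next provider on error. It is plain Python with no Temporal SDK calls; `call_with_failover`, `AllProvidersFailed`, and the `call_provider` callback are hypothetical names, and catching every exception is a simplification for the sketch.

```python
from typing import Callable, Dict, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the ranked list has failed."""


def call_with_failover(ranked: Sequence[str],
                       call_provider: Callable[[str], str]) -> Tuple[str, str]:
    """Try providers in preference order; return (provider, response)."""
    errors: Dict[str, Exception] = {}
    for name in ranked:
        try:
            return name, call_provider(name)
        except Exception as exc:  # assumption: any error triggers failover
            errors[name] = exc
    raise AllProvidersFailed(errors)


def fake_call(name: str) -> str:
    # Hypothetical stand-in for a real provider client: SP1 is down.
    if name == "SP1":
        raise ConnectionError("SP1 unavailable")
    return f"ok from {name}"


used, response = call_with_failover(["SP1", "SP2", "SP3"], fake_call)
```

In practice this logic would presumably sit inside the code the worker runs for each task, with the ranked list recomputed per request.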

Option 2: One TaskQueue per service provider

In this option, there is one task queue for each service provider, with worker processes polling tasks from each of the queues. While we can now leverage Temporal's load balancing and rate limiting at the provider level, I am not sure where to put the logic that chooses the best provider based on our internal metrics. Also, if I want to round robin among the service providers, what would be the right way to do that?
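For the round-robin part, one possibility is to pick the provider first and then route the task to that provider's queue. The sketch below shows only the selection step in plain Python; `RoundRobinSelector` and the `provider-<name>` queue naming in `task_queue_for` are hypothetical. Since Temporal workflow code has to be deterministic, a stateful selector like this would likely need to run in an activity or in the caller that starts the workflow, not directly in workflow code.

```python
import itertools
import threading
from typing import Sequence


class RoundRobinSelector:
    """Thread-safe round robin over a fixed list of providers."""

    def __init__(self, providers: Sequence[str]) -> None:
        if not providers:
            raise ValueError("at least one provider is required")
        self._cycle = itertools.cycle(list(providers))
        self._lock = threading.Lock()

    def next_provider(self) -> str:
        # The lock keeps concurrent callers from racing on the iterator.
        with self._lock:
            return next(self._cycle)


def task_queue_for(provider: str) -> str:
    # Assumed naming convention: one task queue per provider.
    return f"provider-{provider.lower()}"


selector = RoundRobinSelector(["SP1", "SP2", "SP3"])
picks = [selector.next_provider() for _ in range(4)]
queues = [task_queue_for(p) for p in picks]
```

The resulting queue name could then be passed as the task queue when scheduling the work, so each provider's workers only see their own tasks.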

I would appreciate any advice on this problem, or suggestions for other ways of solving it.

Thanks in advance.

It looks like Option 1 fits your use case better, as it allows choosing a provider per task.

Thank you @maxim for your guidance.