Deeply nested dynamic concurrent workflows with dependency management

Hi Team, I have a very complex business use case, and I am trying to see how I can leverage the power of Temporal to build a resilient application.

Please refer to this flow: Mermaid Chart

  1. Our business idea supports multi-tenancy, so there are 1…N tenants using our app simultaneously. Each of them interacts with our webapp and provides some raw-data information.

  2. workflow-tenant_01 - we want to keep one long-running workflow (which will be the parent workflow). This will run throughout the lifecycle of the tenant. The job of this workflow is simply to take in the raw-data from the user and pass it on to the backend microservice, config-engine.

  3. workflow-config-tenant_01 is the workflow that takes in the raw-data and generates a meaningful config (which is a HUGE JSON payload). So workflow-config-tenant_01 is the child workflow, triggered by the parent workflow workflow-tenant_01.

  4. From here on, each workflow-config-tenant_01 has 1…N services, denoted by workflow-config-service-tenant_01. This now becomes the child workflow, and workflow-config-tenant_01 becomes its parent workflow. Depending on the config, these workflows will run either concurrently or sequentially.

  5. Each service workflow workflow-config-service-tenant_01 will run a set of CRUD ops (again, they may run concurrently or sequentially based on the config), and each of them will be an activity.

  6. Each tenant is isolated under a Temporal namespace.

NOTE: the parent workflow is in Python and the remaining downstream child workflows are in Go.

HELP NEEDED on the below points.

  1. Since workflow-tenant_01 is the only workflow that initiates the entire cycle, what's the best way to trigger an event that will start the child workflows? Should I use an HTTP request or a signal? I use protobufs, so is RPC preferred over HTTP?

  2. If more tenants get added, how do we dynamically add new parent workflows?

  3. How do we run concurrent workflow executions? I believe workflow.Go() is for running concurrent activities and not workflows, right?

  4. There are workflows which are independent and can run concurrently, and there are workflows that depend on each other and must run sequentially. What's the best way to manage workflow execution considering both of these scenarios?

  5. Each service is a standalone microservice (k8s deployment) with a worker running and a workflow waiting to be triggered by its immediate parent workflow. This single service deployment handles the requests for all tenants. What's the best way to handle the case where a single k8s deployment of a service (say SERVICE_A), with a single worker (which can handle millions of workflows as per the docs), must run each tenant's child workflow triggers concurrently?

  6. This would mean that we would be handling millions of workflows even for very few tenants (say 10), so what are the limitations, setbacks, etc. for all the above points?

PS: Please note that all this is NOT YET implemented. We're currently in the idea evaluation phase and are working on our POC.

For 1, 2) I would look into using SignalWithStart from a client (see the Go SDK docs).
You can also expose an endpoint, if that is easier, and have it call the Temporal SDK API to do this.

For 3) You can start multiple workflow executions from a client, or by using the SDK client APIs in activity code, and they would run concurrently. Do you mean child workflows maybe?
ExecuteChildWorkflow returns a ChildWorkflowFuture, which you can wait on before the parent completes. Just note there is a limit on the number of pending child workflows a single parent workflow can have: dynamic config limit.numPendingChildExecutions.error, default 2K. So you would need to think about partitioning the event history for this child-workflow fanout, either by using continue-as-new (workflow.NewContinueAsNewError) or a sliding window approach (sample here), or you can consider using a long-running activity that starts them as regular workflow executions.
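
To make the sliding window idea a bit more concrete, here is a minimal plain-Go sketch (standard library only, this is not Temporal SDK code). The names runWindowed and window, and the job IDs, are made up for illustration. It caps how many jobs are in flight at once using a buffered channel as a semaphore. Note that inside real workflow code you should not use native goroutines, because workflow code has to stay deterministic; the same shape would be built with workflow.Go and child workflow futures instead.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runWindowed runs every job with at most `window` jobs in flight at once.
// It returns the highest number of jobs observed running simultaneously.
// (Hypothetical helper, for illustration only.)
func runWindowed(jobs []string, window int, run func(id string)) int {
	sem := make(chan struct{}, window) // buffered channel used as a counting semaphore
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		inFlight int
		peak     int
	)
	for _, id := range jobs {
		sem <- struct{}{} // blocks while the window is full
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			defer func() { <-sem }() // free a slot so the next job can start

			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()

			run(id)

			mu.Lock()
			inFlight--
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	return peak
}

func main() {
	// Hypothetical job IDs standing in for child workflow IDs.
	jobs := make([]string, 20)
	for i := range jobs {
		jobs[i] = fmt.Sprintf("workflow-config-service-%02d", i)
	}
	peak := runWindowed(jobs, 5, func(id string) {
		time.Sleep(10 * time.Millisecond) // stand-in for real work
	})
	fmt.Println("peak concurrency:", peak)
}
```

The window size here (5) is arbitrary; in practice it would be chosen well below the pending-child limit and tuned against what the workers and downstream services can absorb.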

For 4) I think you can control concurrency from within your workflow code. Can you give a bit more info on this use case? Then maybe we can provide a code sample in the SDK you need.
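
As one possible way to think about 4), here is a minimal plain-Go sketch (standard library only, not Temporal SDK code) that turns a dependency map into ordered "waves": everything inside one wave is independent and could be started concurrently, while the waves themselves run one after another. The function name planWaves and the service names are assumptions made up for this example, and it assumes the config can be expressed as "service X depends on services Y, Z". Each wave is sorted because Go map iteration order is random, and workflow code needs a deterministic order.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// planWaves groups services into waves. Services in the same wave have no
// unmet dependencies and can run concurrently; waves must run in order.
// deps maps a service to the services it depends on.
// (Hypothetical helper, for illustration only.)
func planWaves(deps map[string][]string) ([][]string, error) {
	unmet := make(map[string]int)           // service -> count of unfinished dependencies
	dependents := make(map[string][]string) // service -> services waiting on it
	for svc, ds := range deps {
		if _, ok := unmet[svc]; !ok {
			unmet[svc] = 0
		}
		for _, d := range ds {
			if _, ok := unmet[d]; !ok {
				unmet[d] = 0
			}
			unmet[svc]++
			dependents[d] = append(dependents[d], svc)
		}
	}

	var waves [][]string
	for len(unmet) > 0 {
		var wave []string
		for svc, n := range unmet {
			if n == 0 {
				wave = append(wave, svc)
			}
		}
		if len(wave) == 0 {
			return nil, errors.New("dependency cycle detected")
		}
		sort.Strings(wave) // deterministic order despite random map iteration
		for _, svc := range wave {
			delete(unmet, svc)
			for _, waiting := range dependents[svc] {
				unmet[waiting]--
			}
		}
		waves = append(waves, wave)
	}
	return waves, nil
}

func main() {
	// Hypothetical services: database needs network and storage, api needs database.
	deps := map[string][]string{
		"network":  nil,
		"storage":  nil,
		"database": {"network", "storage"},
		"api":      {"database"},
	}
	waves, err := planWaves(deps)
	if err != nil {
		panic(err)
	}
	fmt.Println(waves) // prints [[network storage] [database] [api]]
}
```

In a workflow, each wave would roughly map to "start all children in the wave, wait on all their futures, then move to the next wave".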

For 5) You can route a workflow or child workflow execution to this worker by specifying the task queue it is polling on when you start the new execution or child workflow. Is this worker running both workflows and activities for each tenant? Yes, a single worker can handle a large number of workflow executions, but IMO you would need to load test this with the resources you give this worker pod to make sure it can handle the workload you are throwing at it. We do provide worker tuning sessions, so I would reach out to your Temporal point of contact and maybe request a meeting for this.

For 6) I think we would need a bit more info on the use case to understand the fanout and the overall size of the workload. We have users doing very large fanouts, so this is completely possible with Temporal, but recommendations on the best approach really depend on the use case specifics. Some of the things you will need to think about are how you implement the fanout given the limitations of a single workflow execution, the rate limits of the downstream services that your workflows invoke through activities, and finally the worker resources needed to handle the fanout.
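
For a rough feel of what the single-execution limit means for a fanout, here is a small plain-Go sketch (standard library only, not Temporal SDK code). It splits a list of child IDs into batches no larger than a given size, so that each batch could be handled by one run of the parent before it continues as new. The function name chunk, the ID format, and the figure of 4500 children are assumptions for illustration; 2000 is the default pending-child limit mentioned above.

```go
package main

import "fmt"

// chunk splits ids into consecutive batches of at most size elements.
// (Hypothetical helper, for illustration only.)
func chunk(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		out = append(out, ids[start:end])
	}
	return out
}

func main() {
	// Hypothetical fanout: 4500 children, at most 2000 per parent run.
	ids := make([]string, 4500)
	for i := range ids {
		ids[i] = fmt.Sprintf("child-%04d", i)
	}
	for i, batch := range chunk(ids, 2000) {
		fmt.Printf("batch %d: %d children\n", i, len(batch))
	}
	// prints:
	// batch 0: 2000 children
	// batch 1: 2000 children
	// batch 2: 500 children
}
```

Staying comfortably under the limit (rather than exactly at it) is probably the safer choice, since event history size grows with every child started in a run.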