Below is a somewhat contrived example “mega workflow.” This workflow contains multiple child workflows, some of which (A and B, D and E) can run in parallel. Some workflows (C and F) depend on upstream workflows being completed. Data is generated during child workflows and needs to be passed downstream. As another wrinkle, child workflow F should start automatically after D finishes. After workflow E completes, a user may choose whether to run F one or more additional times.
Each child workflow is its own well-defined unit that may depend on data generated upstream.
I’m trying to understand the following:
When should a workflow contain multiple child workflows, and when should all of these workflows be independent, with something else determining when to kick them off?
Activities may call an endpoint A that generates and returns data. Some of this data may be required by downstream tasks, child workflows, or other independent workflows. When should the workflow take data returned from an activity (is this possible?) and pass it on to other tasks/workflows, and when should downstream tasks/workflows have activities that get the data another way (e.g., an activity that makes a separate API call to fetch the data generated by endpoint A)?
How do you draw the line between business logic in the workflow code and business logic in external systems? For example, let’s say multiple conditions must be met before starting child workflow A. When should that logic live in the workflow, and when in an external system that knows when the conditions are met and signals A to start?
Please let me know if these questions do not make any sense, and thank you for the help!
These are pretty good questions in my opinion.
When should a workflow contain multiple child workflows, and when should all of these workflows be independent, with something else determining when to kick them off?
Child workflows are easier to work with because they can be invoked synchronously: the parent simply waits for a child’s result. By default, their lifecycle is linked to the parent, which is helpful if you want them terminated automatically when the parent is terminated. But it can be a hassle, because with that default they cannot outlive the parent.
In your case child workflows look like a good fit.
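To make that shape concrete, here is a minimal sketch of a parent orchestrating the children, written with plain Python asyncio rather than any Temporal SDK. The dependency graph is an assumption, since the original diagram isn’t available: C depends on A and B, D and E both start after C, F starts as soon as D finishes, and the user’s “run F again?” decisions arrive through an asyncio.Queue standing in for signals. `child` and `mega_workflow` are hypothetical names; in a real application each `child(...)` call would start a child workflow and await its result.

```python
import asyncio

# Hypothetical stand-in for a child workflow. In a real app this would be a
# child workflow started by the parent; here it just builds a label from its inputs.
async def child(name: str, *inputs: str) -> str:
    await asyncio.sleep(0)  # yield control, as real work would
    return f"{name}({', '.join(inputs)})" if inputs else name

async def mega_workflow(user_decisions: asyncio.Queue) -> list[str]:
    log: list[str] = []
    # A and B do not depend on each other, so run them in parallel.
    a, b = await asyncio.gather(child("A"), child("B"))
    log += [a, b]
    # C needs data produced by both A and B, so it waits for both.
    c = await child("C", a, b)
    log.append(c)

    async def d_then_f() -> None:
        d = await child("D", c)
        log.append(d)
        # F is started automatically as soon as D finishes.
        log.append(await child("F", d))

    async def e_then_optional_f() -> None:
        e = await child("E", c)
        log.append(e)
        # After E, the user decides (via a signal-like queue) whether to run F again.
        while await user_decisions.get():
            log.append(await child("F", e))

    # The D -> F branch and the E branch run in parallel.
    await asyncio.gather(d_then_f(), e_then_optional_f())
    return log

async def main() -> list[str]:
    decisions: asyncio.Queue = asyncio.Queue()
    for decision in (True, True, False):  # run F twice more after E, then stop
        decisions.put_nowait(decision)
    return await mega_workflow(decisions)

result = asyncio.run(main())
print(result)
```

Each child’s output is an ordinary return value in the parent, so passing data downstream is just passing variables, and the parent alone decides when each child may start.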
Activities may call an endpoint A that generates and returns data. Some of this data may be required by downstream tasks, child workflows, or other independent workflows. When should the workflow take data returned from an activity (is this possible?) and pass it on to other tasks/workflows, and when should downstream tasks/workflows have activities that get the data another way (e.g., an activity that makes a separate API call to fetch the data generated by endpoint A)?
It depends on data size.
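Yes, it is possible: a workflow receives an activity’s return value and can pass it as input to other activities or child workflows. If the data is under a few hundred kilobytes, passing it directly as activity inputs and outputs is preferred. It is simpler and gives good visibility, as the inputs and outputs of activity invocations are visible in the UI/CLI.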
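A minimal sketch of passing data by value, again in plain asyncio rather than an SDK. The activity names (`call_endpoint_a`, `reserve_inventory`, `charge_customer`) and the payload fields are hypothetical. The point is only that the value an activity returns is an ordinary value inside the workflow and can be handed straight to downstream steps.

```python
import asyncio

# Hypothetical activity standing in for a call to "endpoint A".
async def call_endpoint_a(order_id: str) -> dict:
    await asyncio.sleep(0)
    return {"order_id": order_id, "items": ["widget", "gadget"], "total_cents": 4200}

# Hypothetical downstream activities that consume A's output directly.
async def reserve_inventory(data: dict) -> list[str]:
    return [f"reserved:{item}" for item in data["items"]]

async def charge_customer(data: dict) -> str:
    return f"charged {data['total_cents']} cents for {data['order_id']}"

async def workflow(order_id: str) -> tuple[list[str], str]:
    # The activity result is just a value in the workflow...
    data = await call_endpoint_a(order_id)
    # ...so it can be passed as input to the next steps, here in parallel.
    reserved, receipt = await asyncio.gather(reserve_inventory(data), charge_customer(data))
    return reserved, receipt

reserved, receipt = asyncio.run(workflow("o-1"))
print(reserved, receipt)
```

Because the payload is small, it can travel through the workflow as-is and show up in the recorded inputs and outputs.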
For large data, either store it in an external blob store or DB and pass references to it through the workflow, or cache it in a process/host and route the activities that need it to that same host.
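The store-and-pass-a-reference option is often called the “claim check” pattern. Below is a minimal sketch under stated assumptions: `INLINE_LIMIT_BYTES` is an assumed threshold (the answer only says “a few hundred kilobytes”), `BLOB_STORE` is an in-memory dict standing in for a real blob store or DB, and `to_payload`/`from_payload` are hypothetical helpers meant to run inside activities, so the workflow itself only ever sees a small inline value or a key. The host-caching alternative is not shown.

```python
import hashlib
import json

# Assumed cutoff for passing data inline; tune for your system.
INLINE_LIMIT_BYTES = 200 * 1024

# Hypothetical stand-in for an external blob store or DB, keyed by content hash.
BLOB_STORE: dict[str, bytes] = {}

def put_blob(blob: bytes) -> str:
    key = hashlib.sha256(blob).hexdigest()
    BLOB_STORE[key] = blob
    return key

def get_blob(key: str) -> bytes:
    return BLOB_STORE[key]

def to_payload(data) -> dict:
    """Run in the producing activity: inline small data, store large data and return a key."""
    blob = json.dumps(data).encode("utf-8")
    if len(blob) <= INLINE_LIMIT_BYTES:
        return {"inline": data}
    return {"ref": put_blob(blob)}

def from_payload(payload: dict):
    """Run in a consuming activity: resolve the reference if one was passed."""
    if "inline" in payload:
        return payload["inline"]
    return json.loads(get_blob(payload["ref"]).decode("utf-8"))

small = to_payload({"status": "ok"})
large = to_payload({"rows": ["x" * 100] * 5000})  # roughly 500 KB once serialized
print(small, large)
```

Only the dict returned by `to_payload` moves through the workflow, so its recorded history stays small no matter how large the underlying data is.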
How do you draw the line between business logic in the workflow code and business logic in external systems? For example, let’s say multiple conditions must be met before starting child workflow A. When should that logic live in the workflow, and when in an external system that knows when the conditions are met and signals A to start?
I don’t think there is a clear rule in this case. If the conditions are simple, keeping them in the workflow is fine. If they are complex and tend to change frequently, moving them into an activity might simplify application maintenance, because workflow code has to be versioned whenever it changes. An activity implementation can be updated at any time without worrying about backward compatibility, as long as its interface does not change. I’ve seen a few systems where an activity encapsulated an entire rule engine that dynamically determined which actions the workflow had to execute.
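A minimal sketch contrasting the two options, in plain asyncio rather than an SDK. Everything here is hypothetical: the `Context` fields, the rule table, and the action names. Option 1 keeps a simple condition in workflow code; option 2 hides the rules behind an activity whose input/output contract stays fixed, so the rules can change without touching (or versioning) the workflow.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Context:
    # Hypothetical facts the workflow knows; not from the original question.
    payment_received: bool
    documents_verified: bool
    region: str

# Option 1: a simple, stable condition kept in workflow code.
# Changing it later means versioning the workflow.
def ready_for_a(ctx: Context) -> bool:
    return ctx.payment_received and ctx.documents_verified

async def workflow_inline(ctx: Context) -> list[str]:
    return ["start_child_a"] if ready_for_a(ctx) else []

# Option 2: complex or frequently changing rules behind an activity.
# Hypothetical rule table; a real system might load rules from a DB or rule engine.
RULES = [
    (lambda ctx: ctx.payment_received and ctx.documents_verified, "start_child_a"),
    (lambda ctx: ctx.region == "EU", "run_gdpr_check"),
    (lambda ctx: not ctx.documents_verified, "request_documents"),
]

async def decide_actions_activity(ctx: Context) -> list[str]:
    # The rules can be edited freely as long as this input/output contract holds.
    await asyncio.sleep(0)
    return [action for predicate, action in RULES if predicate(ctx)]

async def workflow_with_rules(ctx: Context) -> list[str]:
    actions = await decide_actions_activity(ctx)
    executed = []
    for action in actions:
        # A real workflow would start the matching child workflow or activity here.
        executed.append(action)
    return executed

print(asyncio.run(workflow_with_rules(Context(True, True, "EU"))))
```

In option 2 the workflow code is a generic executor; all decision-making lives in the activity, which is what makes it safe to change frequently.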