Hi,
Awesome work. I can't thank the team enough for such an elegant solution. I have some key questions and use cases to consider as I work out how to proceed with certain design patterns. Obviously, many solutions to every problem are possible; however, some of those solutions will be anti-patterns. I am looking for advice/recommendations on preferred practices for these key design choices.
Use Case 1: Shopping Cart - On information and lifetime
How early do we create workflows?
Q1.1. Should a workflow be created per shopping cart or should the workflow be created only once checkout has started?
How to pass information between architectural layers of a system
Q1.2. Should the cart contents be persisted by the web layer to a database, with only the identity of the cart published to the workflow? Or should the cart contents be submitted to the workflow, which is then responsible for passing the information to the different activities, with changes to the cart delivered via signals?
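For concreteness, here is a minimal sketch of the second option: a workflow that owns the cart state and receives changes via signals. All names (CartWorkflow, addItem, startCheckout) are hypothetical, not an established pattern from the docs.

import io.temporal.workflow.SignalMethod;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.util.HashMap;
import java.util.Map;

@WorkflowInterface
public interface CartWorkflow {
    @WorkflowMethod
    void run(String cartId);

    @SignalMethod
    void addItem(String sku, int quantity);

    @SignalMethod
    void startCheckout();
}

public class CartWorkflowImpl implements CartWorkflow {
    // Cart contents live inside the workflow; signals mutate this state.
    private final Map<String, Integer> items = new HashMap<>();
    private boolean checkoutStarted;

    @Override
    public void run(String cartId) {
        // Block until checkout begins; every signal is recorded in history,
        // so the cart contents survive worker restarts and replay.
        Workflow.await(() -> checkoutStarted);
        // ... pass `items` into the checkout/payment activities from here ...
    }

    @Override
    public void addItem(String sku, int quantity) {
        items.merge(sku, quantity, Integer::sum);
    }

    @Override
    public void startCheckout() {
        checkoutStarted = true;
    }
}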
UI/Web layer integration with Temporal
Q1.3. Where a web application server integrates with the backend layer, should the app server trigger workflows directly? (In Java EE this would be in some controller, or it may be a REST service that starts a process, or some UI application.)
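Starting a workflow straight from a controller would typically just be a WorkflowClient call. A minimal sketch, where the OrderWorkflow interface, task queue name, and controller wiring are all assumptions:

import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowOptions;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

@WorkflowInterface
public interface OrderWorkflow {
    @WorkflowMethod
    void run(String cartId);
}

public class CheckoutController {
    // Connects to localhost:7233 in recent Java SDKs (older versions use newInstance()).
    private final WorkflowClient client =
        WorkflowClient.newInstance(WorkflowServiceStubs.newLocalServiceStubs());

    // Called by whatever REST/controller layer the app server uses.
    public String startCheckout(String cartId) {
        OrderWorkflow workflow = client.newWorkflowStub(
            OrderWorkflow.class,
            WorkflowOptions.newBuilder()
                .setTaskQueue("CHECKOUT_TASK_QUEUE")
                .setWorkflowId("checkout-" + cartId) // makes the start idempotent per cart
                .build());
        WorkflowClient.start(workflow::run, cartId); // returns as soon as the start is accepted
        return "checkout-" + cartId;
    }
}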
Bi Directional asynchronous communication - General pattern
Q1.4. How would a workflow return a result to the originating web server, which might then push that result to a client (HTTP/2, WebSocket, etc.)? The final step in the workflow would call an activity responsible for triggering a callback to a server, which in turn triggers the HTTP/2 push/WebSocket push/Diameter push/etc. The actual callback destination would need to be passed into the workflow either through configuration or as a parameter. This is almost obvious.
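A sketch of that final callback activity, with the destination URL carried as an argument exactly as described above; NotifyActivities and pushResult are made-up names:

import io.temporal.activity.Activity;
import io.temporal.activity.ActivityInterface;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

@ActivityInterface
public interface NotifyActivities {
    void pushResult(String callbackUrl, String resultJson);
}

public class NotifyActivitiesImpl implements NotifyActivities {
    private final HttpClient http = HttpClient.newHttpClient();

    @Override
    public void pushResult(String callbackUrl, String resultJson) {
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(callbackUrl))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(resultJson))
            .build();
        try {
            // The receiving server turns this callback into the actual
            // HTTP/2/WebSocket/Diameter push to the client.
            http.send(request, HttpResponse.BodyHandlers.discarding());
        } catch (Exception e) {
            throw Activity.wrap(e); // surface the failure so the activity retry policy applies
        }
    }
}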
Bi Directional asynchronous communication - Client ID
Q1.4.1. Just confirming that there is no stateful client identifier that can be used for bidirectional communication back to that specific initiating server.
Use Case 2: Billing System Bank Integration - On volume and bulk processing
Volume limits on workflows?
Q2.1. Should a workflow be created per user for whom bank collections are taking place, or should the workflow define and manage only the bulk steps? In this scenario there are 500,000 accounts for which bank collections must take place. That could mean 500,000 workflows publishing information to some sort of collector, which publishes a batch file to a bank and then receives a response for processing.
Is there a benefit to “groups” of datasets?
Q2.2. Suppose there are a number of accounts to be processed (e.g. 543,212). A workflow is initiated that triggers activities to pick up the 543,212 accounts and split them into roughly 500 groups of about 1,000 accounts each, with one workflow per group managing its accounts. Control is effectively top down if these are managed as "child workflows". Is this a viable pattern?
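For reference, the top-down variant might look roughly like this in the Java SDK; GroupWorkflow, the group size, and the parent interface are illustrative:

import io.temporal.workflow.Async;
import io.temporal.workflow.Promise;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.util.ArrayList;
import java.util.List;

@WorkflowInterface
public interface GroupWorkflow {
    @WorkflowMethod
    void processGroup(List<String> accountIds);
}

// Implements a hypothetical BulkCollectionWorkflow interface (omitted).
public class BulkCollectionWorkflowImpl implements BulkCollectionWorkflow {
    @Override
    public void run(List<String> accountIds) {
        List<Promise<Void>> children = new ArrayList<>();
        for (int i = 0; i < accountIds.size(); i += 1000) {
            List<String> group =
                accountIds.subList(i, Math.min(i + 1000, accountIds.size()));
            GroupWorkflow child = Workflow.newChildWorkflowStub(GroupWorkflow.class);
            children.add(Async.procedure(child::processGroup, group));
        }
        Promise.allOf(children).get(); // parent blocks until every group completes
        // ... aggregate/bank-file steps can run here ...
    }
}

One caveat on sizing: a parent awaiting roughly 500 children seems manageable, whereas one child per account would, as far as I understand, run into per-workflow history limits.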
How do we apply data fixes when things go wrong? Would the above Q2.2 be an anti-pattern?
Q2.2.2. In the scenario where information is published into the workflow, and a "data fix" needs to be applied by an operations team due to an unforeseen issue, how should this be achieved? (I see this as effectively the downside of managing the information within the flow.)
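One escape hatch I can think of, if the state lives inside the workflow, is an operator-only signal that overwrites it. A sketch with made-up names; AccountDetails is a hypothetical type:

import io.temporal.workflow.SignalMethod;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

@WorkflowInterface
public interface CollectionWorkflow {
    @WorkflowMethod
    void run(AccountDetails initial);

    @SignalMethod
    void applyDataFix(AccountDetails corrected);
}

public class CollectionWorkflowImpl implements CollectionWorkflow {
    private AccountDetails details;

    @Override
    public void run(AccountDetails initial) {
        this.details = initial;
        // ... activities read this.details; a fix delivered by signal is
        // part of the event history, so it survives replay ...
    }

    @Override
    public void applyDataFix(AccountDetails corrected) {
        this.details = corrected; // sent by the operations team after the data fix
    }
}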
How to identify when a “batch” of workflows has completed and we need to do some aggregate workflows
Q2.3. Suppose there are a number of accounts to be processed (e.g. 543,212), each with its account code on a message queue. An activity will process the queue and start one workflow per message. I want to spawn a workflow once all of the dependent workflows have been processed successfully. How do I achieve this?
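One pattern that seems to fit (illustrative names throughout): each per-account workflow signals a collector workflow as its last step, and the collector waits for the expected count before running the aggregate step.

import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.SignalMethod;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.time.Duration;

@ActivityInterface
interface AggregateActivities {
    void runAggregateStep();
}

@WorkflowInterface
public interface BatchCollectorWorkflow {
    @WorkflowMethod
    void run(int expectedCount);

    @SignalMethod
    void accountCompleted(String accountId);
}

public class BatchCollectorWorkflowImpl implements BatchCollectorWorkflow {
    private final AggregateActivities aggregate = Workflow.newActivityStub(
        AggregateActivities.class,
        ActivityOptions.newBuilder()
            .setStartToCloseTimeout(Duration.ofMinutes(10))
            .build());
    private int completed;

    @Override
    public void run(int expectedCount) {
        Workflow.await(() -> completed >= expectedCount); // all dependents reported in
        aggregate.runAggregateStep();
    }

    @Override
    public void accountCompleted(String accountId) {
        completed++;
    }
}

A caveat: as far as I understand, a single workflow cannot absorb hundreds of thousands of signals within per-workflow history limits, so at this volume the signals would likely need to fan in through intermediate collectors (e.g. one per group from Q2.2).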
Is 2.3 an anti-pattern?
Q2.4. In Q2.3 we advocate one workflow per account. This might include activities like communicating with customers, triggering workflows to update bank details, having different workflows per bank type, etc. Is the above an anti-pattern (are we using the wrong tool for the job), or is this an acceptable use case?
Use Case 3: When good times go bad - On operational management of failing workflows
On dynamic prioritization
Q3.1. When 100,000 workflows submitted by customers are failing because a dependent system is down, I want to ensure that, after recovery, all newly created workflows submitted by customers get processed before the workflows that are in a failed state. How can this be achieved without introducing delays on retries? I want to fail fast, but then resume eagerly at a lower priority than newly created workflows.
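As far as I know there is no built-in priority, so one workaround is queue-level separation: backlog/recovered work runs on a task queue served by a small worker pool, while fresh submissions use the main queue. A sketch; PaymentRequest, BankActivities, and the queue names are hypothetical:

import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Workflow;
import java.time.Duration;

// Implements a hypothetical PaymentWorkflow interface (omitted).
public class PaymentWorkflowImpl implements PaymentWorkflow {
    @Override
    public void run(PaymentRequest request) {
        // Route the heavy activity work by priority; a small worker pool
        // polling PAYMENTS_BACKLOG naturally throttles recovered workflows
        // without delaying new ones on PAYMENTS_MAIN.
        String taskQueue =
            request.isRecoveredBacklog() ? "PAYMENTS_BACKLOG" : "PAYMENTS_MAIN";
        BankActivities bank = Workflow.newActivityStub(
            BankActivities.class,
            ActivityOptions.newBuilder()
                .setTaskQueue(taskQueue)
                .setStartToCloseTimeout(Duration.ofMinutes(5))
                .build());
        bank.collect(request);
    }
}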
On patching
Q3.2. When 100,000 workflows are submitted, I want to find a subset of them to be influenced/patched. I understand that we either need Elasticsearch to query a fine-grained subset, or we have to publish the workflow state to an aggregate table. What would be the best way to modify/patch the dataset? From my understanding, direct modification is not possible; we would have to release a new version of the code and have it either accept a signal or introduce an additional activity that brings in the new information.
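With advanced visibility (Elasticsearch) enabled, an operator script could select the subset with a query and signal each execution. A rough sketch; the query, signal name, and payload are assumptions, and pagination via the response's next-page token is omitted:

import io.temporal.api.workflowservice.v1.ListWorkflowExecutionsRequest;
import io.temporal.api.workflowservice.v1.ListWorkflowExecutionsResponse;
import io.temporal.client.WorkflowClient;
import io.temporal.serviceclient.WorkflowServiceStubs;

public class BatchPatch {
    public static void main(String[] args) {
        WorkflowServiceStubs service = WorkflowServiceStubs.newLocalServiceStubs();
        WorkflowClient client = WorkflowClient.newInstance(service);

        // Select the subset to patch via a visibility query.
        ListWorkflowExecutionsResponse response =
            service.blockingStub().listWorkflowExecutions(
                ListWorkflowExecutionsRequest.newBuilder()
                    .setNamespace("default")
                    .setQuery("WorkflowType = 'CollectionWorkflow'"
                        + " AND ExecutionStatus = 'Running'")
                    .build());

        // Deliver the fix; "applyDataFix" matches the hypothetical signal
        // from Q2.2.2, and the payload here is a placeholder.
        response.getExecutionsList().forEach(info ->
            client.newUntypedWorkflowStub(info.getExecution().getWorkflowId())
                .signal("applyDataFix", "corrected-payload-here"));
    }
}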
On replaying events with changed workflows
Q3.3. When introducing new versions, my understanding is that workflows are "replayed" based on the stored event history. Say I had the following:
private final MyProcessingActivities activities;

public void processStuff(Arguments args) {
    String uid = args.getUid();
    String a1 = activities.doitAlpha(uid);
    String a2 = activities.doitAlpha(uid);
    String a3 = activities.doitBeta(uid);
    String a4 = activities.doitAlpha(uid);
}
becomes
private final MyProcessingActivities activities;

public void processStuff(Arguments args) {
    String uid = args.getUid();
    String a5 = activities.doitAlpha(uid);
    String a6 = activities.doitBeta(uid);
    String a7 = activities.doitAlpha(uid);
}
Suppose the workflow failed on Beta. Reprocessing does not actually "remove" the second Alpha step; it already happened. In the new version, how does the replay of the "removed" Alpha call a2 impact the result of the Alpha call a7? Will a7 be the same as a2?
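For what it's worth, the Java SDK's documented mechanism for this kind of change is Workflow.getVersion: histories recorded before the change replay the old branch, while new executions take the new one. Applied to the example above (the change ID "dropSecondAlpha" is my own name):

public void processStuff(Arguments args) {
    String uid = args.getUid();
    String a1 = activities.doitAlpha(uid);
    // A version marker is recorded in history the first time this runs.
    int version = Workflow.getVersion("dropSecondAlpha", Workflow.DEFAULT_VERSION, 1);
    if (version == Workflow.DEFAULT_VERSION) {
        // Only histories that already executed the second Alpha take this branch.
        String a2 = activities.doitAlpha(uid);
    }
    String a3 = activities.doitBeta(uid);
    String a4 = activities.doitAlpha(uid);
}

Without such a guard, my understanding is that replaying an old history against the new code would fail with a nondeterminism error rather than silently reusing a2's result for a7, because the sequence of scheduled activities no longer matches the history.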
On replaying events with altered activities
Q3.4. Suppose a process fails on the Beta step below, and the Alpha activity needs to be rerun because an underlying data fix has happened. How can we restart the workflow to process from the Alpha step? My understanding is that when Beta throws an exception, the workflow would resume on the Beta step, since the result of the Alpha step has already been recorded.
public void processStuff(Arguments args) {
    String uid = args.getUid();
    String a8 = activities.doitAlpha(uid);
    String a9 = activities.doitBeta(uid);
    String a10 = activities.doitCeti(uid);
}
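Two options come to mind. Operationally, I believe the service supports resetting a workflow to an earlier point in its history (tctl's workflow reset), which rolls the workflow back to a chosen event and replays from there. Alternatively, the Alpha-must-rerun-after-Beta-fails relationship can be expressed directly in workflow code. A sketch, assuming Beta's retry policy is exhausted so that an ActivityFailure reaches the workflow:

import io.temporal.failure.ActivityFailure;

public void processStuff(Arguments args) {
    String uid = args.getUid();
    String a8 = activities.doitAlpha(uid);
    String a9 = null;
    while (a9 == null) {
        try {
            a9 = activities.doitBeta(uid);
        } catch (ActivityFailure e) {
            // After the underlying data fix, rerun Alpha before retrying Beta.
            // A real implementation should bound this loop or await an
            // operator signal, since every iteration grows the history.
            a8 = activities.doitAlpha(uid);
        }
    }
    String a10 = activities.doitCeti(uid);
}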
I hope my contrived use cases are succinct enough to elicit the preferred design pattern/guidance for each problem. I hope I have not put too much into one topic; however, the collection of responses might be useful to someone else as a body of knowledge.
Thank you,
-Tim