If I model the system with millions of actors, like Uber drivers, I wonder whether there is a use case where we need to update multiple drivers' info exclusively. Is there any concept like locking? If not, how is such a problem addressed?
We don’t yet have support for explicit atomic operations across multiple workflows.
The main primitives you can use to implement them yourself are signals and uniqueness of workflows by id.
If some resources are grouped together, a workflow with the WorkflowId equal to the GroupId can be used to manipulate this group in an exclusive manner. I’ve seen this pattern in infrastructure provisioning. For example, having a workflow per host and a workflow per rack allows limiting the number of simultaneous host upgrades per rack. The rack workflow doesn’t perform any operations itself; it acts as a semaphore. It receives signals with requests from host workflows and sends response signals that grant permission to perform a host upgrade. See the mutex sample that demonstrates this technique.
On another note, if we would like to manipulate multiple drivers’ info as a transaction, I assume the transaction would be modeled as a workflow itself with compensation logic, right? How is the isolation level modeled here?
If resource grouping is dynamic, then you’ll have to implement some sort of two-phase commit using signals for cross-workflow communication. In this case, the transaction workflow sends the prepare and commit signals to the entity workflows.
The third question I have is: in Uber’s use case, how can we generate visibility/reports/analytics on the millions of drivers? Because the information lives in the driver objects, it seems inefficient to query all the driver objects. Is the common practice to dump the objects’ info periodically to a data lake and use big-data technology to query the data set there?
Yes, it is common for workflows to have activities that push relevant information to downstream analytic systems.