Determinism: Daily task where data changes daily, how to handle resumption of a workflow?

Hey all,

I am looking at Temporal for the following use case and I’m wondering about how to ensure the workflow is deterministic.

My company is given large datasets, usually a BigQuery table, that we read, process with an internal rules engine, and then multiplex out to many different external APIs.

Our current setup is a choreography of custom-built background jobs and we’re definitely starting to see why we need something like Temporal.

In our current state, the client’s BigQuery table is updated nightly and we pull changes in the morning for processing. The amount of change per day varies quite a bit.

If I have a workflow such as:

  1. Start SyncWorkflow, pull client DB info from our data store.
  2. Run activity: Query the changed row count from the table, return the row_count
  3. Assume row_count=100,000, run one TableReadActivity per 5,000 rows, so 20 activities are queued.
  4. The 20 TableReadActivities execute; each saves its 5,000 rows to private blob storage and returns a file_reference.
  5. Loop over the collected file_references, run Activity to process each one.
  6. Complete.
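
The fan-out in step 3 can be sketched as a small planning function. This is plain Python, not Temporal SDK code; the name `plan_batches` and the `(offset, limit)` representation are assumptions made for illustration only.

```python
from typing import List, Tuple


def plan_batches(row_count: int, batch_size: int = 5000) -> List[Tuple[int, int]]:
    """Return (offset, limit) pairs that together cover row_count rows.

    Hypothetical helper: each pair would become the input of one
    TableReadActivity in the workflow described above.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        (offset, min(batch_size, row_count - offset))
        for offset in range(0, row_count, batch_size)
    ]


if __name__ == "__main__":
    batches = plan_batches(100_000)
    print(len(batches))   # number of read activities to schedule
    print(batches[0])     # first (offset, limit) pair
```

Because the plan is derived only from the row_count returned in step 2, the number of scheduled activities depends on that one recorded value and not on the live table.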

What happens if this workflow errors out at step 4, the client updates their data table (row_count will now be different!), and the workflow is restarted?

Is it possible to tell it to roll back to step 2 and restart from there?

It seems like because the row_count could change out from underneath the run, it’s not deterministic? Essentially the workflow would have knowledge of an obsolete state of the client DB.

Any help is appreciated, perhaps I’m approaching the problem the wrong way.

Cheers,
Georges

For a given Workflow Type, its Workflow Definition (implementation) must produce the same sequence of Commands given the same History.

Have you seen these posts that might help?

Spencer’s post was helpful from a theory point of view, thanks!

So, in my case, it would be possible for a resumed workflow to run a different number of Activities than the first run because there would be a different number of rows in the second run.

This would generate a different Event History and thus trigger a determinism error.

It sounds like my next task is re-architecting my system to ensure inputs remain stable over time.

Thanks Patrick!

Just to add,

What happens if this workflow errors out at step 4,

By default workflows don’t fail on unknown failures but block workflow execution waiting for a fix.

You could model this as retrying a sequence of activities, making sure that in each retry you run steps 1-4.

Which SDK are you using?
The Java SDK provides Workflow.retry; for other SDKs you would need to implement this in your workflow code.
See the fileprocessing sample (Java, Go) for sample implementations.
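
For SDKs without a built-in helper, retrying steps 1-4 as one unit might look roughly like the sketch below. It is plain Python rather than Temporal SDK code, and `retry_sequence`, `sync_steps`, and the simulated row counts are hypothetical names and values used only to show the idea: every attempt re-queries the row count, so a retry never reuses a stale count.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def retry_sequence(run_steps: Callable[[], T], max_attempts: int = 3) -> T:
    """Run the whole sequence; on any failure, start again from its first step."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return run_steps()
        except Exception as err:  # toy model: treat every failure as retryable
            last_error = err
    raise RuntimeError(f"sequence failed after {max_attempts} attempts") from last_error


def demo() -> Tuple[int, List[int]]:
    """Simulate a table that changes between the first and second attempt."""
    row_counts = iter([100_000, 120_000])
    seen: List[int] = []

    def sync_steps() -> int:
        row_count = next(row_counts)   # step 2 is re-run on every attempt
        seen.append(row_count)
        if len(seen) == 1:
            raise IOError("simulated failure while reading the table (step 4)")
        return row_count // 5000       # number of read batches for this attempt

    return retry_sequence(sync_steps), seen


if __name__ == "__main__":
    print(demo())
```

In real workflow code the wait between attempts would presumably use the SDK's own timer rather than a plain loop; that part is left out here.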


By default workflows don’t fail on unknown failures but block workflow execution waiting for a fix.

Yes, that’s some pre-Temporal language :) I’ll have to get used to a new way of thinking.

I am using the unofficial Ruby SDK, found here. I will take a look at the support and see how I can use the retry API.

That said, I’ve spoken to my team and I think we have a way to maintain consistent inputs for at least 30 days (versus the current 24 hours) and that should be more than long enough for us to make sure each daily sync is successful with stable inputs.

Thanks for the pointer!


I think there is a general misunderstanding of workflow determinism. Activity code doesn’t need to be deterministic. And as workflow code doesn’t affect the underlying datastore, any changes to that datastore cannot lead to nondeterminism.

This doesn’t mean your algorithm is free of issues if a sequence of activities sees inconsistent data in the underlying datastore. But that would be a problem with your application, unrelated to the Temporal determinism requirements.

@maxim Thanks for your input on this question.

I think I am unclear on how Activity return values and workflow code interact.

Pseudocode:

Workflow {
    var count = ActivityA.execute
    for count do {
        ActivityB.execute
    }
}

If at T1, ActivityA returns 5 and at T2 it returns 12, this is not deterministic. Is that correct?
Based on the links @Patrick_Rachford provided, I think the answer is that it is not deterministic.

Anyway, I agree that there is a larger underlying issue with the way the app is structured and we’re going to tackle that first. I think that will sort out these issues.

If at T1, ActivityA returns 5 and at T2 it returns 12, this is not deterministic. Is that correct?

Activity results are recorded in workflow history. On workflow replay, the recorded result is used rather than executing your activity again, so the workflow code sees the same value it saw the first time. I’d suggest watching this video for more info.
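
To make that concrete, here is a toy model of replay in plain Python. It is not how the Temporal SDK is implemented; `ToyHistory`, `ToyContext`, and `workflow` are hypothetical names, and the history is reduced to a list of recorded activity results. It mirrors the pseudocode above: ActivityA returns a count, then ActivityB runs that many times.

```python
from typing import Any, Callable, List


class ToyHistory:
    """Toy stand-in for an event history: recorded activity results, in order."""

    def __init__(self) -> None:
        self.results: List[Any] = []


class ToyContext:
    """Runs activities, or returns their recorded results when replaying."""

    def __init__(self, history: ToyHistory) -> None:
        self.history = history
        self.position = 0   # index of the next activity call in this run
        self.executed = 0   # how many activities actually ran in this run

    def execute_activity(self, fn: Callable[[], Any]) -> Any:
        if self.position < len(self.history.results):
            # Replay: reuse the recorded result, do not call the activity.
            result = self.history.results[self.position]
        else:
            result = fn()
            self.history.results.append(result)
            self.executed += 1
        self.position += 1
        return result


def workflow(ctx: ToyContext, activity_a: Callable[[], int],
             activity_b: Callable[[], Any]) -> int:
    count = ctx.execute_activity(activity_a)
    for _ in range(count):
        ctx.execute_activity(activity_b)
    return count


if __name__ == "__main__":
    history = ToyHistory()
    first = ToyContext(history)
    print(workflow(first, lambda: 5, lambda: "done"), first.executed)

    # At T2 the activity would return 12, but replay never calls it.
    replay = ToyContext(history)
    print(workflow(replay, lambda: 12, lambda: "done"), replay.executed)
```

In this model the replayed run loops over the recorded 5, not the 12 a fresh call would return, which is why a changing table by itself does not make the workflow code nondeterministic.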
