Execution Guarantees of Activities

I’m wondering about the delivery / execution guarantees and semantics of Temporal task queues.

Temporal talks about using “task queues” a lot in the documentation. Normally when we talk about queues, we also talk about delivery guarantees and semantics. Systems can have “at most once”, “at least once” or “exactly once” semantics. My question is: which one of these do Temporal task queues fall under? This has a big effect on how we code activities.

If the activity is executed “at least once” then all activities must be coded in an idempotent way, where they expect they may be called more than once.

If the activity is executed “at most once” then the activity may not run and the system is not very useful in general.

If the activity is executed “exactly once”, then it would be possible to relax the idempotency of the activity logic, provided we can safely rely on Temporal to guarantee this.

This same question of course applies to workflow execution and child-workflow execution.

Thanks in advance for any insight!

In a single-cluster setup, a Temporal activity is executed at least once by default, because it has an associated default retry policy.

If maximumAttempts in the retry options is set to 1, then the activity is executed at most once.

If the activity is executed “at most once” then the activity may not run and the system is not very useful in general.

This is not really true in the case of Temporal, because a workflow is guaranteed to get an error if an activity fails to execute. So the workflow can take compensating actions if necessary.
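To make the “at most once plus compensation” idea concrete, here is a minimal sketch in plain Python. It is a toy simulation and does not use the Temporal SDK: `run_activity`, `ActivityFailed`, `workflow` and the `compensate` callback are hypothetical names invented for illustration. It only shows the shape of the logic: with a single attempt the activity body runs at most once, and the caller still learns about the failure and can react.

```python
class ActivityFailed(Exception):
    """Stand-in for the error a workflow sees when an activity fails for good."""


def run_activity(fn, maximum_attempts):
    """Call fn up to maximum_attempts times; raise ActivityFailed if all fail."""
    last_error = None
    for attempt in range(1, maximum_attempts + 1):
        try:
            return fn(attempt)
        except Exception as err:  # every error is treated as retryable here
            last_error = err
    raise ActivityFailed(str(last_error))


def workflow(charge, compensate):
    """Toy workflow: try the charge once and compensate on failure."""
    try:
        return run_activity(charge, maximum_attempts=1)
    except ActivityFailed:
        compensate()
        return "compensated"


calls = []          # attempt numbers that actually ran
compensations = []  # compensating actions that were taken


def flaky_charge(attempt):
    calls.append(attempt)
    raise RuntimeError("payment gateway unreachable")


result = workflow(flaky_charge, lambda: compensations.append("cancel-order"))
```

In this sketch the failing activity runs a single time and the workflow falls through to its compensation branch instead of silently losing the work.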

This same question of course applies to workflow execution and child-workflow execution.

A workflow is guaranteed to be executed exactly once unless it fails and is retried. Note that a workflow doesn’t have a default retry policy.

A child workflow is guaranteed to execute exactly once unless it fails and is retried as it is just a workflow.

Deleted my first reply because I didn’t understand what you said at first.

This is helpful though, thank you!

It would be great to have a section on this in the docs, sort of like RabbitMQ’s reliability guide.

Hello,

Trying to figure out the “at least once” phrase.

Is there a case in which a non-local activity would, for whatever reason, be executed by the worker two times (exactly the same activity with the same payload)?

Timeouts?

Depends on your activity options. Activities, as Maxim mentioned, have a default retry policy. By default, if your activity throws a retryable exception or times out per the configured StartToCloseTimeout, it will be retried. A retry is a re-execution of the activity using the same inputs/payload.
You can limit how long retries continue via the activity option ScheduleToCloseTimeout, which is the total time the workflow is willing to wait for your activity to complete, including retries. If ScheduleToCloseTimeout is not specified, and you don’t explicitly set a retry policy in the activity options, retries will be performed until the Workflow Run/Execution timeout.

Note that you can also set a retry policy in your activity options and, as mentioned, set maximumAttempts to 1, in which case your activity will not be retried and you get the “at most once” execution guarantee.
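A rough way to picture how a total time budget and an attempt limit each bound the retries is the following plain-Python sketch. It is a toy simulation and does not use the Temporal SDK. `retry_with_budget` and its parameters are hypothetical, time is simulated rather than measured, retry backoff is ignored, and the budget is only checked before an attempt starts, which is a simplification.

```python
def retry_with_budget(fn, attempt_cost, schedule_to_close, maximum_attempts=None):
    """Retry fn until it succeeds, the time budget runs out, or attempts run out.

    Time is simulated: each attempt is assumed to cost attempt_cost seconds.
    Returns (result, attempts_made); result is None if the activity gave up.
    """
    elapsed = 0.0
    attempts = 0
    while True:
        if maximum_attempts is not None and attempts >= maximum_attempts:
            return None, attempts          # attempt limit reached
        if elapsed + attempt_cost > schedule_to_close:
            return None, attempts          # total time budget exhausted
        attempts += 1
        elapsed += attempt_cost
        try:
            return fn(), attempts
        except Exception:
            continue                       # treated as retryable: try again


def always_fails():
    raise RuntimeError("downstream unavailable")


# Bounded only by the total time budget.
budget_result = retry_with_budget(always_fails, attempt_cost=2, schedule_to_close=7)

# Bounded by a single attempt, i.e. the "at most once" configuration.
single_result = retry_with_budget(
    always_fails, attempt_cost=2, schedule_to_close=7, maximum_attempts=1
)
```

Whichever limit is hit first ends the retries; with neither limit set, this toy loop would keep retrying a permanently failing function forever.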

Thanks for the answer but it’s not what I was getting at.

I’m fully aware of the retry capabilities of Temporal.

Let’s put it this way. IF an activity is completed, IS there a way it can be completed twice accidentally, like with SQS messages?

Thanks for your help

Not to my awareness. If an activity completes, its completion result is recorded in the workflow history. On workflow history replay, your worker is going to use the recorded result and will not re-execute the activity.
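The replay behaviour described above can be pictured with a small plain-Python model. This is a toy sketch, not the Temporal SDK: `WorkflowRun`, `execute_activity` and the activity ids are hypothetical, and the real history is an event log rather than a dictionary. The point is only that a recorded result is reused instead of running the activity again.

```python
class WorkflowRun:
    """Toy model of a workflow whose activity results are kept in a history."""

    def __init__(self, history=None):
        self.history = dict(history or {})   # activity id -> recorded result
        self.executions = []                 # activity ids that really ran

    def execute_activity(self, activity_id, fn):
        if activity_id in self.history:      # replay: reuse the recorded result
            return self.history[activity_id]
        result = fn()                        # first time: run the activity
        self.executions.append(activity_id)
        self.history[activity_id] = result
        return result


def order_workflow(run):
    total = run.execute_activity("price", lambda: 40)
    return run.execute_activity("tax", lambda: total + 2)


first = WorkflowRun()
first_result = order_workflow(first)

# A new worker replays the same workflow code against the saved history.
replayed = WorkflowRun(history=first.history)
replayed_result = order_workflow(replayed)
```

In the replayed run the workflow reaches the same result while no activity body is invoked, which is why a completed activity is not completed a second time by replay.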

Thank you very much!

Well, I am no Temporal expert, but I could imagine that, with a very low probability, the business code of the activity was executed but there was a problem communicating the result back to the Temporal server (e.g. a network problem). So to the Temporal server the activity would look timed out, but for the world (and in a business sense) the activity was executed. Then the activity would be retried by the Temporal server, effectively executing the business code of the activity twice. So I would say Temporal should be “at least once” if not configured to be “at most once”. And I would say that for that reason both activities that use messaging endpoints and activities that use API endpoints should be implemented to be idempotent. Or am I wrong?

Hi @MikeM
For activities, as mentioned, the execution guarantee is at least once, and at most once if retries are disabled via the retry options for that activity.
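The lost-result scenario from the question can be simulated in a few lines of plain Python. This is a toy sketch and does not use the Temporal SDK: `PaymentService`, `run_with_retries` and the idempotency key "order-17" are hypothetical. The first attempt runs the business code but its result never reaches the server, so the activity is retried; the downstream service deduplicates on the key.

```python
class PaymentService:
    """Hypothetical downstream service that deduplicates by idempotency key."""

    def __init__(self):
        self.charges = {}     # idempotency key -> amount actually charged
        self.raw_calls = 0    # how many times the service was invoked

    def charge(self, key, amount):
        self.raw_calls += 1
        self.charges.setdefault(key, amount)  # repeat call with same key is a no-op
        return "charged"


def run_with_retries(activity, lose_first_ack):
    """Retry until the server sees a result; optionally drop the first result."""
    attempt = 0
    while True:
        attempt += 1
        result = activity()                   # business code runs on the worker
        if lose_first_ack and attempt == 1:
            continue                          # result is lost: looks like a timeout
        return result, attempt


service = PaymentService()
result, attempts = run_with_retries(
    lambda: service.charge("order-17", 100), lose_first_ack=True
)
```

The service is called twice, yet only one charge is recorded for the key, which is the property an idempotent activity needs under at-least-once execution.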

As for idempotency: yes, in your scenario your activity should in most cases be idempotent.
If that is not possible, I think you could use heartbeating. You can include business-level info in the heartbeat, and if the activity fails (due to a timeout, as you mentioned) and is retried, your activity can read the last heartbeat info and decide to continue its work from the position recorded in the last heartbeat, decide not to re-invoke a 3rd-party service, etc.
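Here is a rough plain-Python sketch of resuming from heartbeat details. It is a toy simulation, not the Temporal SDK: `FakeActivityContext`, `process_items` and the `fail_at` parameter are hypothetical stand-ins, and the retry is triggered by hand. Note that a failure between the side effect and the heartbeat would still repeat one item, so this narrows the window rather than closing it.

```python
class FakeActivityContext:
    """Stand-in for the SDK context; keeps only the latest heartbeat details."""

    def __init__(self):
        self.last_heartbeat = None

    def heartbeat(self, details):
        self.last_heartbeat = details


def process_items(ctx, items, handled, fail_at=None):
    """Process items in order, resuming after the last heartbeated index."""
    start = 0 if ctx.last_heartbeat is None else ctx.last_heartbeat + 1
    for index in range(start, len(items)):
        if index == fail_at:
            raise TimeoutError("simulated worker crash")
        handled.append(items[index])   # the non-idempotent side effect
        ctx.heartbeat(index)           # record progress after each item


ctx = FakeActivityContext()
handled = []
items = ["a", "b", "c", "d"]
try:
    process_items(ctx, items, handled, fail_at=2)  # first attempt dies at "c"
except TimeoutError:
    process_items(ctx, items, handled)             # retry resumes from "c"
```

After the retry each item has been handled once, because the second attempt starts from the position recorded in the last heartbeat instead of from the beginning.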

Here is a thread with a similar question that I think has really good info as well.