The Temporal documentation recommends that activities be idempotent. However, given how activity execution is described, it seems to me that an activity which is atomic ought to be guaranteed safe.
As I understand it, an activity that returns successfully will have its output saved and reused on any future replay of the workflow execution.
The hazard behind the idempotency recommendation is that an activity may fail after partially completing. If the completed steps are destructive, such as making a payment, this can obviously have unintended consequences.
If it is possible to make an activity atomic, for example by wrapping it in a database transaction that is committed atomically, then there is no risk of partial completion.
However, idempotency is a stronger requirement than this. An atomic transaction that runs successfully multiple times may have a different effect from a single execution: a transaction that debits an account by 10 is atomic, but running it twice debits 20.
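To make the distinction concrete, here is a minimal sketch using Python's standard-library `sqlite3` module. The `accounts` table and the `debit_atomic` function are hypothetical, for illustration only. Each call is a single committed transaction, so it can never partially complete, yet a second call applies the debit again.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical in-memory database with one account holding 100 units.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
    conn.commit()
    return conn


def debit_atomic(conn: sqlite3.Connection, account: str, amount: int) -> None:
    # 'with conn' commits on success and rolls back on an exception,
    # so either the whole update is applied or none of it is.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, account),
        )


def balance(conn: sqlite3.Connection, account: str) -> int:
    return conn.execute("SELECT balance FROM accounts WHERE id = ?", (account,)).fetchone()[0]


conn = make_db()
debit_atomic(conn, "alice", 10)
debit_atomic(conn, "alice", 10)  # a retry of the "same" activity
print(balance(conn, "alice"))  # 80: each run was atomic, but it ran twice
```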
Therefore, my question is: are there any circumstances in which an activity will be re-run after it has already completed successfully once?
If we have ‘exactly one success’ semantics for activities, then we are free to rely on activities that are atomic but not necessarily idempotent, which would be useful.
It is not atomic because of environmental conditions, timeouts, and so on. Say you start an activity but it cannot heartbeat: when the heartbeat timeout fires, another attempt is started while the first one may still be running. I would have to confirm with other teams what behavior is guaranteed with a retry policy whose maximum attempts is set to 1.
Exactly, which is why it is important to account for this.
This is not possible with retries enabled. We cannot know that an activity has really stopped before we start another attempt.
Absolutely. What if your activity completes but a network issue prevents that completion from reaching the server? If you have a heartbeat timeout or a start-to-close timeout, a retry will occur. Similarly, what if your activity’s return value cannot be serialized? From your code’s perspective the activity completed, but from the system’s perspective it failed.
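The lost-completion case can be illustrated with a toy simulation. `FakeServer` and `run_with_retries` below are hypothetical stand-ins, not part of any Temporal SDK, and the retry here happens immediately rather than after a timeout. The only point is that the activity body runs to completion twice because the first success never reached the server.

```python
from dataclasses import dataclass


@dataclass
class FakeServer:
    """Hypothetical stand-in for the Temporal server; not a real SDK API."""
    fail_next_report: bool = True
    recorded_result: object = None

    def report_completion(self, result: object) -> None:
        # Simulate a network failure on the first completion report.
        if self.fail_next_report:
            self.fail_next_report = False
            raise ConnectionError("completion lost in transit")
        self.recorded_result = result


side_effects: list[str] = []


def charge_card() -> str:
    # The activity body: it succeeds every time it runs.
    side_effects.append("charged")
    return "ok"


def run_with_retries(server: FakeServer, max_attempts: int = 3) -> None:
    for _ in range(max_attempts):
        result = charge_card()  # from the code's point of view, this succeeded
        try:
            server.report_completion(result)
            return
        except ConnectionError:
            # The server never saw the success, so another attempt is made
            # (in Temporal, after a timeout rather than immediately).
            continue
    raise RuntimeError("all attempts failed")


server = FakeServer()
run_with_retries(server)
print(side_effects)  # ['charged', 'charged']
```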
I was thinking about this. Idempotency is definitely tricky.
My use case recently was an activity that writes something to a database. Talking with colleagues, we reached the conclusion that we’d have to:
Have a well-defined primary key that the workflow execution is aware of
Try to insert it
If the insert fails, check whether the error is an integrity constraint violation (e.g. a duplicate primary key). If it is, we are retrying something that already completed, so we should not return an error, and the activity will be marked as “complete”. Otherwise, return the error so that the activity is retried.
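These steps can be sketched in Python with `sqlite3`, assuming a hypothetical `payments` table whose primary key is an idempotency key that the workflow computes up front (for example, derived from its workflow ID). In a real worker this function would be registered as an activity (in the Python SDK, with `@activity.defn`); that is left out so the sketch runs without a Temporal server. The error message is checked for SQLite's `UNIQUE constraint failed` text so that other integrity errors, such as a `NOT NULL` violation, still propagate and trigger a retry.

```python
import sqlite3


def init_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments (idempotency_key TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    return conn


def record_payment(conn: sqlite3.Connection, idempotency_key: str, amount: int) -> str:
    """Insert once; treat a duplicate key as 'already done' instead of an error.

    Any other exception propagates, which is what lets the retry policy try again.
    """
    try:
        with conn:
            conn.execute(
                "INSERT INTO payments (idempotency_key, amount) VALUES (?, ?)",
                (idempotency_key, amount),
            )
        return "inserted"
    except sqlite3.IntegrityError as exc:
        if "UNIQUE constraint failed" not in str(exc):
            raise  # some other constraint failed: surface it so a retry happens
        # Duplicate key: an earlier attempt already committed this row,
        # so report success rather than failing.
        return "already done"


conn = init_db()
key = "order-workflow-42/record-payment"  # hypothetical key known to the workflow
print(record_payment(conn, key, 500))  # inserted
print(record_payment(conn, key, 500))  # already done
```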
But sometimes you just don’t have control over the database schema. Or maybe you have to rely on some other mechanism to tick the “have I successfully completed this action before?” checkbox.
This problem is not limited to third-party databases; it applies to third-party APIs and many other interactions with external systems. If you don’t control a third party’s idempotency guarantees, you often have to use stateful mechanisms you do control to add such assurances. In some cases, you may simply be able to tolerate duplicated actions.
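One such stateful mechanism is a ledger you own, keyed by an idempotency key and consulted before calling the third party. The sketch below is illustrative only: `call_once`, the `sent` table, and `fake_send_email` are hypothetical. It avoids repeating a call that is known to have finished, but a crash between the API call and the ledger write, or two concurrent attempts after a heartbeat timeout, can still produce a duplicate, which is where a tolerance for duplicates comes in.

```python
import sqlite3
from typing import Callable


def init_ledger() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sent (key TEXT PRIMARY KEY, response TEXT NOT NULL)")
    return conn


def call_once(conn: sqlite3.Connection, key: str, call_api: Callable[[], str]) -> str:
    row = conn.execute("SELECT response FROM sent WHERE key = ?", (key,)).fetchone()
    if row is not None:
        # An earlier attempt finished the call and recorded it: reuse the result.
        return row[0]
    response = call_api()
    # If the process dies here, the next attempt calls the API again.
    # If two attempts race past the SELECT, both call the API and the
    # second INSERT raises IntegrityError. The ledger narrows the
    # duplicate window; it cannot close it.
    with conn:
        conn.execute("INSERT INTO sent (key, response) VALUES (?, ?)", (key, response))
    return response


calls: list[int] = []


def fake_send_email() -> str:
    # Hypothetical third-party call with no idempotency support of its own.
    calls.append(1)
    return "msg-1"


ledger = init_ledger()
call_once(ledger, "welcome-email/user-7", fake_send_email)
call_once(ledger, "welcome-email/user-7", fake_send_email)
print(len(calls))  # 1
```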
The successful execution of an activity and the saving of its output are not atomic. In very rare cases, the activity can succeed while Temporal fails to record its output.