Suggested documentation for signal delivery guarantees

I came across @loren’s 2023 comment (Can a workflow receive signals more than once? - #2 by loren) saying that in rare cases a signal can be delivered more than once. I was surprised because I didn’t recall seeing that in the documentation.

It was in the documentation back in 2023: Workflows | Temporal Documentation, but I’m not seeing it now.

It’s not unreasonable (what if the client makes a network call to the Temporal server, which records the signal, but a network glitch prevents the success response from getting back to the client, so the client retries?), but it wasn’t a scenario that had occurred to me.
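To make that scenario concrete, here is a minimal, self-contained sketch in plain Python. It does not use the Temporal SDK: `FakeServer`, `send_with_retry`, and the `request_id` parameter are hypothetical names invented for illustration, and the deduplication shown is only an assumption about how a request-id-based scheme could behave, not a description of Temporal’s actual implementation.

```python
import uuid


class FakeServer:
    """Hypothetical stand-in for a server that records signals (not Temporal's API)."""

    def __init__(self, drop_first_ack=False):
        self.recorded = []             # signals as the workflow would see them
        self.seen_request_ids = set()  # request ids already recorded
        self.drop_first_ack = drop_first_ack

    def signal(self, request_id, payload):
        # Record the signal unless this request id was already seen.
        if request_id not in self.seen_request_ids:
            self.seen_request_ids.add(request_id)
            self.recorded.append(payload)
        # Simulate the network glitch: the signal is recorded,
        # but the success response never reaches the client.
        if self.drop_first_ack:
            self.drop_first_ack = False
            raise ConnectionError("response lost")
        return "ok"


def send_with_retry(server, payload, reuse_request_id, attempts=3):
    """Send a signal, retrying on connection errors."""
    request_id = str(uuid.uuid4())
    for _ in range(attempts):
        try:
            return server.signal(request_id, payload)
        except ConnectionError:
            if not reuse_request_id:
                # A fresh id on retry defeats server-side deduplication.
                request_id = str(uuid.uuid4())
    raise ConnectionError("gave up")


if __name__ == "__main__":
    duplicated = FakeServer(drop_first_ack=True)
    send_with_retry(duplicated, "pay", reuse_request_id=False)
    print(len(duplicated.recorded))  # the signal is recorded twice

    deduplicated = FakeServer(drop_first_ack=True)
    send_with_retry(deduplicated, "pay", reuse_request_id=True)
    print(len(deduplicated.recorded))  # the signal is recorded once
```

In this toy model the retry only produces a duplicate when the retried call carries a different request id than the original call, which is why it matters who generates the key and when.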

In general, it would be a good idea to document delivery guarantees. Duplicate signal delivery (or out of order signal delivery with multi-region) might be very rare… yet sometimes it’s worth it to code against even very rare issues (“don’t double spend $10,000”)… but we don’t know what should be included in the design without knowing what issues would need to be considered.

Hi @awwx

Thanks for the feedback. It is mentioned here: Temporal Workflow message passing - Signals, Queries, & Updates | Temporal Documentation. Do you think the documentation should cover more?

Antonio

Thank you @antonio.perez. I hadn’t noticed the encyclopedia section before.

I hadn’t known that by default the Temporal client SDK provided a randomized signal deduplication key, which is useful and important to know.

My personal wishlist for the documentation would be to have a separate section that described Temporal’s delivery guarantees. (The documentation could then continue with sections such as the existing “Ensuring your messages are processed exactly once”, to explain how to code against the delivery guarantees.)

For example, some of the things I’d be interested in:

  • What is the signal deduplication interval? (For example, for Amazon SQS it’s five minutes)
  • What is the mechanism by which deduplication doesn’t work across Continue-As-New boundaries? (Is it that a workflow stub is created on the client with only a workflow id and no run id, the server resolves the received signal to the currently executing run id, and deduplication is by workflow id / run id?)
  • The encyclopedia mentions that clients can provide their own idempotency key. How is this done? I didn’t see a mention of it in the Java or Go SDKs. Oh, the next paragraph says that the workflow would check the passed key itself. I was confused by the sentence “Temporal’s SDKs provide a randomized key by default”, which sounded as if the client-provided idempotency key would replace the randomized key generated by the SDK.
  • When are signals guaranteed to be delivered in order (if ever)? I seem to recall reading once that, when using standard (non-multi-region) Temporal, two signal calls from the same client to the same running workflow and the same signal name will be delivered in order if the second call is made after the first successfully completes. I don’t know if that’s true though.
  • If the previous is true, what about sending signals to the same signal name from different clients: the first client sends a signal and that call successfully completes; and then the second client sends a signal to the same executing workflow with the same signal name?
  • What failures might the client encounter calling signal in the SDK (e.g. network down)?
  • How can these failures be distinguished in the client (i.e., which exceptions are thrown)?
  • Which failures are worth retrying?
  • Does the SDK automatically retry? If so, for how long?
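For the idempotency key item above, this is how I currently understand “the workflow would check the passed key itself”, as a minimal sketch in plain Python rather than real SDK code. `PaymentWorkflowState`, `on_pay_signal`, and `idempotency_key` are hypothetical names; in an actual workflow this logic would live in the signal handler, and I assume the set of seen keys would need to be carried over explicitly when the workflow continues as new.

```python
class PaymentWorkflowState:
    """Hypothetical workflow state that deduplicates signals by a client-supplied key."""

    def __init__(self):
        self.processed_keys = set()  # idempotency keys already handled
        self.total_spent = 0

    def on_pay_signal(self, idempotency_key, amount):
        """Apply the payment once per key; return False for a duplicate delivery."""
        if idempotency_key in self.processed_keys:
            return False  # duplicate: ignore it so we don't double spend
        self.processed_keys.add(idempotency_key)
        self.total_spent += amount
        return True


if __name__ == "__main__":
    state = PaymentWorkflowState()
    state.on_pay_signal("order-42", 10_000)
    state.on_pay_signal("order-42", 10_000)  # redelivered signal, same key
    print(state.total_spent)
```

The client picks the key (here a made-up order id) and sends it as part of the signal payload, so the check works regardless of how many times the same signal is delivered.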

@awwx this is great, thanks again.

Let me work on this and get back to you. I will share this with the doc team too.

Thanks
Antonio
