When to use SQS?

I believe Temporal is a replacement for SQS, but is there any scenario in which you would prefer SQS over Temporal? Are there any major gotchas?

Also, is a workflow with a single activity an anti-pattern? I would imagine not, and that it would only become a problem when it comes to wf/sec throughput.

I assume that your question applies to any queuing solution (Kafka, RabbitMQ, etc.), not only SQS.

What are the features of a distributed queue?

  • Producers can enqueue tasks
  • Accumulate (backlog) tasks if consumers are down or slow
  • Deliver each task to a single consumer
  • The consumer can report task completion (ack) or failure (nack) back to the queue
  • If a task is not acked/nacked within a configured timeout it is considered nacked.
  • Some queues support extending the running task timeout (aka heartbeat)
  • Nacked tasks are redelivered, possibly after some backoff interval
  • Some queues support Dead Letter Queues (DLQs). Tasks that are nacked too many times are moved to a DLQ.
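
To make the list above concrete, here is a minimal in-memory sketch of those semantics in Python. It is not how SQS or any real broker is implemented; `SketchQueue`, `Task`, `visibility_timeout`, and `max_receives` are hypothetical names chosen for illustration, and backoff is left out.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    payload: str
    receive_count: int = 0  # how many times the task has been delivered


class SketchQueue:
    """Toy in-memory queue; every name here is an assumption for illustration."""

    def __init__(self, visibility_timeout=30.0, max_receives=3, clock=time.monotonic):
        self.visibility_timeout = visibility_timeout
        self.max_receives = max_receives
        self.clock = clock
        self.ready = deque()   # backlog waiting for a consumer
        self.in_flight = {}    # task_id -> (task, deadline)
        self.dead_letter = []  # tasks that were nacked too many times

    def enqueue(self, task):
        self.ready.append(task)

    def receive(self):
        """Hand the next task to a single consumer, or return None."""
        self._expire_timed_out()
        if not self.ready:
            return None
        task = self.ready.popleft()
        task.receive_count += 1
        self.in_flight[task.task_id] = (task, self.clock() + self.visibility_timeout)
        return task

    def ack(self, task_id):
        self.in_flight.pop(task_id, None)

    def nack(self, task_id):
        entry = self.in_flight.pop(task_id, None)
        if entry is None:
            return
        task, _ = entry
        if task.receive_count >= self.max_receives:
            self.dead_letter.append(task)  # too many failures: move to the DLQ
        else:
            self.ready.append(task)        # redeliver (no backoff in this sketch)

    def heartbeat(self, task_id):
        """Extend the deadline of a task that is still running."""
        if task_id in self.in_flight:
            task, _ = self.in_flight[task_id]
            self.in_flight[task_id] = (task, self.clock() + self.visibility_timeout)

    def _expire_timed_out(self):
        now = self.clock()
        for task_id, (_, deadline) in list(self.in_flight.items()):
            if deadline <= now:
                self.nack(task_id)  # no ack/nack before the timeout counts as a nack
```

Note that nothing in this interface lets a producer ask for the status of a task or cancel it, which is where the limitations below come from.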

What are the common limitations?

  • No transactions between the queue and other data storages
  • Maximum task execution time is limited even when heartbeating is supported.
  • The duration of retries is limited. So it is not possible to retry a task for a few hours, for example.
  • Task cancellation is not supported
  • Error handling in case of task failures is very primitive. DLQ is the only mechanism.
  • Getting the status of a specific task is not supported.
  • Tasks are executed with at-least-once semantics
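
The last point means a consumer may see the same task more than once, for example when an ack is lost. A rough sketch of how a handler might guard against that, assuming each task carries a unique ID (the names `processed_ids`, `balance`, and `handle_deposit` are made up for this example):

```python
processed_ids = set()    # stand-in for a durable store of completed task IDs
balance = {"acct-1": 0}  # stand-in for application state


def handle_deposit(task_id, account, amount):
    """Apply a deposit once, even if the task is delivered repeatedly."""
    if task_id in processed_ids:
        return False  # duplicate delivery: skip the side effect
    balance[account] += amount
    processed_ids.add(task_id)
    return True


# The same task delivered twice, e.g. because the first ack was lost.
first = handle_deposit("task-42", "acct-1", 100)
second = handle_deposit("task-42", "acct-1", 100)
```

In a real system the deduplication record and the state change would have to be committed atomically, which the queue itself does not help with.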

So when are task queues a good fit as an application-level primitive?

  • The task is stateless. So its creation doesn’t require an update to some other DB as there are no transactions between the DB and the queue.
  • The task is idempotent
  • The task is short
  • The duration and number of retries are small
  • No error handling besides retrying later from DLQ is needed
  • Human intervention is OK to deal with the messages in DLQ
  • No need to get a task status
  • No need to cancel the task
  • The task is fully independent and doesn’t depend on other tasks or cause the execution of other tasks.
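
The first point in this list is the classic dual-write problem. A small sketch of what can go wrong when a task's creation also requires a DB update, assuming a plain dict as the "database" and a hypothetical `UnreliableQueue` client that fails on send:

```python
class UnreliableQueue:
    """Hypothetical queue client whose send can fail after the DB write."""

    def __init__(self, fail=False):
        self.fail = fail
        self.messages = []

    def send(self, message):
        if self.fail:
            raise ConnectionError("queue unavailable")
        self.messages.append(message)


def create_order(db, queue, order_id):
    db[order_id] = "PENDING"            # step 1: database write is committed
    queue.send({"order_id": order_id})  # step 2: separate system, no shared transaction


db = {}
queue = UnreliableQueue(fail=True)
try:
    create_order(db, queue, "order-1")
except ConnectionError:
    pass  # the DB write has already happened and is not rolled back

# Orders that exist in the DB but have no task enqueued to process them.
orphaned = [oid for oid in db if not any(m["order_id"] == oid for m in queue.messages)]
```

Reversing the two steps does not help either: the task could then be enqueued for an order that was never written.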

My experience tells me that only a very narrow set of scenarios fits these limitations. In most cases, tasks are not independent, can execute for a long time, require long retries, require actual error handling, and benefit from transactionality between the database and the queue.
