Share your Temporal Spooky Stories! 🎃

October marks the launch of Spooky Stories, a series of terrifying tales from life before Temporal! :scream: (You can read more about the program in the linked announcement!)

In your pre-Temporal life, have you faced…

  • Lost data or transactions (or worse, large sums of money) due to outages or downtime?
  • Lost orders, double-charged orders, or other online/mobile order horror stories?
  • Brittle and complex legacy apps that are Jenga Towers of horror, too risky to change?
  • Failed batch jobs that needed a safe way to run and finish?
  • Another problem sure to inspire fear and terror in others?

If so, please contribute it here, and/or join the party in #contributing on Temporal Community Slack!

Our first Temporal Spooky Story™ comes from Gabriel Harris-Rouquette, Senior Software Engineer at Merit (aka “Lead Awesome-ifier of Kafka” :wink:), about the terror of trying to manage long-running, unpredictable workloads with Kafka: :scream:


There once was a workflow implemented with Kafka, but a specific consumer of a message had an unbounded amount of work. Instead of pre-calculating and batching the work, it was decided to use consumer pauses, per the javadoc:

For use cases where message processing time varies unpredictably, neither of these options may be sufficient. The recommended way to handle these cases is to move message processing to another thread, which allows the consumer to continue calling poll while the processor is still working. Some care must be taken to ensure that committed offsets do not get ahead of the actual position. Typically, you must disable automatic commits and manually commit processed offsets for records only after the thread has finished handling them (depending on the delivery semantics you need). Note also that you will need to pause the partition so that no new records are received from poll until after thread has finished handling those previously returned.

Anyways, this led to frightening results over time. Pain points overall:

  • A large job that took a variable amount of time to process a single Kafka message for an asynchronous workflow had limited visibility into progress, limited recoverability, and performed somewhat dangerous operations like pausing a consumer to avoid consumer group rebalancing.
  • Visibility into the job was minimal: we could only see whether a consumer had started working on the message by observing its side effects.
  • Failure to process the message meant restarting the whole job from the beginning: if it failed halfway through processing a million objects, it'd restart at 0.
  • Difficulty in setting up overlap prevention or “duplicate processing” detection: since the messages were just queued, we couldn't say “hey, we're already doing this work”, so potentially duplicated long-running jobs would just get queued, even though an existing job that was halfway through would already process the new side effects.

Since we've migrated to Temporal, we've been able to batch process in a durable fashion, without having to implement the plumbing of writing our own state machines.


Thanks so much, @gabizou ! I for one am now truly terrified about all the things that can go wrong when attempting to manage batches without Temporal. :sweat_smile:
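
To picture the kind of plumbing the story says the team no longer has to write, here is a minimal sketch in plain Python. It is not Merit's code and it does not use the Temporal or Kafka APIs. `JobStore` and `run_job` are hypothetical names, and the in-memory dictionary stands in for what would have to be durable storage in a real system. The sketch covers two of the pain points above: resuming from a checkpoint instead of restarting at 0, and refusing to start a job that is already running.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class JobStore:
    """Hypothetical in-memory stand-in for durable storage (a database in practice)."""
    checkpoints: Dict[str, int] = field(default_factory=dict)  # job_id -> next index to process
    running: Set[str] = field(default_factory=set)             # job_ids currently in flight


def run_job(store: JobStore, job_id: str, items: List[int],
            handle: Callable[[int], None]) -> bool:
    """Process items, resuming from the last recorded checkpoint.

    Returns False without doing any work if job_id is already running.
    If handle raises, the checkpoint is kept so a retry can resume.
    """
    if job_id in store.running:
        return False  # "hey, we're already doing this work"
    store.running.add(job_id)
    try:
        start = store.checkpoints.get(job_id, 0)
        for i in range(start, len(items)):
            handle(items[i])
            store.checkpoints[job_id] = i + 1  # record progress after each item
    finally:
        store.running.discard(job_id)
    store.checkpoints.pop(job_id, None)  # finished: clear the checkpoint
    return True


# Demo: the handler crashes once on item 3, then the job is retried.
store = JobStore()
processed: List[int] = []
state = {"crashed": False}


def flaky(item: int) -> None:
    if item == 3 and not state["crashed"]:
        state["crashed"] = True
        raise RuntimeError("simulated crash")
    processed.append(item)


items = [1, 2, 3, 4, 5]
try:
    run_job(store, "reindex", items, flaky)
except RuntimeError:
    pass  # checkpoint for "reindex" now points at index 2

run_job(store, "reindex", items, flaky)  # resumes at item 3, not at item 1
print(processed)  # [1, 2, 3, 4, 5]
```

Even this toy version leaves out the hard parts (surviving a process crash, timeouts, retries with backoff, concurrent workers), which is roughly the state-machine work the story describes handing off.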


If you or someone you know is interested in learning more about using Temporal for batch processing workflows, here are some resources you can check out!

Another Spooky Story for you all! :smiley:

Please join us in a week, Wednesday, October 9 @ 11:00am Pacific / 2pm Eastern (what is that in my timezone?) for… Chilling Temporal Anti-Patterns.

During this session, our Staff Solutions Architect @Josh_Smith will regale us with tales of common mistakes that we see from folks with prior distributed systems experience as they're making their initial foray into the world of Temporal. Join this session to learn more about Josh's “Top 10” anti-patterns… and how to choose a less-perilous path for yourself!

Sign up here: Webinar | Chilling Temporal Anti-Patterns

You know what's truly spooky? Web crawling! :spider: :spider_web:
You know what's even spookier? Web crawling at massive scale. :scream:

Here's a great article from Java Developer Advocate Steve Poole about how he introduced Temporal to his scraper, which crawls over 14M open source components at Maven Central to check for inconsistencies in Java API versions.

SPOILER ALERT: The Temporal-ized code ends up more reliable, recoverable, and less complex. :sunglasses: