Best Practice for Managing Global User State Across Workflows (e.g., for Frequency Capping)

Hi all,

We’re designing a marketing automation system using Temporal. In our model, a user can be active in multiple independent “Campaign Workflows” simultaneously (e.g., a Black Friday campaign and a new user onboarding campaign).

The Challenge: We need to enforce global, user-level rules like frequency capping (e.g., “max 3 emails per day per user”) across all of these separate workflow executions for a given user.

A CampaignWorkflow can’t just manage a local list of sent messages, because it would be blind to messages sent by other concurrent workflows for the same user.

We have considered two main approaches:

  1. External Datastore: Use an Activity to read/write the user’s dispatch history from/to an external DB like Redis or BadgerDB. This works, but adds an external stateful dependency.

  2. “User Actor” Workflow: Have a single, long-running workflow per user (e.g., WorkflowID: "user-{userID}") that acts as the “single source of truth” for that user’s dispatch history. Campaign workflows would Query this actor for the history before sending and Signal it after a successful dispatch.

Our Question: What is the recommended, most scalable, and idiomatic Temporal pattern for this use case? Our system needs to scale to millions of concurrent users. Is the “User Actor” pattern the right choice, or is there a better feature (like Search Attributes, etc.) for managing this kind of shared, mutable state?

If the user actor workflow wouldn’t need to handle more than 10-or-20-ish events per second for a single workflow (that is, for a single user, since you have one workflow per user), that’s a good choice. (There isn’t an exact limit, but I’ve seen posts where people ask about higher rates, and the Temporal folks say that individual workflows aren’t designed for very high event rates.)

If you have a hard limit for your frequency capping (e.g. absolutely never more than 3 emails per day), you have a potential race condition if you query the workflow: two campaign workflows could query the user actor workflow at the same time, both find that only 2 emails have been sent today, and both proceed to send one. If that’s a concern, the campaign workflow could send the user actor workflow a signal (“please let me know if I can send an email”) and the user actor workflow could reply with a signal of yes or no.

Whether you query or signal, as long as each individual user actor workflow is able to handle the load you get all the benefits of durability, etc. that Temporal provides. Temporal is designed to scale to millions of concurrent workflows.

Thanks! Yeah, as you mention, FC is configurable per tenant, but it would naturally be around 5–10 per day. I don’t think we have a race-condition challenge if we use an Update handler, because the message-passing section of the official Temporal docs says Updates are synchronous while Signals are async.

Now our requirements are growing, and I’m not sure a workflow per user makes sense anymore. We’re thinking about ScyllaDB, and it would be nice if it could also serve as the underlying DB for Temporal. Does anyone know whether the Temporal team or community has any plans to support ScyllaDB?

Hi,

Another approach is to have the user workflow send the messages itself. From the CampaignWorkflow you can signal the user workflow, which holds the counter of how many emails have been sent to that specific user.

Does anyone know whether the Temporal team or community has any plans to support ScyllaDB?

There are no plans AFAIK

Thanks, yeah — that helps a lot.
I’ve been testing a few approaches and now I just have two follow-ups:


1) Large-Scale User State (FC & Journey Re-entry) — Actor vs External DB

Following up on the “User Actor” pattern for FC — we now have a similar case for tracking Journey re-entry rules (basically storing last start/end per journey).

At our target scale (billions of users), having billions of long-running UserStateWorkflows — even with continue-as-new — feels heavy, almost like using Temporal as a DB instead of an orchestrator. Is that kind of a misuse of Temporal?

Our alternative is to use Cassandra (already backing Temporal) directly via Activities —
each record (journey_uid, user_uid) stores last start/end times, and Cassandra TTL can handle the “AFTER” rule cleanup automatically.

From your experience:

  • Would you still recommend the Actor pattern at this scale?

  • Or is pushing this kind of persistent state to Cassandra via Activities more idiomatic?

  • And if we go with Cassandra, is it fine to use the same cluster as Temporal (just another keyspace for these tables), or should we isolate it completely?


2) Handling scheduledStartAt / scheduledEndAt for Campaigns & Journeys

Another thing — what’s the best way to handle scheduledStartAt and scheduledEndAt for campaigns/journeys (entity level, not per user)?

Right now we use StartWorkflowOptions.StartDelay for start (per user = per workflow), and WorkflowExecutionTimeout for end — but I’m not sure if that’s ideal.

My take is:

  • For scheduledStartAt, our Scheduler triggers Kafka messages at the given time. I think we could model this with Temporal Schedules, where the ScheduleWorkflowAction just runs a workflow that calls the Kafka producer activity. Does that sound cleaner than just using a cron job at scheduledStartAt?

  • For scheduledEndAt, maybe the workflow itself should have a small goroutine (workflow.Go) that checks periodically and exits when time’s up. Or do you think WorkflowExecutionTimeout is still the better, more reliable option for that?

  1. Temporal can absolutely scale to billions of open workflows. The choice between a DB and Temporal shouldn’t really be about scale but about your requirements. If you are just doing CRUD on some data, then Temporal doesn’t bring much value over a DB beyond higher availability. If the data has a complex lifecycle, especially one involving external events, timers, and calls to external APIs (activities), then Temporal delivers significant value.

For example, using a Temporal workflow to store a shopping cart sounds like overkill until you have to implement features like nagging a user about cart abandonment or notifying them about price changes of the items in the cart.

  2. workflow.NewTimer, workflow.Sleep, and workflow.AwaitWithTimeout are the best options for waiting some amount of time. Never use WorkflowExecutionTimeout for business-level logic: it terminates the workflow without giving its code a chance to perform any cleanup actions.

Also, Cassandra is a very sharp tool. Using it in a consistent manner is extremely hard. I would avoid using it for any use case that requires consistency.