Best Practices for Managing a Stateful In‑Memory Cache in Temporal Python Workflows

Background

I’m building a computation workflow using the Temporal Python SDK.

  • Workflow: a long‑running “Entity Workflow” that receives an external signal each minute, then triggers computation for that minute’s input.
  • Activity: performs the computation using the new minute’s input plus a large, shared cache of historical context.

Because the cache is large (10+ GB), it’s impossible to serialize into Workflow state or pass as an activity argument.

Questions

  • Cache placement
    What’s the recommended way to hold a large in‑memory cache in a Temporal Python setup—should it live as a module‑level (global) variable, or be kept in a local variable inside the workflow’s run method?
  • Activity access
    How can activities best gain access to that cache: by passing it (or a reference to it) through activity arguments (my understanding is that this will not work, since the 10+ GB cache is not serializable), by referencing a shared/global object, or via another pattern?

Thanks a lot!

Hi,

shared cache of historical context.

Is this cache shared between all workflow executions and all workers, or is it local to the execution of a single workflow, which you want to share across that workflow and its activities?

If the former, I would expect that you need some form of external system (if nothing else, blob storage) to share the cache content between workers and workflows.

If the latter, you need to remember that it is not a given that a workflow and all of its activities execute on the same worker. The workflow will try to execute sticky, but that is not guaranteed. If you want to ensure that the whole workflow executes on the same worker in order to, for example, share some global state, you will need to look at worker sessions.
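For worker-scoped state, a common pattern in the Python SDK is to hold the cache in an instance whose methods are registered as activities, so every activity invocation on that worker sees the same object. A minimal sketch, assuming a setup like yours (the names `CacheActivities` and `compute_minute` are invented; the `@activity.defn` decorator and `Worker` registration from `temporalio` are shown only as comments so the sketch stays self-contained):

```python
# Sketch: worker-scoped shared state via a class-based activity holder.
# In real Temporal Python code, the method would carry @activity.defn and
# the bound method would be passed to Worker(activities=[...]).

class CacheActivities:
    """Holds the large historical cache once per worker process."""

    def __init__(self) -> None:
        # Loaded once at worker startup; never serialized into
        # workflow state or activity arguments.
        self.cache: dict[str, float] = {}

    # @activity.defn  # (temporalio decorator in real code)
    def compute_minute(self, minute_key: str, new_value: float) -> float:
        # Toy computation: combine the new minute's input with the
        # historical context, then fold the result back into the cache.
        history = self.cache.get(minute_key, 0.0)
        result = history + new_value
        self.cache[minute_key] = result
        return result


# At worker startup, real code would do roughly:
#   acts = CacheActivities()
#   Worker(client, task_queue="compute",
#          activities=[acts.compute_minute], ...)
acts = CacheActivities()
print(acts.compute_minute("09:30", 1.5))  # first minute: no history yet
print(acts.compute_minute("09:30", 2.0))  # later call sees prior state
```

The point of the class is simply that the cache lives exactly once per worker process; whether that is the *same* worker across the whole workflow is what stickiness/sessions determine.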

Because the cache is large (10+ GB), it’s impossible to serialize into Workflow state or pass as an activity argument.

Right, Temporal would not allow you to do that. It exceeds the payload limits and would also be very inefficient. If you really want to go down this route, though, there is an option: a custom Payload Codec. temporal-large-payload-codec implements such an approach, where the large data is transparently off-loaded to cloud storage. I would definitely explore other options before embarking on the codec approach.
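To make the claim-check idea behind such a codec concrete, here is a toy illustration. This is *not* the real `temporal-large-payload-codec` API (the real thing plugs into Temporal's payload codec interface and talks to actual blob storage); a plain dict stands in for the blob store and plain bytes for Temporal payloads, so the sketch is self-contained:

```python
# Toy claim-check codec: large payloads are replaced by a small pointer,
# and the actual bytes are off-loaded to external storage.
import uuid

BLOB_STORE: dict[str, bytes] = {}   # stand-in for S3/GCS/etc.
THRESHOLD = 64                      # off-load payloads bigger than this

def encode(payload: bytes) -> bytes:
    if len(payload) <= THRESHOLD:
        return payload                       # small enough, pass through
    key = str(uuid.uuid4())
    BLOB_STORE[key] = payload                # "upload" to storage
    return b"blobref:" + key.encode()        # tiny pointer travels via Temporal

def decode(payload: bytes) -> bytes:
    if payload.startswith(b"blobref:"):
        key = payload[len(b"blobref:"):].decode()
        return BLOB_STORE[key]               # "download" from storage
    return payload

big = b"x" * 1000
small = encode(big)
assert len(small) < THRESHOLD and decode(small) == big
```

Even with such a codec, every activity invocation would still round-trip the data through storage, which is exactly the inefficiency I would want to avoid for a 10+ GB cache.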

Hope this helps,
–Hardy

Hi Hardy,

Thanks a lot for the prompt reply!

Is this cache shared between all workflow executions and all workers, or is it local to the execution of a single workflow, which you want to share across that workflow and its activities?

It is actually the latter. We also keep the cache in memory for performance reasons, since this is a latency-sensitive financial application.

If you want to ensure that the whole workflow executes on the same worker in order to, for example, share some global state, you will need to look at worker sessions.

I didn’t know about worker sessions before; I’ll study them. In our deployment we plan to run only one worker process, to guarantee that everything stays within the same process. The question now is: should we just use a module-level global variable for the cache? Is that recommended for this use case?

Really appreciate your insights!