Hi all,
I’m evaluating Temporal as a platform for orchestrating user/LLM interactions. In these interactions, an LLM “streams” text output to the user live as it is being generated, and the user can take action (e.g. press a stop button or interject) at any time.
I see two main options to frame this logic in Temporal constructs.
Option 1: short activities, each returning one “batch” of LLM output
This is the option used by the AI agent example.
My core activity would trigger one batch of LLM output (on the order of a sentence or so). This activity would be looped many times within my workflow, and a user would interject by sending a signal to the workflow.
But I have the following concerns:
- In many LLM libraries, “resuming” a conversation is a heavyweight operation that might, for instance, resend the full conversation history. Since my small activity runs many times (potentially across different workers), repeated resumes would incur huge latency and cost overheads.
  IIUC, the typical mitigation would be to keep ongoing connections to the LLM in worker memory and use task routing to ensure that the worker holding the right connection executes the relevant activities.
  But how idiomatic is this in practice? Live logic (data arriving over the ongoing connection) would constantly be running in between activities, “unmanaged” by Temporal. I’d have to keep that logic in sync with activity retries, and I’d need to ensure workers are registered for the right custom queues in the event of a crash, etc.
- My activities aren’t idempotent or pure: in truth, they accept “nothing” and produce “the next batch of text”. I suspect I can work around this with arbitrary UUIDs etc., but it also feels iffy.
- I’d have to jump through hoops to stop my entire user input / LLM output from saturating the event log.
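For concreteness, here is roughly how I picture the Option 1 control flow. This is a plain-asyncio simulation, not Temporal SDK code: `next_batch` stands in for the short activity, `Conversation.stop` stands in for a signal handler, and `FAKE_BATCHES` is a made-up substitute for the LLM. The explicit `cursor` argument is my assumed workaround for the idempotency concern (a retry with the same cursor returns the same batch); it does nothing for the resume-cost or event-log concerns.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-in for an LLM: a fixed list of sentence-sized batches.
FAKE_BATCHES = ["Hello.", "This is batch two.", "And batch three.", "Done."]


async def next_batch(conversation_id: str, cursor: int) -> Optional[str]:
    """Stand-in for the short activity.

    Taking an explicit cursor (instead of "nothing") makes a retry
    return the same batch rather than silently advancing the stream.
    """
    await asyncio.sleep(0)  # pretend to do I/O
    return FAKE_BATCHES[cursor] if cursor < len(FAKE_BATCHES) else None


@dataclass
class Conversation:
    """Stand-in for the workflow: loops over batches until stopped."""

    conversation_id: str
    cursor: int = 0
    stop_requested: bool = False
    delivered: List[str] = field(default_factory=list)

    def stop(self) -> None:
        """Stand-in for a signal handler (e.g. the user's stop button)."""
        self.stop_requested = True

    async def run(self) -> List[str]:
        # The stop flag is only checked between batches, so an
        # interjection takes effect at the next batch boundary.
        while not self.stop_requested:
            batch = await next_batch(self.conversation_id, self.cursor)
            if batch is None:  # the fake LLM has nothing more to say
                break
            self.delivered.append(batch)
            self.cursor += 1
        return self.delivered
```

Note that in this shape an interjection only lands between batches, which is part of why the batch size has to stay small.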
Option 2: long activities, each running an entire conversation
With this option, an activity runs an entire conversation, interjections included. Since communication between a live activity and its workflow is largely unsupported (IIUC), interjections would be ferried to the activity by some external message queue.
But I have the following concerns:
- Since my activities run for ages, Temporal’s retry magic becomes much less helpful. I’d have to manually handle state storage and resumption so that big chunks of work aren’t lost on a crash.
- I’d also have to implement this external message system and eschew signals/updates. My messages won’t be first-class citizens (e.g. on the event log) and I’d have a parallel control flow (e.g. a message might trigger a child workflow) not managed by Temporal.
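To illustrate the bookkeeping I mean, here is a rough plain-asyncio sketch of Option 2, again not Temporal SDK code. Everything in it is a stand-in I made up: `inbox` plays the external message queue, `checkpoint` plays whatever durable store I would have to maintain myself, `FAKE_BATCHES` replaces the LLM, and `crash_at` exists only to simulate a worker crash.

```python
import asyncio
from typing import Dict, List

# Hypothetical stand-in for an LLM: a fixed list of sentence-sized batches.
FAKE_BATCHES = ["One.", "Two.", "Three.", "Four."]


async def run_conversation(
    inbox: asyncio.Queue,
    checkpoint: Dict[str, int],
    out: List[str],
    crash_at: int = -1,
) -> str:
    """Stand-in for one long activity that runs a whole conversation."""
    # Resume from the last saved position instead of starting over.
    cursor = checkpoint.get("cursor", 0)
    while cursor < len(FAKE_BATCHES):
        # Interjections arrive via the external queue, not via signals.
        if not inbox.empty() and inbox.get_nowait() == "stop":
            return "stopped"
        if cursor == crash_at:
            raise RuntimeError("simulated worker crash")
        out.append(FAKE_BATCHES[cursor])
        cursor += 1
        # Hand-rolled state storage: persist progress after every batch.
        checkpoint["cursor"] = cursor
        await asyncio.sleep(0)  # yield, as real streaming I/O would
    return "completed"
```

Both the checkpointing and the queue polling sit outside anything Temporal records, which is exactly the parallel control flow I’m worried about.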
My question
What is the right way to implement this kind of logic in Temporal? My initial impression is that Temporal’s abstractions just aren’t a comfortable fit for it.
Thank you for your time!