Here are the Q&A questions that were answered during our recent Webinar: Deep-Dive: AI Agent Code Walkthrough with Temporal! Thank you to over 500 people who attended! We’d love to continue the conversation with you in #topic-ai on our Temporal Community Slack!
(Many thanks to @Josh_Smith and @Steve_Androulakis for running this webinar, and to them, Dallas Young, and Anthony Young for answering most of these!)
General Questions
Will we receive a recording of this session and slide deck afterward via email if we registered for the event?
Yes, you will!
Where can I find the code that was demoed?
The GitHub repo is here: GitHub - temporal-community/temporal-ai-agent: This demo shows a multi-turn conversation with an AI agent running inside a Temporal workflow.
You can find this and many other helpful Temporal example projects in our Code Exchange.
I have even more questions that weren’t answered here!
Join us in #topic-ai in our Temporal Community Slack
AI Design Patterns
Can we implement a Supervisor node (in terms of LangGraph terminology) which is a Decider node?
Yes, in Temporal a supervisor could be a parent workflow that owns the loop, decides routing and delegates to agents (child workflows). The Temporal demo agent’s goal picking logic does this (although it changes goals in the one workflow without running children): https://github.com/temporal-community/temporal-ai-agent/blob/edb7df5b3c2e3c1723433fb61b2cadb800d2feb8/workflows/agent_goal_workflow.py#L175C39-L175C61
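As a rough sketch of that supervisor shape (plain Python for illustration; the child workflow names below are hypothetical, not from the demo repo), the parent owns a routing decision like this and would then delegate via `workflow.execute_child_workflow` in the Temporal Python SDK:

```python
# Sketch of a supervisor/decider node: a parent workflow owns the loop and
# routes each goal to an agent implemented as a child workflow.
# The workflow names here are hypothetical, not from the demo repo.
AGENT_FOR_GOAL = {
    "billing": "BillingAgentWorkflow",
    "support": "SupportAgentWorkflow",
}

def route_goal(goal: str) -> str:
    """Decide which agent (child workflow) should handle the goal.

    Inside a real Temporal parent workflow you would then delegate with
    `await workflow.execute_child_workflow(chosen_agent, args=[...])`.
    """
    return AGENT_FOR_GOAL.get(goal, "GeneralAgentWorkflow")
```

In the demo itself the routing stays inside one workflow by swapping goals, which avoids the child-workflow plumbing for simple cases.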
Are there ways to stream back LLM responses live? Using something like Server-Sent Events (SSE)? / How would you stream out the reasoning of the agent?
There are two ways of streaming out the reasoning of an agent:
1. Signal-and-query via the workflow:
   a. Start a long-running activity that kicks off the streaming LLM call.
   b. Send the streaming results to the Workflow as signals.
   c. Query the workflow and see the results as they arrive.
2. Explore a local-first solution like Zero Sync, ElectricSQL, etc. (We still need to test this out.) Note that a local-first solution adds a lot of complexity, such as running a PostgreSQL database.
We are also looking at possibly supporting streaming natively in Temporal in the future.
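As a tiny model of option 1 (signal chunks in, query them out), here is what the workflow-side state looks like. This is plain Python for illustration: in a real workflow, `append_chunk` would be a `@workflow.signal` handler fed by the long-running activity, and `current_text` a `@workflow.query` handler polled by the UI.

```python
# Minimal model of the signal-plus-query streaming pattern described above.
# A plain class stands in for the workflow; comments mark the Temporal roles.
class StreamingReasoning:
    def __init__(self) -> None:
        self.chunks: list[str] = []

    def append_chunk(self, chunk: str) -> None:
        # Signal handler: the streaming activity signals each LLM chunk.
        self.chunks.append(chunk)

    def current_text(self) -> str:
        # Query handler: the UI polls this to render partial output.
        return "".join(self.chunks)
```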
Can I have a multi goal agent and a couple of agents can add up to a big goal? Trying to solve a use case where user can either aim for a big goal or any intermediate goals.
Yes, you would have to engineer your prompts and event loop to cater for this ‘parent goal’ type functionality. But it’s certainly doable.
For an experience like this, we have found that step-by-step style guardrails create a perception that the LLMs are more like bots, since the user says something like “account balance for December” and the bot replies “great, what month would you like to check your account balance for?”
Instead of asking the orchestrator/intent recognition flow to identify details, we’ve been running the next workflow invisibly first so that it checks to see if the necessary information already exists or not.
Have you seen similar patterns? What’s working best for you all?
Nice. We kept it simple and linear for today’s demo - but definitely yes - some of the goals and tools in the repo are much more flexible and less linear, and the LLM agent can skip tools if it decides they’re not needed or if it has the info it needs. This is a common pattern and it’s quite nice with intelligent LLMs.
Demo Questions
Why prefer a Signal instead of an Update for communication between Temporal and the outside system (the prompt) in this architecture?
I wanted a React interface that would show the current chat state no matter what. You can use Update if you want. In reality I'd use a third-party store for chat history (e.g. MongoDB).
Am I interpreting things correctly and seeing that this implementation does not directly leverage “built-in” tool calling in APIs like OpenAI’s, but rather asks the agents to define or specify tool calls inside of this custom structured output? In other words - we still have the LLM “calling tools” (specifying functions, providing arguments) - but this is wrapped by these custom control flow data structures. Is that right?
Correct. For two reasons:
- This is first and foremost a Temporal demo, not a “use all the LLM features” AI best-practices demo.
- Second, avoiding native tool calling gave us a simpler code base that works with multiple LLMs (with Temporal holding a single way to do tool calling). This was helpful recently on stage at a keynote where LLM provider #1 went down and we switched in real time to LLM provider #2. Durability!
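To illustrate the custom structured-output approach (this schema is illustrative, not the repo's exact format): the LLM emits a JSON envelope naming a tool and its arguments, and the workflow parses it and runs the tool as a Temporal activity itself.

```python
import json

# Illustrative envelope: the LLM "calls tools" only by naming them in
# structured output; the workflow, not the LLM, executes the tool.
def parse_llm_decision(raw: str) -> tuple[str, dict]:
    """Extract (tool_name, args) from the LLM's structured output."""
    decision = json.loads(raw)
    return decision["tool"], decision.get("args", {})
```

Because the envelope is yours rather than a provider-specific tool-calling API, swapping LLM providers does not change the tool-execution path.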
Did you look into or experiment with streaming the response from the LLM to a broker (i.e.. Redis) inside the activity and then use SSE for the UI?
That’s one viable solution. At the moment, we wanted to keep it simple for this demo. So polling via Query was the route that we went with.
How is the workflow closed with the while True? Is it closed manually?
For example, when the user asks to be done, that's sent as a signal to the workflow; the LLM decides the user wants to be done based on their words, and the workflow returns and completes. See temporal-ai-agent/workflows/agent_goal_workflow.py at edb7df5b3c2e3c1723433fb61b2cadb800d2feb8 · temporal-community/temporal-ai-agent · GitHub for the code that does this.
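A stripped-down sketch of how such a loop closes itself (plain Python for illustration; in the real workflow, prompts arrive as signals and the loop awaits `workflow.wait_condition` rather than iterating a list):

```python
# Sketch: a `while True`-style agent loop ends when the LLM judges the
# conversation done; returning from the function closes the workflow.
def agent_loop(prompts: list[str], llm_decides_done) -> str:
    history: list[str] = []
    for prompt in prompts:            # stand-in for awaiting signals
        history.append(prompt)
        if llm_decides_done(prompt):  # LLM judges the user wants to finish
            return "completed"        # workflow returns -> execution closes
    return "waiting"                  # otherwise the loop keeps waiting
```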
Can MCP servers be used alongside this agentic framework?
Not yet, but it will be built by June for something I’m doing with Stripe!
How would you handle conditional tool confirmation if your tool registry moves to MCP?
Good question. The only way I can think of to support confirmations on a dynamic set of MCP tools is to have a specific LLM step where you present the MCP tool metadata and ask “is this something worth confirming”? This is the only way I can think of for a truly dynamic (ever changing) set of MCP tools you don’t have control over.
For the multi-goal setup, can the user switch goals midway?
Yes, if the user opts to do that, the LLM can decide to switch goals and ask the user for the new goal.
Is there any integration / usage of Nexus here that can be showcased as part of this demo ?
I wanted to keep Nexus out of the current demo to reduce the amount of Temporal things people needed to know about. There’s a possible Nexus enhancement I’m thinking about (with MCP capability).
So actions the LLM has access to are all activities?
The LLM doesn’t have direct access to any tools/activities. I will explain this in the next section though.
Temporal Questions
Are there any higher level design techniques like DDD or other that are generally used to design larger scale systems that leverage Temporal?
Domain-Driven Design and other architecture patterns still apply; Temporal makes them easier.
For example, workflows can make implementing what a domain does more durable, clearer, and simpler.
Is Temporal scalable vertically or horizontally ?
The database scales vertically (if self-hosting), but everything else scales horizontally (workers, service pods).
Could the tools be separate activities instead of a single activity running them?
Yes, or even run a whole long-running workflow!
How is the UI aware of updates to the message state?
The UI queries the workflow to determine what to display.
In flows where a user can step away for extended durations - is there a best practice to not take up worker resources until a response comes back?
Great question! Temporal workflows take zero worker resources if there aren't any active tasks to be done, so worker resource usage is very low in such cases. If the system detects no interaction, it can run at essentially zero resources until the user starts interacting again.
If I want a long-running chat, how would you handle the number of signals, etc., that the framework has to process?
Good question. Before processing any additional work, you can check the history size in terms of bytes and length, and roll over with Continue-As-New when it gets large.
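A sketch of that check (the thresholds below are illustrative; in the Python SDK the real numbers come from `workflow.info().get_current_history_length()` and, in recent SDK versions, `get_current_history_size()`, and the rollover is `workflow.continue_as_new(...)`):

```python
# Decision helper for long chats: check event-history growth before doing
# more work, and roll over with Continue-As-New when it gets large.
# Thresholds are illustrative, chosen well under the hard limits.
MAX_HISTORY_EVENTS = 10_000      # hard limit is 51,200 events
MAX_HISTORY_BYTES = 20_000_000   # hard limit is 50 MB

def should_continue_as_new(history_length: int, history_bytes: int) -> bool:
    """True if the workflow should carry its state into a fresh execution."""
    return (history_length > MAX_HISTORY_EVENTS
            or history_bytes > MAX_HISTORY_BYTES)
```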
Is it correct that the workflow will basically unload from memory while it is waiting for a worker or user prompt and rehydrate from persisted state when new events arrive?
Mostly correct. By default, workers do cache workflow event histories for efficiency (sticky cache), but strictly speaking, yes there is no need to burden workers while workflows run and wait for work.
Is it possible to have state data for tools? i.e. maintaining an object instance live alongside the chat / execution of tools.
Yes, I’m assuming you want to have long-lived tools or multiple calls to tools with state that’s carried over. One way to do this is signal state back to the caller workflow and maintain it in workflow state. That way your workflow is aware of the tool execution state. It’s not possible to store complex in-memory (non-data) state of activities (e.g. a LangChain execution) alongside the workflow execution as state is serialized over the network. You could also consider running tools in child workflows, or using external persistence.
Are dynamic activity names supported in the NodeJS SDK as well, and how is this accomplished?
This isn't available in the TypeScript SDK yet. GitHub issue: https://github.com/temporalio/sdk-go/issues/543
You say it’s not great to store it in the event history, so in that context for each activity when I’m in Temporal UI it would simply show a pointer reference as compared to perhaps a JSON payload of messages?
The most efficient way to work with a conversation history would be to store it in something like MongoDB or DynamoDB and have the UI poll that directly. Temporal isn't made for storing large amounts of data per workflow in its history, nor is it meant to be queried at a high rate per workflow (a worker must be active to return the query result). So in a production scenario, storing and accessing this kind of data in a third-party store is the best way. We are looking at supporting large payloads directly, too, but we don't have a ship date for that.
What have you folks found works best for versioning and upgrades of the workflows? Since these flows will change much more frequently than classic workflows since so much tuning is required.
Lots of good solutions to talk about, but three things briefly: activity changes don't need to be versioned; the workflows are super dynamic and their code doesn't have to change much; and Continue-As-New can help workflow code stay up to date.
What’s the additional incurred cost for running Temporal as compared to not having Temporal?
We recommend using Temporal Cloud so you don’t have to manage the entire infrastructure—just your worker cluster.
How would I deploy and run this Temporal based agentic AI application on AWS Kubernetes?
Quick Launch - Deploying your Workers on Amazon EKS | Temporal Platform Documentation is a great guide to using Temporal with EKS. The AI agent workers will run the same way.
Temporal has documentation for EKS, is there any for deploying workers in ECS ?
No ECS-specific guide that I'm aware of, but it will be very similar to the Cloud Run guide: Temporal Workers & Google Cloud Run | Temporal
Do we have some sort of comparative analysis of resource utilization for agents (on decent volume) that is running on Temporal and non-Temporal (kubernetes containers) infra?
Not yet, but I’d love a community contrib on this!
Are there limits for how big the chat history can be in the workflow?
Yes. Per the to-do list in the repo README, in reality you'd store the chat history in a third-party store like MongoDB. Right now the limit would be thousands of messages, but Temporal's event history isn't the most efficient way of storing tool results, etc.
Is there a difference in the setup between the AI workflow that we are seeing and just setting up a regular Temporal workflow and making some calls to open AI from within an activity?
Not a huge difference. See the latter half of the webinar which talks through the code workflow.
What is the max size of the conversation history?
A Workflow Execution's Event History is limited to 51,200 Events or 50 MB. See here: System limits - Temporal Cloud | Temporal Platform Documentation. If we needed more storage, we could compress the history or use external storage.
A couple of notes on history size:
- There are API/CLI commands for getting the workflow history size.
- In the real world, you'd probably store the conversation history/context in an external DB.
- You could also use the claim check pattern to pass references to data in, say, S3, rather than storing it directly.
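The claim check pattern can be sketched like this (an in-memory dict stands in for S3/MongoDB, and the function names are hypothetical): the large payload goes to the external store, and only a small key travels through Temporal.

```python
import uuid

# Sketch of the claim check pattern: store the large conversation payload
# externally and pass only a small reference through Temporal signals,
# activity inputs, and workflow state. A dict stands in for S3/MongoDB.
EXTERNAL_STORE: dict[str, list[dict]] = {}

def check_in(conversation: list[dict]) -> str:
    """Store the payload externally; return the small claim ticket."""
    key = str(uuid.uuid4())
    EXTERNAL_STORE[key] = conversation
    return key  # only this key enters the workflow's event history

def check_out(key: str) -> list[dict]:
    """An activity redeems the ticket to fetch the full payload."""
    return EXTERNAL_STORE[key]
```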
Question: In an e-commerce system using Temporal, is it okay to have thousands of workflows running but idle (no activity currently executing)?
If the whole Temporal cluster goes down, will workflows resume from where they left off? And is the entire workflow state stored in Postgres?
Great question. If the Temporal Cluster goes down, the workflow will stay in a running state until the cluster comes back online, then pick up from where it left off; workflow state is persisted in whichever database backs the cluster (e.g. Postgres). To guard against this type of outage, you want to enable Multi-Region Namespace (MRN), which allows you to fail over to a different region.
If the conversation history will always be under 50MB for some use case, do you still recommend using external DB for state management?
Whether you need to retain conversation history determines if it’s worth saving in the database.
Using the workflow to manage chat history is straightforward and convenient—though you’re limited by how much history you can send to the LLM.
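On that "how much history you can send to the LLM" limit, one illustrative trimming policy (purely a sketch, not from the demo repo) is to keep the system prompt plus the most recent messages:

```python
# Illustrative context-window policy: always keep the system prompt, then
# only the most recent messages that fit the budget.
def trim_for_llm(messages: list[dict], max_messages: int) -> list[dict]:
    """Return the system messages plus the last `max_messages` others."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]
```

A production system would likely budget by tokens rather than message count, but the shape is the same.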
Other Questions: Third Party Tools
How would you suggest we use this with agent frameworks like LangGraph? Would you suggest we use Temporal instead of an agent framework?
Take a look at this sample: samples-python/langchain at main · temporalio/samples-python · GitHub
That sample uses LangChain rather than LangGraph, but the concepts are similar.
Are there any higher level primitives of orchestration like you would have in the agentic frameworks (e.g. LangGraph or CrewAI) e.g. conversation history management, guardrail hooks, task queue, etc.?
Temporal is examining higher-level primitives to support agentic AI use cases specifically. No roadmap or announcements to share about that yet, though.
We are using LangGraph to control the agentic flow. How can I integrate this with LangGraph? I want to set each agent as an activity. Our current code is putting the whole agentic flow in one activity which gives limited integration capability.
This is more a LangGraph question than a Temporal one, but our understanding is that LangGraph keeps mutable state in memory (with an in-memory 'MemorySaver') and allows checkpointing graph progress to databases. You would need to find a way of breaking LangGraph up into serializable payloads to split a LangGraph agent across Temporal activities. Until then, executing your LangGraph agent as one Temporal activity will work.
How does this Temporal’s approach compare to PydanticAi with Pydantic Graph? Also, given that Temporal has added support for Pydantic, how would you leverage this in this demo code?
Pydantic data typing support is on the (long) to-do list for the Temporal AI agent. I'd leverage it by introducing rich data structures for prompts, responses, and so forth. As for Pydantic Graph, I find it more natural to code in a 'step by step' fashion with an event loop (for/while) and events like signals, versus a graph-style approach where one writes code to define a set of nodes and how they're interconnected.
Are there any plans at Temporal to create a framework like PydanticAI, but based on Temporal? Something like this: GitHub - StreetLamb/rojak: Python library for building durable and scalable multi-agent orchestrations. ?
We’re exploring ways to streamline development to any LLM provider and solution, so customers don’t have to build everything from scratch.
Is Kapa running in a Temporal workflow?
While I can’t share non-public information about how companies use Temporal, I would note that many very popular AI research and retrieval solutions use Temporal for orchestration.
Is there OOB code available to parse Open API spec files (either JSON or YAML) and create tool definitions that LLMs can use?
No, not yet.
Have you all considered creating shareable context files / rules for tools like Cursor to help us use agents to implement these systems correctly, with AI assistants? Kind of like what context is doing, but not just a markdown dump of documentation?
Sure, but in a way aren't Cursor rules more or less just a text dump anyway?
(serious answer is I think the dust needs to settle more on formats for agent context before adopting any in a heavy way)
– Addendum by attendee Nick Vrana –
Cursor rules can be a lot more; you can steer the behavior pretty substantially. A good example: You are using Cursor AI incorrectly...
Totally agree the dust needs to settle though. Hoping it becomes good documentation practice in the future to create token efficient rules/guides for agents
Other Questions: Security
Is this the appropriate forum to discuss the concept of jailbreaking a language model or a workflow? I’m new to this field and unsure if this is a relevant topic to bring up here or if it’s a real issue.
Good question. This webinar is for discussing how to design the interaction between an LLM and a Temporal Workflow. If you want to discuss "jailbreaking" an LLM, you should post in our Temporal Community Slack.
If the LLM has a conversation history, how do you suggest we manage PII data, especially for financial system use cases?
Either scrub PII entirely or don't send it.
Temporal encrypts payloads, so you're covered from that standpoint, but there's currently no way to prompt a public LLM with PII without it being plain text to the provider, so you'd want to look at local models instead, etc.
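A minimal sketch of the "scrub PII" option (regex-based and purely illustrative; a real financial system would use a dedicated PII-detection service rather than two hand-rolled patterns):

```python
import re

# Redact obvious identifiers before the prompt leaves your infrastructure.
# These two patterns are only illustrative, not production-grade detection.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),  # 13-16 digit card numbers
}

def scrub_pii(text: str) -> str:
    """Replace each detected PII span with a bracketed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```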
How do these tool calls validate the user for permissions checks? Would it be as simple as passing an auth token as part of that confirm signal?
Be careful when passing auth tokens as signal or activity input. These will be persisted, though you can encrypt payloads with our SDK’s Data Converter. Another way is to store an opaque reference to a secrets store, and have your workers able to access that store via IAM.
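The opaque-reference idea can be sketched like this (environment variables stand in for a real secrets store reached via IAM; the function name is hypothetical). The signal carries only a secret *name*, and the worker resolves it at activity time, so the token itself never enters workflow history.

```python
import os

# Sketch: signals/activity inputs carry an opaque secret name, and the
# worker exchanges it for the real token at execution time. Environment
# variables stand in for a secrets manager the worker reaches via IAM.
def resolve_secret(secret_name: str) -> str:
    """Activity-side lookup: exchange the opaque reference for the token."""
    token = os.environ.get(secret_name)
    if token is None:
        raise KeyError(f"secret {secret_name!r} not available to this worker")
    return token
```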