Q&A from Deep Agents / Scale AI webinar

We had so many great questions during our Deep Agents webinar yesterday with @Steve_Androulakis and Daniel Miller at Scale AI! We ran out of time to answer all the questions, so here are the answers to help clarify the material covered.

Where is LLM in Deep Agent diagram?

Every box in the diagram will use an LLM to achieve its goals (e.g. ‘execute multi-step task’ and ‘generate comprehensive output’).

What exactly are you tweaking after eval in Agentex?

We (Scale AI) are tweaking whatever part of the system needs to be tweaked :wink: We use traces to understand where things are going wrong. Agentic workflows typically have multiple steps, so it helps to see whether the problem is in a tool’s output, in the LLM, or in some other step. Once we have found the issue, we can change hyperparameters, prompts, or even the MCP server/tools themselves!

So we feed the user input back to the agent context to improve its learning in some way?

The mechanism is quite simple. See the code here, and feel free to use your favorite LLM to explore it. The idea is to add a step after human input that uses an LLM to extract what is needed. We (Scale AI) then put this in a state variable and inject it into the main system prompt of our agent loop.
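As a rough sketch of that flow (names are illustrative, not the actual Agentex code): an extraction step distills the human feedback, the result accumulates in state, and it is injected into the system prompt on the next loop iteration.

```python
BASE_PROMPT = "You are a helpful research agent."

def extract_learning(human_feedback: str) -> str:
    # In the real system this is an LLM call that distills the feedback;
    # a trivial placeholder stands in for it here.
    return human_feedback.strip()

def build_system_prompt(state: dict) -> str:
    # Inject the accumulated learnings into the agent loop's main system prompt.
    learnings = state.get("learnings", [])
    if not learnings:
        return BASE_PROMPT
    bullets = "\n".join(f"- {item}" for item in learnings)
    return f"{BASE_PROMPT}\n\nWhat the user has told us so far:\n{bullets}"

# One turn of the loop: capture feedback, then rebuild the prompt with it.
state = {"learnings": []}
state["learnings"].append(extract_learning("Prefer concise, cited answers."))
prompt = build_system_prompt(state)
```

Because the learnings live in plain state rather than the chat history, they survive context truncation and apply to every subsequent iteration.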

How are you streaming tokens back from Temporal?

We (Scale AI) do this with Redis, using a pretty basic pub/sub from our backend to the UI. Again, feel free to explore the open source server here and the SDK here.
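The shape of that pattern, sketched below with hypothetical channel names (the `r` client is anything with Redis’s `publish`/`pubsub` interface, e.g. redis-py): the backend publishes each token on a per-task channel, and the UI process subscribes and relays tokens to the browser.

```python
import json

def token_channel(task_id: str) -> str:
    # One channel per task keeps concurrent agent runs separate.
    return f"agent:{task_id}:tokens"

def publish_token(r, task_id: str, token: str) -> None:
    # Backend side: fire-and-forget publish of one streamed token.
    r.publish(token_channel(task_id), json.dumps({"token": token}))

def stream_tokens(r, task_id: str):
    # UI side: subscribe and yield tokens as they arrive.
    pubsub = r.pubsub()
    pubsub.subscribe(token_channel(task_id))
    for message in pubsub.listen():
        if message["type"] == "message":
            yield json.loads(message["data"])["token"]
```

Pub/sub is a good fit here because token streams are ephemeral: a dropped message just means a slightly stale UI, not lost state, since the durable record lives in the workflow history.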

Do we get the agent code that we are showing here?

Yes, it is all here.

How about the usage cost of LLM?

We at Scale AI sell our enterprise clients access to SGP, our platform, which provides a billing dashboard for understanding LLM usage. Feel free to contact us here.

Are the learnings based on single answers in this demo? Is this from Scale or from the BYO logic? I can imagine one example isn’t necessarily generalisable.

It is mostly bring-your-own logic. We (Scale AI) supply examples and tutorials, but we built Agentex to be non-opinionated by design, so we don’t box ourselves in and the platform stays future-proof. See the tutorials here.

It would be great to have an inbuilt context management solution. With multimodal data becoming popular, the payload limit (2MB?) will be a bottleneck for end users.

Stay tuned for an upcoming announcement around large payload support in Temporal. Details to come, but we (Temporal) will soon have a private preview.

Any plans to have a standalone binary? Would be great for new users to become familiar with the functionality before switching to the full-fledged one.

This is a great question and something we (Scale AI) have been thinking about a lot. We want to create a binary version of Agentex that lets you go from 0 to 1 more easily, without needing to have all the Docker containers up and running. Stay tuned for more!

How do you model your agents’ tools: as Temporal Workflows?

We (Scale AI) actually use Temporal’s wonderful primitive constructs. You can see more here.

Does Scale AI also help against prompt injection (e.g. in documents that humans put in the loop)? Sounds like it’s not necessarily business logic (at least there’s a class of generic vectors) and could be lifted out of that.

We (Scale AI) do lots of red teaming efforts.

Do you have the demo code open-sourced?

We do; the open source server is here and the SDK is here.

We’re evaluating using Claude Code/Opencode as the agent orchestrator instead of building the agent loop ourselves. Any thoughts on providing Temporal’s durability and observability features while not integrating the agent loop directly as a Temporal workflow?

The durability challenge with Claude Code and Opencode as the agent loop is that they are locked to a single process and do not persist events in a fine-grained, durable way, which makes them hard to recover.

Temporal is exploring ways to provide the power of file-based/process-locked agent orchestrators (like Claude Code) but without the durability drawbacks.

Regarding backup of the Temporal database: what do you think is the best and safest way to do it?

Assuming that by “Temporal database” you mean the persistence layer of the Temporal Service: for self-hosted deployments, normal database backup strategies apply (we support MySQL, PostgreSQL, and Cassandra persistence layers). Or you can use Temporal Cloud, which we (Temporal) manage for you.
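For example, with a self-hosted PostgreSQL persistence layer, a standard logical backup covers it. The database names below are Temporal’s defaults and may differ in your deployment; back up the main and visibility databases together so they stay consistent with each other.

```shell
# Logical backups of both Temporal databases (PostgreSQL defaults shown).
pg_dump --format=custom --dbname=temporal --file=temporal.dump
pg_dump --format=custom --dbname=temporal_visibility --file=temporal_visibility.dump
```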