Hi everyone, I'm Josh, and I'm going to be talking about working with multiple agents: mostly trying to outline a problem that I think we've all faced, and then ways of thinking through the solution that is still emerging. For my work, I build generative AI products for clients and help people use agentic coding tools.
So the four main topics are thinking about the single thread or the single agent, parallel work as a solution to that, subagents as a second solution to that, and agent teams and other paradigms that are emerging as a sort of third solution.
So what do I mean by the single agent? We all got ChatGPT, we got very excited about it, we wanted to look things up.
Here you can see I'm researching my friend David Beatty. We start chatting with it and we have this kind of magical experience where an LLM, a large language model, is wrapped in this web interface, and we can talk with it, go back and forth, and then get some output that we can write into a paper or report or something like that. So this is the magic ChatGPT moment.
We already had language models before this, but having this kind of back and forth with it turned out to be really powerful, and obviously the models have got better since then.
As time went on, the companies introduced more tools: research, all these MCP connectors and things like that, ways you can connect the language model to the tools you're already using, so the language model itself can call those tools and interact with them for you. So some of the work the language model is doing for you, and some of it you're still doing yourself to produce the output. This is the single-agent paradigm.
We've got this extremely magical tool, this alien technology delivered to us, but we're under-leveraging it because it's new to all of us and we don't have experience with it.
Basically, there's this observation that thinking done by our AI tools is not a scarce resource. I mean, it is in the broader electricity-and-GPU sense, but on an individual-user basis, there's no reason you need to have just one agent doing your work for you. So what does this look like?
Well, we've basically got a bunch of copies of the LLM sitting on the shelf that we're not taking advantage of. We're having this amazing experience with ChatGPT, and all the while we have a virtually unlimited number of copies of ChatGPT that we're not using. How would you solve this?
The first obvious innovation is to think, well, I can do things in parallel. It takes a while for the LLM to run. It takes a while for these tools to run.
Maybe I start up a new browser and I chat with that LLM, back and forth, and a third browser, and as many as I want. And I'm still the one fundamentally producing the output, chatting with all of these LLMs.
This was transferring a pattern we're already very comfortable with in the modern world, which is tabs: having millions of tabs.
And so it was a very natural innovation, I think, to just say, I'm going to run ChatGPT in multiple tabs. Or, if you're a developer, running Claude Code in multiple tabs: multi-Clauding, as it's called.
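The tabs idea can be sketched in a few lines of Python. This is just an illustration, not any particular product's API: `ask_llm` is a stub standing in for whatever chat-completion call you'd actually use, and each prompt runs as its own independent conversation.

```python
from concurrent.futures import ThreadPoolExecutor

def ask_llm(prompt: str) -> str:
    # Stub standing in for a real chat-completion API call.
    return f"answer to: {prompt}"

prompts = ["summarise the report", "draft the email", "review the code"]

# Each prompt is its own "tab": independent context, run in parallel.
with ThreadPoolExecutor(max_workers=3) as pool:
    answers = list(pool.map(ask_llm, prompts))

print(len(answers))  # → 3
```

The point is that nothing in one "tab" depends on another, so there's no reason to wait for one conversation to finish before starting the next.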
And in the developer world this is very exciting, with all these developers wanting to produce lots of code. Claude Code, if you're not technical, is probably the best coding tool; it's like ChatGPT, but for developers to write code. And you can see Boris, who created Claude Code, talking on Twitter about running 5 to 10 Claude Codes remotely, as well as some local Claude Codes in parallel. Part of this is about various workflows where you're trying to figure out how to do even longer-running tasks with Claude Code.
So, yeah, I have some of these diagrams from two different Anthropic posts, one from the end of 2024 and one from early this year. The earlier one was about building machine-learning systems using language models, with the observation that you could run these LLM calls in parallel. Now we're thinking in terms of building products where it's not just an LLM call but an agent, where an agent in this definition is an LLM running in a loop with tool calls, so it can interact with the system and decide at every moment: do I answer the user, or do I make another tool call? So that's the parallel idea.
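That definition, an LLM in a loop deciding between tool calls and a final answer, fits in a short sketch. Everything here is hypothetical: `fake_llm` stands in for a real model and `tools` for real tool implementations.

```python
def agent(task, call_llm, tools, max_steps=10):
    """Run an LLM in a loop: at each step it either calls a tool or answers."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        if reply["type"] == "tool_call":
            # Execute the requested tool and feed the result back in.
            result = tools[reply["tool"]](reply["args"])
            messages.append({"role": "tool", "content": result})
        else:
            return reply["content"]
    return None  # gave up after max_steps

def fake_llm(messages):
    # Stub model: ask for a search first, answer once a tool result exists.
    if any(m["role"] == "tool" for m in messages):
        return {"type": "answer", "content": "done: " + messages[-1]["content"]}
    return {"type": "tool_call", "tool": "search", "args": "David Beatty"}

tools = {"search": lambda q: f"results for {q}"}
result = agent("research David Beatty", fake_llm, tools)
print(result)  # → done: results for David Beatty
```

The `max_steps` cap is the only thing keeping this loop from running forever, which is why real agent frameworks all have some budget or stop condition.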
Now, subagents.
Well, the models got much longer context windows, ballooning from tens of thousands to millions of tokens, and the problems we present them grew. And as you increase the amount of content you dump into one of these models, the performance degrades and its behavior shifts away from what's expected.
So subagents are an innovation for managing your agent's context, as well as the tools it has access to.
So for the user, this still looks like the same fundamental approach of talking to this main agent. But this main agent can spawn a subagent to do some set of tasks.
And why would this be useful?
This might be that the agent needs to read 50 web pages. And instead of filling the main agent's context with those 50 web pages, you spin up a research subagent.
It reads all 50 web pages and delivers back the three that were actually relevant for the context. So you probably have seen this if you've done some deep research with these AI tools.
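The research-subagent pattern can be sketched as a function with its own throwaway context. This is an illustration only: `read_page` and the relevance check are stand-ins for real web fetching and an LLM's judgment.

```python
def research_subagent(urls, read_page, is_relevant):
    # The subagent reads everything into its own context...
    pages = {u: read_page(u) for u in urls}
    # ...and hands back only the pages that actually matter.
    return {u: text for u, text in pages.items() if is_relevant(text)}

urls = [f"https://example.com/page/{i}" for i in range(50)]

def read_page(url):
    # Stub: pretend exactly three of the fifty pages are on-topic.
    return "about agents" if url.rsplit("/", 1)[1] in {"3", "17", "42"} else "off topic"

relevant = research_subagent(urls, read_page, lambda text: "agents" in text)
print(len(relevant))  # → 3 pages reach the main agent, not 50
```

The 50 pages of raw text live and die inside the subagent; only the condensed result ever touches the main agent's context window.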
In the coding world, code bases are massive. And Anthropic have introduced explore agents, which can be spawned independently by the main agent. They'll explore the code base, read a bunch of files into context, and then give condensed information back to the main agent, protecting the main agent's context window.
Yeah, I'll skip this, or I'll briefly talk about this.
But in this, they have defined some built-in subagents, and you can make custom agents as well. The built-in ones use Haiku as the model, so they're fast and cheaper to run, because they're just doing some information retrieval and discovery.
The paradigm here is the main agent spawning subagents. The subagent does some completely independent work with no human input and then feeds the result back to the main agent. You're still having the same fundamental experience, and as a user it's excellent, because you just get better performance without having to do anything.
In the development world, this means you can craft the context you want and the ways it works. Say you want certain skills always loaded in when working on a particular problem, but you don't need your main agent thinking about them. Or you want to make sure the subagent will only read from the database and never write, and you want to let it just go do its thing, knowing that every command it runs will have to be read-only.
Okay, so that's subagents. Now, agent teams. For agent teams, again, it's the same sort of idea of other agents spawned by the main agent. But instead of them being completely isolated from the user, you can talk with these teammates as well, because the task they're doing requires some sort of human input. And the common information they share is a shared task list.
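The shared-task-list idea can be sketched with worker threads standing in for independent agent sessions. This is a toy coordination model, not any product's actual mechanism: teammates claim tasks from a common board and post results back to it.

```python
import threading

class TaskList:
    """A minimal shared board: teammates claim tasks and post results."""
    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.done = []
        self.lock = threading.Lock()

    def claim(self):
        with self.lock:
            return self.tasks.pop(0) if self.tasks else None

    def finish(self, task, result):
        with self.lock:
            self.done.append((task, result))

def teammate(name, board):
    # Each teammate loops: claim a task, do it, report back.
    while (task := board.claim()) is not None:
        board.finish(task, f"{name} did {task}")

board = TaskList(["design invites", "build PDF", "plan menu"])
workers = [threading.Thread(target=teammate, args=(n, board))
           for n in ("architect", "developer")]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(board.done))  # → 3, all tasks completed between the two teammates
```

The task list is the only shared state; no teammate needs to know what the others are doing beyond what's on the board, which is what makes it easy for a human to step in mid-stream.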
So the main agent will spawn the teammates, it will define the task list, but everyone's talking with everyone else. It's not a main agent who's kind of controlling everything.
You can step in and chat with the teammate for a bit or the main agent. This is the anthropic diagram.
So the loop you have is: these teammates are doing work, which you can interrupt and talk with. They're almost like individual ChatGPT instances; each one is a different ChatGPT instance for all intents and purposes.
And they have a shared document that they're referencing that they can all contribute to. Does it use a hub? Yeah. So I'll show you.
I've got a screenshot here. This is a little bit, if you're not technical, maybe crazy.
But the idea here is that with Claude Code, you can spin up an agent, and the way they do it is a split view, with this main agent on the left. So this is like a ChatGPT instance. And then the subagent teams are here.
This is a party architect team. I was making, for my other friend Fuyu, some party design. So party architect here and a PDF developer here.
This main agent has prompted these sub -agents. But if you want to, you can step in and pause one of them, make them full screen, and say, oh, I don't want you to do it this way.
I want you to do it this other way. All the while, the main agent is intermittently checking in. So: pause, I want you to focus on this other direction. You can step into these teammates and have them go in different directions.
Is this complicated? No, they have directions to set it up; you do just need to enable it. It's in a sort of beta-testing stage.
On the non-technical side, you'll have seen things like teams, not ones where you can interact with the teammates, but with deep research. So if I want deep research about David Beatty, you can see they've kicked off this lead researcher.
This goes by really quickly as it moves through the different loading states, but this lead researcher will have made a plan, and you can see it's gathered 192 sources and counting.
If you dump all of that into an LLM and say, give me an answer, who knows what it's going to come up with? But basically, these subtasks can get split off and the agent team can work together.
where they've got a lead agent, this is their diagram, a lead-agent orchestrator that works with these. This is a bit more like subagents; the lines blur between subagents and agent teams.
But the problem everyone's trying to solve, of course, is how to take advantage of the LLMs while working within their constraints, matching the context to the problem you're solving.
And I'm going to briefly just mention this is Gastown, invented in January, so I'm not going to talk about it in detail, but this is like people trying to solve this problem in any way they can, like what if you have this mayor that can delegate tasks and they check in on this and that.
This one's actually gaining a lot of popularity, but I think this is one of the trends you'll see coming this year.
Other paradigms. I'll just mention this one: generator-evaluator.
So if any of you have done code-review agents: you can have a code-review agent that evaluates what the coding agent has done, then have the coding agent respond to the reviewer, and the reviewer respond back, and so on. And there's an amazing quote here, which is that the key insight is that generation and evaluation are different cognitive tasks. So again, you're taking advantage of something that in some ways seems odd: why would an LLM evaluate its own work? But if you get it in the headspace of "write this paper" and then a different one in the headspace of "evaluate this paper that was generated", those are different kinds of tasks, and you can get more from the LLM.
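The generator-evaluator loop is simple enough to sketch. Both roles here are stubs standing in for separate LLM calls with different prompts; the structure, generate, critique, revise until the critic is satisfied, is the point.

```python
def generate(prompt, feedback=None):
    # Stub for an LLM in the "write this" headspace.
    draft = f"essay on {prompt}"
    if feedback:
        draft += " (revised: " + feedback + ")"
    return draft

def evaluate(draft):
    # Stub for a second LLM in the "critique this" headspace:
    # it only judges, it never writes. Returns None when satisfied.
    return None if "revised" in draft else "add a conclusion"

def generator_evaluator(prompt, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        draft = generate(prompt, feedback)
        feedback = evaluate(draft)
        if feedback is None:
            return draft  # the evaluator accepted this draft
    return draft  # out of rounds; return the best attempt

final = generator_evaluator("agent teams")
print(final)
```

In a real system, `generate` and `evaluate` would be two separate LLM conversations with different system prompts, so neither has to hold both jobs in one context.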
Sequential, again, this is the 2024 to 2026 shift.
So some quick tips and then questions.
So for structuring agentic work: match the context, tooling, and model to the task. Get as much clarity as you can on the task you're doing. I love the example from the Lovable demo of maybe bringing in ChatGPT. In my mind, that was like a subagent process or a teammate process: you're working in Lovable, but you need to refine a prompt, so you spin up a ChatGPT subagent to workshop the prompt so that the Lovable demo only gets the condensed and refined version. It will actually do a better job than if you just workshopped the prompt within Lovable. So in that way, in my mind, that's like subagents, just not native to Lovable yet, though they should build that in. Look for distinct kinds of cognitive tasks (prompt refinement, research, development, evaluation) and don't try to have the LLM do everything all in one.
If you're going to design a prototype as well as identify the problem, leverage the fact that agents can steer other agents. And you don't have to make every decision. They're very smart.
Translate existing patterns to work with agents. So existing patterns with LLMs, with how teams work. A lot of the patterns are translating. Some aren't.
And I would say look for simple structures. It's easy to go crazy. A couple of years ago people were talking about things like CrewAI, and, I mean, even Gastown feels that way to me. It's easy to come up with some crazy structure where you have a whole org replicated with LLMs, but generally the really simple structures are what's working.
And I feel like Anthropic's pioneering a lot with demonstrating what you can do with really simple structures. And that's it. Thank you very much. Any questions?
At what point would you say an agent team versus just the... Yeah, so thinking about subagents versus an agent team.
I don't really know. Mostly, I think subagents make a lot more sense. If you have a lot of clear, distinct tasks, I think agent teams are worth exploring, but I wouldn't consider them the main workhorse right now, partly because the ideas are still half-baked.
Even the Claude beta, the team demonstration I mentioned: for my own personal use, I actually don't use it that much; I just use subagents. I broke my brain a little bit when I first saw it, trying to think how I would even get the best use out of it. But I lean much more towards using tabs with subagents, spinning up effectively teammates that way; I don't use this structure.
So a lot of people, for years now, have been making markdown files that function as a common task list that multiple teammates can work with. But yeah, I don't know if that's a good answer to that or not. Yes?
How do you manage and identify them? Yeah, so, I mean, if you have a lot of money, then it's always the smartest model, I guess. But generally, a lot of the model providers have high, medium, and low tiers, and the lowest tier is going to be the fastest and also the least accurate. So use it if it's a task where you want to maximize speed and you think the task is somewhat easy.
So I think in the Explore situation, if you're just trying to find the file names where something is referenced, a very cheap, fast model will find them for you. Their plan and general-purpose agents inherit the model from the main agent: if you're using Sonnet, it will be Sonnet; if Opus, it will be Opus. But generally speaking, a smarter model is better.
The reason you'd want the cheaper model is if you're optimizing for speed, or the kind of task is not writing a paper but some sort of information retrieval that you think the cheaper models will do well on.
You can, by the way, create a custom explore agent with Opus if you want. So if you're like, no, I don't want Haiku exploring my code base, I want the smartest model: you can do that.
Yes. Thanks so much.
I was wondering, are all the subagents currently kind of embedded in the platform? Yeah, you can do that. Are you talking about with coding?
Yeah, so if you're looking for open-source projects, Gastown is probably the one to explore, where you're running a lot of things locally. And when it comes to running subagents in development, there's a lot you can do with custom subagents.
So that was these files here, where you can define in markdown files the kinds of agents that can be spawned. So the main agent knows it can spawn an API developer, and it has this information. Then once it spawns one of those agents and gives it a task, there's a long prompt here that gives that subagent its instructions: implement the API endpoints, et cetera, or read those skills in. Claude Code does a lot of that automatically, and all you need is just a markdown file to create the skills. I mean, I think Claude Code's incredible for this. Yes?
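As a rough illustration of what one of those files looks like (the file name, tool list, and wording here are my own invention, so check the Claude Code docs for the exact frontmatter fields), a custom subagent is a markdown file with some frontmatter and then the prompt:

```markdown
---
name: api-developer
description: Implements API endpoints. Use for backend development tasks.
tools: Read, Edit, Bash
model: sonnet
---

You are an API developer. Implement the requested endpoints,
follow the existing conventions in the project, and report back
a short summary of the files you changed.
```

The frontmatter is what lets you pin a specific model and restrict the tools, the read-only-database kind of guarantee mentioned earlier, while the body is the long prompt the subagent is spawned with.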
Yeah, I mean, I feel like this will be the talk a year from now: this will be the pattern. Sorry, I'm...
So, yeah, the 2024 thinking was about LLMs in systems: oh, you could have one LLM generate code and then another LLM evaluate it. Now we're talking about an LLM in a loop running tool calls and generating, and another LLM in a loop evaluating.
It does feel like it's getting more complex, and I think once all the craziness has settled, there will be some really cool structures that translate some of what we know about how humans work together, and some of the ways in which LLMs are idiosyncratic: they're not just a designer or a developer, they're a generator and an evaluator.
They work better when you refine the cognitive space they're working in. But yeah, I don't know. In a year's time I'm expecting we'll be talking about agent swarms, cities, et cetera, but we'll see. Thank you. Thank you.