Glad to be here, glad to be speaking in front of you. So what is my lecture about?
I will go through what LLM agents are, in theory and in practice. It's an overview, so I won't go in depth on any single topic.
First of all, we'll talk about agents. What is their role?
We will talk about why LLMs need to be involved in all of this. We will give some examples.
We will go in depth in the components of an LLM agent. We will talk about the development of LLM agents.
And in the end, we will talk about some examples of autonomous agents.
So, let's start with what an agent is. I like to explain agents in terms of reinforcement learning: it's a setup where you have an environment and an agent, and they interact with each other.
The agent performs actions, which affect the environment, and the environment gives states and a reward back to the agent.
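Just to make that loop concrete, here is a minimal sketch in Python of the agent-environment interaction, using the Gymnasium library as an example environment; the random action is a stand-in for a real policy, and CartPole is just an example task.

```python
import gymnasium as gym  # assumes the gymnasium package is installed

env = gym.make("CartPole-v1")            # the environment
observation, info = env.reset(seed=42)   # initial state from the environment

total_reward = 0.0
for _ in range(100):
    action = env.action_space.sample()   # the agent chooses an action (random here)
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward               # the environment returns a new state and a reward
    if terminated or truncated:          # episode over: reset the environment
        observation, info = env.reset()

env.close()
print("collected reward:", total_reward)
```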
So let's give an example. Let's say we have a game, and the agent is our character. He goes through the game, which is the environment, fights enemies, levels up, and those are the actions of the agent.
And what the environment gives back is the state: for example, the level you are in and all the enemies around you. And you also have a reward; the reward, for example, is your score or the level you've reached.
So that's, I think, a basic example. I can give another one.
For example, autonomous cars. Autonomous cars are also agents, and their environment is the real world. They can interact with people, they can interact with streets.
And that is the power of reinforcement learning: the agent can interact with the real-world environment, which other types of machine learning, like supervised and unsupervised learning, don't do as well. So yeah, that's about agents.
And why use LLMs? First, for our purposes, what is an LLM?
I'm not going to go in depth in LLMs, transformers, or any of that. I assume everybody has a basic concept of it now.
Essentially, it's a machine that takes text as input and produces text as output. And in that way, it can interact with any environment that can be described in text, like humans can, for example. And that is really powerful, because you can give any explanation of any environment to the model.
And also, they are actually the best human-like intelligence that we currently know of and have built. So yeah, that is why LLMs should be inside agents.
Let's take a step back and go through the older types of agents, before LLMs and transformers were a thing. Do you know this type? They were called assistants back then. Who here has used them on a daily basis? Yeah, OK, not many of you.
So yeah, there was a limitation, because they didn't interact well with the environment: small machine learning models and rules that were trying to understand a complex environment. And that's why they didn't work.
But recently, all of them have been adopting LLMs. Not just recently, actually; some of these teams pioneered transformers. But now they are trying to get transformers into production.
So that was the old-school example, the Siri and Google Assistant way of thinking, and people are not going to use them much, even when they work. But there are new types of agents. I'm talking about products right now.
You have Microsoft Copilot, you have GitHub Copilot, OpenAI's Assistants framework, and ChatGPT itself right now, I guess it's an agent too. So how many of you use these on a daily basis? Yeah, many more.
So yeah, that is why transformers and LLMs were included in the equation. And that is why this is the future.
So after we've established why we need transformers and LLMs in agents, let's talk about them in depth. An LLM agent has three core modules: memory, planning, and tools. That is on top of the LLM itself, which is the agent core. This is a simplified picture, of course, but those are the essentials. So why do we need them?
So memory, as we know, is a way to improve intelligence. If we don't have memory, we won't be able to remember that some interaction we tried the other day failed, and we won't be able to learn from that.
And as with people, in agents we also have short-term and long-term memory. What is the difference? In psychology there are different definitions.
But here, short-term memory is whatever information can be included in one prompt, exactly one prompt. As we know, the prompt has limits: it has a token limit.
And recently there have been models with over 100,000 tokens in their context limit, and that's really cool. But there was a research paper showing that even though that memory is a lot bigger than before, it's not used as efficiently: very long contexts lead to a lack of attention in the model. So that's why we need long-term memory. Long-term memory is a way to address an effectively unlimited amount of information, and there are a lot of strategies for it, most of which you can find in information retrieval textbooks from, I don't know, 10 or 20 years ago.
For example, you have standard keyword-based information retrieval, you have vector retrieval, you have graph databases. Anything that can store information can be long-term memory for the model, as long as the model can interact with it. So yeah, that's long-term memory.
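As a rough illustration of the vector-retrieval flavor of long-term memory, here is a minimal sketch that stores memories as embeddings and recalls the most similar ones to paste into the next prompt. The `embed` helper is a hypothetical stand-in for whatever embedding model you use (an OpenAI embedding endpoint, a local sentence-transformer, and so on).

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical helper: call your embedding model here and return a vector."""
    raise NotImplementedError("plug in an embedding model")

class LongTermMemory:
    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def store(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def recall(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        # cosine similarity between the query and every stored memory
        sims = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
                for v in self.vectors]
        best = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
        return [self.texts[i] for i in best]

# Usage idea: whatever recall() returns gets pasted into the prompt,
# i.e. long-term memory feeds the short-term memory.
```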
Another thing is tools and actions. As you remember from the first slide, actions were the way to interact with the external environment. And that's really crucial for these models, because if we don't have actions, there is no way for them to be practical.
So one way we can do this is with function calling. Function calling is a way to structure the model's output into an object that you can pass to an API, or to anything external. So it's a way to interact with any kind of API, or any kind of object, in the external environment.
For example, maybe you need to call, I don't know, some kind of information extraction service, and it needs a particular kind of query, say a boolean query. Function calling can produce boolean queries; you just have to describe them. So why is this such a good thing? Before function calling and similar tools existed, you had to force the LLM to structure its output yourself, and that isn't as simple as it looks, because these are statistical models and you have no certainty that the model will generate the right structure. Function calling is a reliable way to do this; a minimal sketch of what it looks like is below.
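This sketch uses the OpenAI Python client (the v1-style chat completions API); the `search_catalog` function, its boolean-query parameter, and the model name are made up for the example.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Describe the external function as a JSON schema; the model fills in the arguments.
tools = [{
    "type": "function",
    "function": {
        "name": "search_catalog",  # hypothetical external API
        "description": "Search the product catalog with a boolean query.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Boolean query, e.g. 'milk AND NOT lactose'",
                },
            },
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Find lactose-free milk in the catalog."}],
    tools=tools,
)

# Instead of free text, the model returns a structured call that we execute ourselves.
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)
```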
There is also a paper called Toolformer, in which the authors did a very interesting thing. They managed to force the LLM to call an external function while it was generating, in the middle of its "thinking", extract some information, and return to its own output to continue the generation. And that's a really cool thing, because you get uninterrupted generation from the model.
So that's one of the things that tools can be, and function calling gives you an option to do everything.
One other example of a tool is retrieval. It's kind of similar to long-term memory, because it's searching for information in a large store of it. But you can do this inside files, inside any kind of textual information. And that is also a tool that the model can call itself.
And an interesting tool that OpenAI added recently, and that everybody else then copied, is the interpreter. So what is an interpreter? You give the model a way to write code and execute it in real time. And why is that powerful? Because these are statistical models, so they can't guarantee logical reasoning on their own; but if you make them write code, they can do the logical reasoning through the interpreter. So that's a really powerful thing.
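As a toy illustration of the idea (this is not how OpenAI's sandbox works internally), here is a sketch where the model is asked to answer with Python code, and we execute that code locally. `ask_llm` is a hypothetical helper around whatever chat-completion call you use, and running model-generated code without a proper sandbox is exactly the security problem I'll mention in a moment.

```python
def ask_llm(prompt: str) -> str:
    """Hypothetical helper that returns the model's text reply."""
    raise NotImplementedError("wrap your chat-completion call here")

def solve_with_interpreter(question: str) -> dict:
    code = ask_llm(
        f"Write Python code that computes the answer to: {question}\n"
        "Reply with only the code, no explanations and no markdown formatting, "
        "and put the final answer in a variable named `result`."
    )
    namespace: dict = {}
    # WARNING: executing model-generated code is unsafe outside a real sandbox.
    exec(code, namespace)
    return {"code": code, "result": namespace.get("result")}

# e.g. solve_with_interpreter("the 20th Fibonacci number")
```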
And if you think about it, the interpreter can do everything the other tools do: you can write code to call an API, you can write code to retrieve some kind of file. It's a really interesting tool. But there is a big caveat to it.
For every service that implemented it, there is a real security concern, because with the interpreter, as you know, you can tell it to generate any kind of code. You can tell it to generate malicious code, for example. There is currently a way to list all the files on the machine inside OpenAI's interpreter. Even though they did their best, locked down the machine, and did everything they could to stop it, you can still do it, and that is obviously not the intended way for it to work. But it's an interesting tool.
So you can have any other tools, for example, that can interact with the world, but I'm not going to go in depth in any of it.
The most interesting thing for me is actually the planning part of the agent. The planning part is what differentiates a plain LLM from the kind of agent that can do complex tasks.
One of the things you can do with a planning module is break a task apart: you can decompose it into sub-goals.
For example, you want to go and buy milk from the supermarket. For us, it's normal; we know what to do. But an LLM agent doesn't know.
And what you can do is use the model itself to generate sub-goals, like get up, go out, go to the supermarket, pick up the milk, pay, and return home. And these are the kind of sub-goals you wouldn't get with just one call to the LLM.
Why? Because the LLM can't do all of that reasoning in a single step.
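As a sketch of what that decomposition can look like in practice, you can simply ask the model for a numbered list of sub-goals and parse it. `ask_llm` is a hypothetical wrapper around your chat-completion call, and the example task is the milk one from above.

```python
def ask_llm(prompt: str) -> str:
    """Hypothetical helper that returns the model's text reply."""
    raise NotImplementedError("wrap your chat-completion call here")

def decompose(task: str) -> list[str]:
    reply = ask_llm(
        "Break the following task into small, concrete sub-goals.\n"
        f"Task: {task}\n"
        "Answer with one sub-goal per line, numbered."
    )
    subgoals = []
    for line in reply.splitlines():
        line = line.strip()
        if line and line[0].isdigit():           # keep only the numbered lines
            subgoals.append(line.lstrip("0123456789.) ").strip())
    return subgoals

# decompose("buy milk from the supermarket")
# might return: ["get up", "go out", "walk to the supermarket",
#                "pick up the milk", "pay", "return home"]
```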
So there are papers on this type of decomposition. One of them is Chain of Thought, and it's actually really simple: you force the model to think step by step.
You say: do this, but think step by step first. And the model has to explain what it's going to do, explain it in detail, and then do it.
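In practice the Chain of Thought trick can be as small as one extra sentence in the prompt; a sketch, with a made-up example question:

```python
question = "A train leaves at 9:40 and the trip takes 2 hours 35 minutes. When does it arrive?"

# Plain prompt: the model tends to jump straight to an answer.
plain_prompt = question

# Chain-of-thought prompt: force the model to write out its reasoning first.
cot_prompt = (
    question
    + "\nThink step by step and explain your reasoning before giving the final answer."
)
```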
And there is evidence in the paper that this is much better than just a normal LLM call, and now every type of agent and service does this. There is also another approach, from a paper called Tree of Thoughts, which takes the concept of the Chain of Thought paper and extends it into a tree of subtasks. The model can test each branch of the tree: at each step it can propose one candidate and, in parallel, another candidate, and test whether one performs better than the other.
And that's also a really powerful way to solve problems. But obviously, it's computationally much more expensive.
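Very roughly, and much simplified compared to the actual Tree of Thoughts paper, the idea looks something like this sketch: at every step the model proposes several candidate next steps, each partial plan is scored (here also by the model), and only the best branches are expanded further. `ask_llm` is again a hypothetical chat-completion wrapper.

```python
def ask_llm(prompt: str) -> str:
    """Hypothetical helper that returns the model's text reply."""
    raise NotImplementedError("wrap your chat-completion call here")

def propose(task: str, path: list[str], n: int = 3) -> list[str]:
    reply = ask_llm(
        f"Task: {task}\nSteps so far: {path}\n"
        f"Propose {n} different possible next steps, one per line."
    )
    return [line.strip() for line in reply.splitlines() if line.strip()][:n]

def score(task: str, path: list[str]) -> float:
    reply = ask_llm(
        f"Task: {task}\nPartial plan: {path}\n"
        "Rate how promising this plan is from 0 to 10. Answer with just a number."
    )
    try:
        return float(reply.strip())
    except ValueError:
        return 0.0

def tree_of_thought(task: str, depth: int = 3, beam: int = 2) -> list[str]:
    frontier: list[list[str]] = [[]]                      # each entry is a partial plan
    for _ in range(depth):
        candidates = [p + [step] for p in frontier for step in propose(task, p)]
        candidates.sort(key=lambda p: score(task, p), reverse=True)
        frontier = candidates[:beam]                      # keep only the best branches
    return frontier[0] if frontier else []
```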
There are other parts of the planning. You have reflection and refinement. These are also based on papers.
Actually, reflection is more of a self-reflection. It turns out that when you call an LLM and it returns some output and you call the same LLM to evaluate the previous output, it can perform better. It can find mistakes.
And why is that? Intuitively, because you give the LLM more time to think. You give it more time to spend on the problem the second time around, and it can find problems by itself.
So that is a really powerful technique, and you can chain it and keep going: you can generate, fix, generate, fix, generate, fix, and it can keep improving. But obviously there is a limit to what the LLM knows, so it's an improvement, but a marginal one.
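Here is that generate-evaluate-fix loop as a minimal sketch, not taken from any particular library; `ask_llm` is a hypothetical chat-completion wrapper, and a real system would decide when the critique is good enough instead of running a fixed number of rounds.

```python
def ask_llm(prompt: str) -> str:
    """Hypothetical helper that returns the model's text reply."""
    raise NotImplementedError("wrap your chat-completion call here")

def generate_with_reflection(task: str, rounds: int = 3) -> str:
    draft = ask_llm(f"Do the following task:\n{task}")
    for _ in range(rounds):
        critique = ask_llm(
            f"Task: {task}\nProposed answer:\n{draft}\n"
            "List any mistakes or weaknesses in this answer."
        )
        draft = ask_llm(
            f"Task: {task}\nPrevious answer:\n{draft}\n"
            f"Critique:\n{critique}\n"
            "Rewrite the answer, fixing the problems from the critique."
        )
    return draft
```

And yeah, that's all about planning.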
So let's go in depth with how to develop an agent.
So right now, it's not practical to go all the way down and implement everything yourself. Obviously, if you want to release something fast, you have to use some kind of library for development.
The obvious choice here is OpenAI, because they give you very, very good basics. They give you the LLMs, obviously, and they have the Assistants API now. The Assistants API gives you basically everything you need to build an agent.
You can call APIs, you can retrieve information. The only thing it doesn't give you is the autonomy of the agent, which is essentially repeating a loop of many steps.
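A minimal sketch of what that looks like with the beta Assistants endpoints of the OpenAI Python client, as they worked at the time; the assistant name, instructions, and question are made up, and note that you still have to poll the run and drive any outer loop yourself, which is the missing autonomy part.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# An assistant with two built-in tools: file retrieval and the code interpreter.
assistant = client.beta.assistants.create(
    name="demo-agent",
    instructions="Answer questions about the uploaded documents.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}, {"type": "code_interpreter"}],
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Summarize the main findings of the document.",
)

# Start a run; in a real app you would poll until run.status == "completed".
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
print(run.id, run.status)
```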
So yeah, but if you choose OpenAI, you're dependent on OpenAI only, and that's really not good. Because they can... well, you've seen things like Sam being fired. They can do something like that, they can change their API, they can do whatever they want.
And that's not their main product. Yeah, you want to diversify a little bit. And that's where the other libraries come in.
You have LangChain. LangChain is the oldest and the most important library there. But one problem with it is that it carries a lot of old concepts, because the LLM world moves super fast. It also has a lot of messy code and a bad architecture, and it's not really good for production. Our company tried to use it in the beginning, but that was really not a pleasant experience.
And we actually ended up doing it ourselves. We implemented almost everything from LangChain ourselves, because that actually makes us independent.
There are other, similar frameworks, like LlamaIndex. I actually like this one, because it focuses on one thing, search, and it does it well. What it does is give you an abstraction over vector databases; not only vector databases, but any kind of search and any kind of indexing. It supports every major vector database, and it can also search graph databases, which is really powerful when you have very structured, heavily interlinked information. You can store that information in a graph database, run queries there, and extract information for the model. That's actually what Microsoft does in their Copilot, because they have a really huge graph database, and Google does as well.
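For the search side, a minimal LlamaIndex sketch (as the API looked around that time) is something like this; it indexes a local `data/` folder into an in-memory vector index and answers questions over it. The folder path and the question are made up.

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Load every document from a local folder (the path is just for the example).
documents = SimpleDirectoryReader("data").load_data()

# Build an in-memory vector index; in production you would point this at a
# real vector database, or at a graph store for heavily linked data.
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
response = query_engine.query("What does the report say about Q3 revenue?")
print(response)
```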
Another recent addition is Haystack. I don't know much about Haystack, but recently I've seen a lot of hype around it, because what they do is focus on the LLM pipeline in production. They think about how this is going to scale, how it's going to affect users, why it might not be fast enough. The other frameworks don't think like this, and that's bad when you want to base a product on LLMs. Yeah, if somebody has experience with Haystack, I'd like to meet you afterwards.
So the last one is Semantic Kernel, which is, obviously, Microsoft doing what Microsoft does: they see something, they want their own copy of it, and they build it. It's essentially a copy of LangChain with changed terminology; I think what others call an agent, they call a kernel, and that's why it's Semantic Kernel. But it does the same things as LangChain, for example.
Yeah, the environment around these frameworks is always changing. There's always something new, and they keep breaking old versions. You don't want to depend too heavily on them, because you don't get stability, and when you develop a product you want some stability as a basis.
So yeah, that was about development. Now, there are some interesting proof-of-concept examples of autonomous agents. And what are autonomous agents, actually? These are agents that can run a long process of repeating the same steps: extracting from memory, reasoning, calling some tools to extract more information, and breaking tasks down into smaller tasks. And they can repeat this, hypothetically, infinitely, until you run out of money, because they are very expensive to run.
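The loop these projects run looks roughly like this sketch; everything in it (`ask_llm`, `run_tool`, the decision format, the step budget) is my simplification, not the actual AutoGPT or BabyAGI code.

```python
def ask_llm(prompt: str) -> str:
    """Hypothetical helper that returns the model's text reply."""
    raise NotImplementedError("wrap your chat-completion call here")

def run_tool(name: str, argument: str) -> str:
    """Hypothetical tool dispatcher: web search, file read/write, interpreter, ..."""
    raise NotImplementedError("dispatch to your real tools here")

def autonomous_agent(goal: str, max_steps: int = 20) -> list[str]:
    memory: list[str] = []                       # grows with every iteration
    for step in range(max_steps):                # in practice, the money/budget limit
        decision = ask_llm(
            f"Goal: {goal}\nWhat we know so far:\n" + "\n".join(memory[-10:]) +
            "\nDecide the next action as 'TOOL: <name> | <argument>' "
            "or 'DONE: <final answer>'."
        )
        if decision.startswith("DONE:"):
            memory.append(decision)
            break
        if "TOOL:" not in decision:
            memory.append(f"step {step}: could not parse decision: {decision}")
            continue
        rest = decision.split("TOOL:", 1)[1]
        name, _, argument = (part.strip() for part in rest.partition("|"))
        observation = run_tool(name, argument)   # act on the environment
        memory.append(f"step {step}: ran {name} -> {observation}")
    return memory
```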
And there are some proofs of concept. Why do I say proof of concept? Because there isn't anything production-ready here either; as you can imagine, that's been the theme of the whole presentation.
For example, AutoGPT is just a script and you can run it. I actually have an example of it. You can run it, and it can edit your code multiple times, not just once.
For example, if you run AutoGPT, you can give it a kind of overarching task, like some architectural change, and it's going to figure it out through the internet and upgrade your code.
So what is the downside here? Why are they just proofs of concept? Because it turns out that if you run an agent over and over again, it diverges a lot from your task, and you can't imagine how far it can diverge. It can go into Wikipedia searching for some kind of algorithm, and then come back and try to apply it in your code, but that's not going to be the optimal way to do it.
So... all of these libraries are just proofs of concept. AutoGPT is a script. BabyAGI gives you some kind of framework. And SuperAGI is kind of a recent addition; they have a service, actually, which I didn't test, but it looks kind of similar to the others.
And I have a small demo which actually didn't work. So I'm giving you a video of the same thing I wanted to do.
That was my inspiration, I don't know if you've seen it. Yeah, you can't actually see that much here, but let me walk you through what's on the screen.
You give it a task about your code: to read the content of this program and edit it in some way, add something, write tests, execute the code. And you can see what it does. It actually plans. It first reads the script, evaluates the code with an interpreter, tests whether it runs, improves the code, writes tests. And this can repeat over and over again: evaluate the code, does it run, improve the code, and then again write tests, run the tests, improve the code.
And this is the type of autonomous agent that does this. So yeah, you're not going to hear the audio here, but what it does is edit the code; you can see it right here, and I can send the example afterwards. It's really interesting, because you just give it the task and it searches the internet, writes tests, does everything for you. And obviously, writing tests is a big thing.
Who here is using Copilot for writing tests? So yeah, that was the small, not-that-beautiful demo, and that's it from me. If you want to take one thing away from here, it's this: all the frameworks are still kind of immature for building a product. They're very good for experimenting; it's really interesting to experiment with them.
But nothing is at product level. You can see it from big companies that are trying to do this. You can see it from Microsoft, from GitHub.
As you know, Copilot, I've mentioned it a couple of times because I'm a huge fan of Copilot. The first three months, it didn't work at all. It had to learn. It had to create memory of me, know my profile.
And after that, now it works like a charm. Maybe GPT-4 was part of the improvement there. But now that it knows me, it's very good. At first, though, it was unusable.
And most of these products are the same way. And the frameworks that implement them are similarly immature. And if you want to experiment, that's really cool. But yeah, trying to do a product with this is hard, no matter how it looks. So yeah, that's it from me.