MCP revealed: How does MCP transform an LLM into an agent?

Introduction

Hi, as you've heard, my name is Ben, Ben Morse. Thanks for that last talk. I think Tony's in the back over there getting some hummus.

I've seen things like online coding tests where they try to figure out whether you're looking at the screen or off at your notes or something else. There must be a way to do this that people claim they have.

Like, you know, when you take code tests, you're actually looking at a different screen somewhere else, or down at some notes. Like, what is JavaScript again?

So maybe it's already a thing that people have in the wild. Maybe people just claim this. I don't know. That was really interesting.

Setting the stage: what this talk is (and isn’t)

So I'm talking about a whole different topic over here. I don't train models for a living. This is going to be about MCP, the miracle of how LLMs become agents.

I've been here before, but I'm not quite sure what everyone's backgrounds are like. It seems like people are pretty sophisticated over here. Who here trains models? Like, a lot of you... no, not a lot of you do, so I'm not as scared as I was before about

being an inferior, non-model-training kind of person. How many of you have heard of MCP before? Many of you, maybe half.

How many people have used MCP? A lot of you have. How many people have made their own MCP servers before?

Okay, so maybe some of you people know this stuff already. This is not the most advanced content in the world. If you know this stuff already, please help me out.

Please make comments as we go because I'm going to be going into this in some detail and it might be too easy, might be too hard. I don't know. We'll find out.

About the speaker

But first of all, this is me. My name is Ben Morse. I'm the developer evangelist at a place called DeepL.

I've been sick on and off for the last month or so after I was traveling. I caught a small virus on the plane, and it got a lot worse when I got off. So if you hear me coughing horribly during this talk: I'm probably not going to die. I'll cough horribly for a few seconds and I'll survive, and the viruses will fly off that way, away from all of you.

What DeepL does

This is me, I'm Ben. This is not me: this is the Kölner Dom, a big cathedral in Cologne, Germany, which is the city where DeepL is based.

So DeepL has about a thousand employees. It's a European AI company traditionally involved in language translation.

We make a very popular app, popular in places like Germany and France, continental Europe, and Japan, also in China, that does translation, plus a web app and a website that do the same kind of stuff.

We're now branching into other kinds of things too, because we have a research team, so we're also making our first all-purpose agent. But DeepL is more known in Europe and Japan than in America, so folks like me,

who live in New Jersey, not in this very glamorous place, have been hired to just kind of be on the ground, work on the API for DeepL, and say: hi, we actually exist, somebody try our stuff. New Jersey can be glamorous: right now it's full of beautiful snow, and snow is glamorous, you know, and the Garden State Parkway is glamorous because it's a big highway.

DeepL translation demo: tone and rephrasing controls

So, oh yeah, this is a little demo of the DeepL translation app. I'm going to just do this over here.

So for example, from English, say you want to talk to friends in French: hey, I'm at a meetup. There you go, there's a nice translation over there.

And you can change these things; you can make them more formal if you want to. If you speak French: bonjour, je suis actuellement à une réunion, which is more formal than,

like, salut, je suis à une rencontre. You can change and customize: you can prompt the model, you can give the model sentences to change the way it does things. You can also modify text. So for example, if we take something in English and say, hey, wow, it's a meetup, there it is, we also could make this more business-like, for example,

Greetings, I'm impressed it doesn't meet up. You also can go through and change these things. You can rephrase certain words and get more ideas over here.

We can make this academic, which is going to be even more words, probably. No, there are fewer words, actually. Casual, business, let's try academic one more time.

No, it's not working on academic today. So academic has gotten more casual than it used to be. Anyway, that's the idea of what we do.

DeepL’s new general-purpose agent

And then there's this agent over here, which we have just started to let out there. It's like a general task -accomplishing agent. It has workflows and tasks, other things.

You give it things to do. It finds a way to do these things. It's got access to some browser stuff, some virtual Chromium instances.

It's got access to tools, that kind of stuff. So that's DeepL in a nutshell.

And there's that again. And then let's go on.

Why MCP (and why it’s catching on)

We also have (this is called a transition in the business) an MCP server that another guy and I made, because we were doing a hackathon in San Francisco last May, and I thought: we're going to go to a hackathon.

These people are all AI hipsters, AI agents; how can we fit in with these cool people? We'd heard of MCP, so we made this MCP server. I got there and it was okay, you know, it was fine. We all hung out and we were all fine.

I got really fascinated by the topic of MCP, because I think it's kind of cool: it's a way where LLMs, which are kind of stochastic and kind of unpredictable, can become more predictable.

You can say, here are some software tools, and do these kinds of things. And it's not totally predictable, but it's more than just like a free -for -all. So I found the idea very interesting.

It's kind of like a combination of old -fashioned programming ideas and new -fashioned programming ideas. And plus, MCP's eating the world, and it's here to stay. So we're gonna talk about what it is today.

If you wanna look at this deck at any point, I have some nice little short links. I have DeepL MCP Deck Philly for, yes, the deck. mcpcode over here is the code which I'll be showing you a little bit of.

I made a simpler version of DeepL's own MCP server. I've also made a jokes server, which is even simpler than those things.

Both of these are in Python and JavaScript. I made them as short as possible, as simple as possible. If you

want to try these things on your own, it's a good way to start: use something very simple and straightforward. And I also wrote a post about how MCP works, at deepl mcp post.

I'm going to begin coughing now, so I'm going to have some water to stop coughing so much. There goes that... okay, that's a little better. One cough just for good measure. You've been warned.

Roadmap for the session

Anyway, we have four sections to this over here. We don't wanna take too long.

First of all, what MCP actually is, how it actually works, and why it exists.

Then: using it as a client, or in your own AI client; writing your own server; and the latest developments, because MCP is changing all the time.

So let us go over here.

What MCP is: definition, standards, and inspiration

First of all, what is MCP? Who knows the answer to this question? Some of you have used it before.

What is MCP? I bet Tony knows. Model Context Protocol. This is

true. What is it more specifically? It is a new standard from Anthropic. It's a way

to give your favorite AI clients access to tools, resources, prompts, and things like that.

It also is inspired by a thing... well, first of all, it uses a thing called JSON-RPC. Do you guys know what that is? Anybody? JSON-RPC, yes?

What is JSON-RPC? Remote procedure call with JSON, yes. Essentially you have calls that go back and forth that are embedded in JSON. It's a nice standard. They took it because it already exists and it's pretty useful.
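To make that concrete, here's roughly what an MCP tool call looks like on the wire. The `tools/call` method and the message shape follow the MCP spec; the jokes tool name is just my example:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_joke_by_id",
    "arguments": { "id": 7 }
  }
}
```

And the server answers with a result carrying the same id:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [
      { "type": "text", "text": "Why don't skeletons fight each other? ..." }
    ]
  }
}
```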

It's inspired in part by a thing called Language Server Protocol. Who has seen this before? This is a little more obscure. Maybe this is something that will be new for us. LSP.

So, when Anthropic did this, they were not the first people in the market, obviously. ChatGPT was already pretty well established; it's still the most popular consumer LLM out there. And they already had things called connectors that were kinds of tools. Anthropic, being the second or third or fourth mover in the market, decided to make an open source thing. And the genius part about this was that anybody could use it.

So Language Server Protocol is a thing that Microsoft invented when working on VS Code. If you're using your IDE and you're writing in JavaScript or TypeScript or Python or PHP or Prolog, whatever it is, there's the syntax highlighting, there's support for your language, and various things it can do to make the language easier to use. The problem was that every IDE was trying to invent this for every single language. If you have M IDEs and N languages, you have M times N implementations. Very inefficient, a lot of work.

For VS Code, what they did was make something called Language Server Protocol, which essentially said: if a language implements this, and an IDE implements this, any IDE can use any language. Instead of an M times N problem, you've got basically an M plus N problem. So it's more efficient.

And similarly, MCP is a thing which sits between AI clients and these servers. So it's more efficient, everyone can use it, it's not hard to actually implement, and it made things more universal.

I think that's why it caught on so fast, and why Microsoft began supporting MCP for their apps: everyone began using it. Google has MCP for the web now, WebMCP, which is a fascinating idea we could talk about if you want to. It was pretty easy to get started with, and it's universal; anybody could use it. And even now, OpenAI has been using it too. So I think it's successful in those senses.

So again, the whole point over here is: how can your LLM actually use tools? How does it have access to tools and resources, given that it's an LLM? It's just a thing which takes tokens in and spits tokens out. So how does that even work in the first place? We will see. That was a mystery for me, at least.

Right, we said this before: what is MCP? It's Model Context Protocol. What does the model mean? When we say model, what is a model, probably? A statistical model. You're very close. It's even more obvious than that in this room: it's an LLM, a large language model of some sort.

Context. What would context mean, do you think? Given our context here in this room, what would context be? Like the model's context window? Sorry. Actually, you're exactly right. Yeah.

MCP uses prompts to give the model the context it needs to be able to use tools, resources, and so on. And a protocol is just a protocol. So it's a protocol that models can use to get context that lets them use tools and resources and other fun things out there in the world.

So what is a client? Anything which embeds an LLM. It needn't embed an LLM, but it usually does.

You know, it could be VS Code, it could be Cursor, it could be Claude, it could be Gemini CLI, any of these things over here. These things can all use MCP.

You also can use it within LangChain or other programming frameworks like this. Today we're going to see it in Claude Desktop, because Claude is just, I think, kind of fun to use, and Anthropic invented this. It's fun to do it in Claude.

Why not just call APIs directly?

But why do this? Why try to give your LLM access to tools and services via MCP? Because it could just use APIs, couldn't it?

For example, at this point over here, I asked some LLM to write code for the DeepL API, and it could do it. It wrote all this code.

It knows the API already from reading our documentation online and seeing code samples. So why not just give LLMs access to the APIs out there?

You could do that. But why is that maybe not desirable? Well, I mean, there are thousands of APIs, and the whole rationale of MCP is that it's supposed to be interoperable, but the challenge is that means the identity management is not

secure. It can be broken, too. Yeah. You said cost, it's expensive? Yeah. All those things I think are true.

And also, APIs can be huge. Have you seen the Google Sheets API, for example? They can be giant. I think you'd have like this mass chaos

because you could use any endpoint you wanted to, however you wanted to. They could do whatever they wanted to. LLMs could go crazy, and they are doing this already in some ways.

MCP is nice because you can say: okay, look, here's a server that has these five tools. Maybe for a spreadsheet,

you can add a cell over here, you can change cells to red or blue, you can add conditional formatting, but you can't delete columns.

You know: here's your restricted set of things you can do. That way the LLM has some clues about what you want it to do.

Someone said computation, the number of possible APIs? Yeah, that's a problem, too, actually. And the more tools you have, the more context you need, as we'll see in a minute. And yeah, the number of APIs out there is huge.

It would be a lot of stuff for LLMs to even be able to reason about, like, what do I use now? Actually, it's true even now if you use MCP, if you give it more than, like, five or six servers

or more than some number of tools, it doesn't know what to do anymore because it has so many options. So that's why you have subagents

and other kinds of schemes for doing more complicated kinds of tasks.

MCP in action: a live walkthrough in Claude Desktop

Do you want to see this live for a minute? Maybe it makes more sense just to see how it works. That's why I have this chair over here. Because I can show you

if I can get out of this for a minute. I just, that was a mistake. Let's stay in that because, well, all right, let's go over to Claude.

Here's our friend Claude over here. So they've changed the interface actually in the last couple of days. So I've got to find it again.

Here's my friend Claude. They have now called these things connectors. Let's let it load for a minute. Oh, there we go, good, okay.

I have over here various MCPs I've installed. Many are variations on DeepL's MCP. This one actually is DeepL's MCP for translation, where you can get languages, translate text,

other kinds of things. I have a version on npm. I've got a simpler version. I've also got the filesystem MCP, which can access my file system.

And there you notice it's kind of nice that we can actually easily turn various tools on or off here. We can turn on or off things like writing files,

for example. I've got a little hand icon over there in the new interface; I don't even know what that means.

Maybe it means ask me first; it's not forbidden. So you can have control of the individual tools in most AI clients like this.

You can choose what you want it to do. Is that part of the protocol? It's just what Claude does; it's not part of the protocol. But yeah, the tools are all individually told to the LLM through a prompt,

and you can then iterate this kind of thing, where you turn things on or off. I've got this PDF thing that doesn't actually work. Sometimes you get MCP servers from online,

just from someone's GitHub, and they don't work at all. So that's a problem sometimes, so be careful out there. Somebody's saying security's a problem, it's also true. Be careful when you're doing these things.

Let's actually just look at this. So I think right now I've got DeepL going, and I've got a joke server going over here. So if I come back here, out of customize,

I could say: I have this joke server installed. All this does is call a jokes API. We'll actually build this in a minute. It calls a jokes API, which is very simple,

and it tells you a joke. You can ask for a joke by number or by type. I think there's programmer jokes, for example. So we could say.

How do you tell it, the old joke? Well, you'll see how this works now. It's gonna be hilarious. Tell me a programmer joke.

So of course, Claude knows these jokes already by itself, but it may also, if we're lucky, call the server. Can you see that? Is that too small?

I could expand this because it's actually running Electron, it's just a browser, so we can actually just use pluses or minuses to expand it. So if I say, tell me a programmer joke,

it is now trained pretty well to actually probably call the tool. And it should actually call this jokes tool, I think it's like jokes by type, type of programmer.

Let's see what it does; if we're lucky it will work. There it is, okay: get joke by type. Why don't programmers like nature? Too many bugs. They're not actually funny jokes, but they are jokes.

And you can see what it did over here, that getJokeByType. It doesn't show you the parameters, but it shows you that it actually worked. So I also could ask for jokes by ID.

I could say, for example: can you tell me joke number... let's give it a terrible number. There are only so many jokes in the actual database, so that should fail.

We'll see why this works in a minute. It's out of range, so it knows that, because I told it that in the server. Let's tell it a better one over here. Yeah, random was a good idea: tell me a random joke. And then I've got a tool for random jokes, which it's now calling. Why don't skeletons fight each other? So true, so true. Now let's make this joke even worse by telling it to translate it into

French. So if we do this, if we're lucky, it'll actually tell a random joke and then call my translation tool, DeepL translation, to translate it into French. So now it'll have to chain two

tools together to do this. And this will probably work. Can you tell me a random joke? It'd be a

different joke this time, probably, although not necessarily, in French. And it'll probably know that this means call two tools... two tools, it's a tongue twister now. There's get joke and there's translate text, so it's actually doing this: it's calling two

tools. The joke actually might make sense in French; usually translation ruins all the puns, and then the model says, oh yeah, it still works in French or Spanish, because it doesn't really know that it doesn't work. Anyway.

It's trying to be generous. It's hallucinating or something. The puns don't work in those languages. There you go.

You also could do things like: tell me a random joke in French and put it into a spreadsheet. That would involve calling three tools. It usually does that just fine.

So that's basically MCP in a small nutshell. Now I'll show you why that works, or how that works. We're going to return to the slideshow, presenter view.

Yeah, yes. Yes, it's returned to the server. There's actually an API we're gonna call. Yeah.

Yeah, and then it calls a second API. So what it's doing over here, exactly: it calls the jokes REST API, and it gets the joke, returns that to, you know, to Claude,

and then it sends it back out to the translation tool. Okay. And it gets it back again. That was the impact.

Yeah. So is that two MCP servers, or is that just two APIs? That's actually two different servers it's calling there. The servers themselves are using APIs; they don't have to do that, though, but in

this case they do use APIs. So where is it running? We'll look at this in just a minute, but what it's actually doing here is that the MCP protocol allows the LLM to put out special tokens, or special kinds of, often, XML-like things, that say: call a tool.

And then Claude will intercept that, and instead of putting it out to you, it'll send it off to the right tool and then get the output back. We'll look at this in a minute in more detail.
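To sketch that interception step in code: this is a toy, not Claude's actual implementation. I'm assuming the model wraps tool calls in a `<tool_call>` tag containing JSON; real clients use whatever format the tool prompt defines, and real tools live in MCP servers rather than a local dict:

```python
import json
import re

# Hypothetical tool registry; in a real client these come from MCP servers.
TOOLS = {
    "get_joke_by_id": lambda args: f"Joke #{args['id']}: ...",
}

def run_turn(llm_output: str) -> str:
    """If the model emitted a tool-call block, intercept it, run the tool,
    and return the result (which would be fed back into the model's context).
    Plain text just passes through to the user."""
    match = re.search(r"<tool_call>(.*?)</tool_call>", llm_output, re.S)
    if not match:
        return llm_output
    call = json.loads(match.group(1))
    return TOOLS[call["name"]](call["arguments"])
```

The point is just that the "magic" is a loop: scan the model's output for a special format, dispatch, and feed the result back in.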

So yeah, that's actually the crux of it. Like, what is going on here? Because it seems like magic if you're me, but how does it actually work?

How tool calling works under the hood

So let's look at how it actually works. This is the part that baffled me as well, in fact.

Oh yeah, I said this before, you can turn tools on and off.

Anyway, how does it use tools? This is the part you were asking about just a minute ago.

Toolformer and “function-call-like” tokens

So it turns out there was this paper I found called Toolformer from 2023, I think, like the Stone Age for LLMs.

What they did here is they took a model and fine-tuned it with different kinds of data. When there were things that might not be obvious to the average LLM, maybe not in the training data,

instead of putting the actual answer in the text, they'd put in a little quasi function call. So for example, instead of

'the Massachusetts Medical Society': a little bracket, QA, then parentheses, then a parameter like the question, then an arrow, and then the answer.

So they trained it with data like that. And also, like over here, because of course you can't do arbitrary arithmetic in an LLM, they had a calculator thing.

So: 400, or, then a bracket, Calculator, then a quasi-argument over here, then an arrow and the answer, and then the actual rest of the output, passed the test.
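The post-processing side of that is easy to sketch. This is my own toy reconstruction of the idea, not the paper's code: the model emits an inline bracketed Calculator call, and a wrapper evaluates it and splices the result back into the text:

```python
import re

def expand_calculator_calls(text: str) -> str:
    """Replace Toolformer-style [Calculator(expr)] markers with the result
    of evaluating expr. The bracket syntax here is a simplified sketch."""
    def run(match: re.Match) -> str:
        expr = match.group(1)
        # eval() is acceptable for this toy arithmetic demo; a real system
        # would use a proper expression parser instead.
        return str(eval(expr, {"__builtins__": {}}))
    return re.sub(r"\[Calculator\(([^)]*)\)[^\]]*\]", run, text)

print(expand_calculator_calls("I have [Calculator(8 * 9)] eggs."))
```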

Does that make sense? It's kind of like they trained it with actual quasi function calls along with the actual data itself. And the thing

actually learned to output this stuff. So it learned to output stuff like this QA call instead of the plain answer, and then you could take that and replace the quasi function calls with the

actual answers. And it did this pretty well. So that's actually how this works behind the scenes, which I was very surprised to see: it's this old-fashioned way of taking these quasi function calls, and there are these special characters that are used. We can do this live if

you want to. You actually can program it to do this, if you're lucky. Let's say we're lucky over here. So we're going to try this thing over here. We're going to just tell it: you have access to this tool. You can use this tool, the calculation tool;

output this XML-like thing: calculator, then an expression, then close calculator. So this doesn't actually exist; we'll tell it that it does exist,

and it'll hopefully actually output this stuff instead of calculating things itself. So I'm going to copy this, because otherwise we'd be sitting here for a long time.
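The prompt I paste in is along these lines; this is a reconstruction of the wording, not a copy of the exact slide:

```text
You have access to a calculator tool. When you need to do arithmetic,
do not calculate it yourself. Instead, output:

<calculator>EXPRESSION</calculator>

and stop. The result will be given back to you.
```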

I have this written out over here at the bottom. By the way, it doesn't always work; sometimes Claude won't do this. Let's see if it does, though. I'm going to put the lines back in here, so we'll

tell it it has this tool, a calculator tool. It's being very cooperative today; it's saying it knows how to use it, which is very nice. As I've used this more and more, it's actually gotten better and better about using tools. It didn't always use tools when it had a choice not to; now it almost always does. So if we

say, for example... I don't know, let's give it a word problem, let's make it a little more complicated. If I have eight baskets,

and each basket has nine eggs, how many eggs do I have? So it should know that to actually answer this question, it has to actually call the calculator tool.

That's the process: figure out that the problem is eight times nine, and then actually call the tool with eight times nine. And hopefully it's actually going to work.

If we're lucky here, there it is. Of course, actually it knows the answer because it's Claude, so it just tells us the answer. Well, in this case, it is doing that, actually.

So it's not really doing this. I just made this up. So you're actually absolutely right. Yeah, it doesn't have a tool like this at all.

I can give it this tool, but this is not how it actually, this is not the real MCP. This is an emulation of how it actually works. And actually, it's pretty smart about it.

Usually, if we say, like, what is, I don't know, 15 factorial, it might actually, sometimes it will make that a multiplication problem.

Oh what is this? That's weird. I confused it. But it's in italics.

Are the italics actually... oh wait, I think it's actually formatting a bunch of times signs. Would a star and something else be some sort of weird markdown that it's doing, or something?

Yeah I think it's taking this and making it into markdown. Maybe? That's weird.

Anyway, that I never saw before. The format, I don't know; usually this actually works pretty well, but this I've never seen before. It must be outputting markdown now, or something else like markdown. Anyway, you get the idea. It's not exactly how it works, but even

this kind of emulation still works, more or less. I can show you this in a little more detail here, back in our presenter view.

The actual tool prompt format (Claude’s approach)

They did finally publish this; it wasn't published for a long time, and I had to guess at it myself.

But in fact, Anthropic eventually published that when you actually use tools in Claude, it gives the model this prompt over here. It says: you have access to these tools.

Here are the formatting instructions, some more instructions about how to simplify parameters, then tool definitions in a JSON schema.

There's a system prompt. So they told us, finally: yes, we do actually put all this in one gigantic prompt. This is how it works, what I was trying to tell you before. It's a little complicated, but this is all it is.
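Each tool definition in that prompt is a JSON Schema. The `inputSchema` shape below follows the MCP spec's convention; the joke tool itself is my example from earlier:

```json
{
  "name": "get_joke_by_id",
  "description": "Get a joke by its numeric ID. Valid IDs are 1-100.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "id": { "type": "integer", "description": "The ID of the joke to fetch" }
    },
    "required": ["id"]
  }
}
```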

So once you use a tool, the model outputs some sort of special format, then the AI client intercepts that format, calls the tool, and returns the result to the LLM, which then gives it back to you if it wants to. It doesn't have to do that; it can also process it some more, and then give it back to you later on.

Does that make sense? If you try it out yourself, it makes more sense. When you actually look at it and try these things, it becomes more obvious, like many things, with time. The individual pieces of it make sense. Yeah, there's a lot of stuff there, but it's kind of weird that it works at all.

But then it's kind of like... what is it? I'm forgetting what it is now because I'm kind of sick... like Ajax was, for example: this weird call you'd use from your browser to reach your server. At first it was a little strange, a bizarre mechanism, and now every web app depends on this HTTP request thing; we have Gmail, and we have Slack, and all these things. Anyway, that's MCP.

Using MCP as a client (installing and configuring servers)

Let's go a little more into how you actually use it.

So this is a diagram I had AI make because we wanted a diagram. It's not quite right, but it's kind of cute maybe.

Local vs. remote servers (and the security tradeoffs)

One of the weird things about this, actually, if we're using it as a client: usually the client and the server are co-located. So we call it a server, but normally you don't have a remote server at all.

If I publish an MCP server, for example DeepL, we only have code. We have no actual server. People will take our code, stick it on their machines, and they'll run it locally. So usually the server is co -located with the client.

So here's this diagram. The server and the client are on the same side, users over there.

You also can have servers working over HTTP, remote servers, but often the server sits right next to the client.

So in this case, what I've done with all these MCP servers is either written the code myself or downloaded the code onto my machine, and then it runs the code on my machine. So the security concerns are extreme, but, you know, that's life.

Client configuration: commands, args, and environment variables

Exactly. So all you have to do... it's always the same format wherever you go, whether it's VS Code or Anthropic's Claude or somewhere else. There will be an mcpServers object somewhere; you give it the names of the servers and the commands you have to use to run each server, and then it goes and does it.

So the first thing here is just the command itself, like for example in this case, node, for the JavaScript version of this; the args are whatever follows node. So if the command to run the server is node blah-blah-blah jokes.js, you have node as the command, and the args as an array of the arguments.

So that's all you have to do. They make it a little easier now to put these things in, in most AI clients, but still, knowing this and being able to do a copy and paste of this JSON is pretty useful.

As it gets more complicated, you have more arguments; like in this case over here, they all go into an array. If you have environment variables you need to put in there, there's an env object for that over there, an env property, rather, for those kinds of things.

And if you have a lot of arguments, you have a long, long array, like over here. For my Python version, I use uv: I have to give it the directory of this and run fastmcp run jokes.py, and those are all arguments. You can look at my actual file here, actually, if you want to.
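Put together, a config file along those lines looks roughly like this. The mcpServers shape is the standard one; the server names, file paths, and the exact uv argument order are illustrative, not copied from my machine:

```json
{
  "mcpServers": {
    "jokes-js": {
      "command": "node",
      "args": ["/path/to/mcp-examples/jokes.js"]
    },
    "jokes-py": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/mcp-examples", "fastmcp", "run", "jokes.py"]
    },
    "deepl": {
      "command": "node",
      "args": ["/path/to/deepl-mcp/index.js"],
      "env": { "DEEPL_API_KEY": "your-key-here" }
    }
  }
}
```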

Let's be confusing and go to VS Code right away. You can see these things, if it was bigger maybe, if I make this go away here. That is my config file for Claude. You see the actual JSON I just showed you, sitting right over here. And when I want to add new servers, I go and modify this file.

You can also often modify this in a UI these days, but knowing this is still pretty useful. And that's basically how you use it as a client, and there's an example. You already saw this kind of stuff happening; this is an old interface now. Any questions on that?

Basically, if you want to use an MCP server on your local machine, you download someone's code, you run it, and you just pray for the best. You also can use MCPs in ChatGPT, in their web version, for example; that of course requires using a remote MCP server, because you can't download the code onto OpenAI's servers.

Building your own MCP server

All right, so now let's build a server real quick. This used to be kind of complicated, but it's gotten much easier, because there's a library that makes it much easier. So you need a client to do this with; if you look at this deck later on, there are many clients you can use that support MCP,

and a coding language; there are SDKs in six different languages. JavaScript is how we did our own DeepL MCP server. It turns out that was a bad idea.

Python is much easier, because Python has a thing called FastMCP, which is not from Anthropic, but it makes all these things much easier. It basically makes a basic server with default options very simple to write,

as we'll see here in a moment or two.

MCP primitives: tools, resources, and prompts

So also MCP supports things called tools, resources, and prompts.

I've only discussed tools so far because I've found no use yet for resources and prompts; I've tried them and haven't found the use quite yet.

I heard on a podcast that two guys who wrote this protocol said, please try these things out, try resources, try prompts, they're great.

But actually MCP apps do use resources. If you have time, we'll look at that. And here's how you do it.

A simple implementation recipe (four steps)

Building an MCP server, there are four basic steps. One: instantiate the MCP server. No surprise there.

Two: for each tool you want to create, there'll be a function that runs the tool. So that functionality will be contained in a function. No surprise there either.

Three: some configuration is required for the server, sorry, for the AI client, to understand what the tools do.

Number four: pick the transport, start the server, and there it is. Let's do it.

Example: a jokes server using FastMCP (Python)

So I use this jokes API because it's very simple. Other APIs would work just fine too, but this one is simple because it requires no API key and it's not hard to use.

So the first step over here is to instantiate your MCP server. If you use FastMCP, it's pretty easy. You just say, blah blah blah equals FastMCP, name your server, and you're all done. Step one: one line of code, already done.

So you can do this yourself like tonight, or even right now if you want to. Again, JavaScript is more complicated in this case, but Python is pretty easy.

Number two: each tool requires a function that provides its functionality. And you need a few things here. It's got to have a name. So, for example, to get a joke by ID,

we showed this before, it's called get_joke_by_id. Even just the name is often enough for the LLM to know what it's supposed to do, because these models are pretty smart. They'll understand: get_joke_by_id probably gets a joke if you give it an ID.

But to be on the safe side, you can give it a description, which is probably a best practice. Here I give it a slightly more elaborate description: I spell out the range of valid IDs, in case the client doesn't pick that up from the third thing over here.

That third thing is typing the parameters. With Python, you just add type annotations, and the library can use reflection to look at the actual parameters and any restrictions on them. Here I say it should be an integer in this range.

And then examples: you can give examples of usage, which helps some LLMs. You can add more things, but honestly, in my experience, these three are enough to get it to work really well, at least if you don't have too many servers or too many tools.

So, for example, this is my simplest tool, again to keep things very easy. It doesn't call the API at all. Whatever you do, it gives you the same joke every time. In this case, all it does is tell the same joke.

Thanks for laughing. What's brown and sticky? A stick, exactly. Yes, precisely.

So, the nice thing about the format FastMCP uses is that it's all contained right here. You use this little decorator, the function name becomes the tool name, the docstring is the description, and the function body is the implementation. And the parameters are right there too. It's all sitting there in your function.
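That pattern is easy to see with nothing but the standard library. This is a toy registry sketching the idea, how a decorator can derive a tool's name, description, and parameter schema straight from the function, and not FastMCP's actual implementation:

```python
import inspect

# Toy registry illustrating the FastMCP pattern: one decorator, and the
# tool's name, description, and parameters all come from the function
# itself. (A sketch of the idea, not FastMCP's real internals.)
TOOLS = {}

def tool(func):
    sig = inspect.signature(func)
    TOOLS[func.__name__] = {
        "name": func.__name__,                        # tool name = function name
        "description": (func.__doc__ or "").strip(),  # description = docstring
        "parameters": {                               # schema from annotations
            name: p.annotation.__name__
            for name, p in sig.parameters.items()
        },
    }
    return func

@tool
def get_joke_by_id(joke_id: int) -> str:
    """Get a joke by its ID from the jokes API."""
    return f"(joke #{joke_id} would be fetched here)"

print(TOOLS["get_joke_by_id"])
```

Everything the client needs to describe the tool to the model is recoverable from the decorated function, which is why step three below is "already done".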

Pretty easy to do, in this case especially, because it's kind of a dumb tool. And number three has already been done as well,

because we had to configure the tools for the server, and that configuration is already there: we already have the tool's name, description, parameters, and functionality. So, already done. And then

number four: pick the transport and go. There are two transports. One is standard I/O, which is just text in, text out; the other is HTTP, for running it over the web. We chose standard I/O because this is going to run locally. So you just say mcp.run(), that's the default, and you're done. That's my entire, very basic one-tool server over there.

There you go. You can do this yourself, in about seven lines, actually. This line here is kind of optional, so honestly you can do it in six. That would be a very, very simple server with one very useless tool.
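Under the hood, by the way, the stdio transport really is just "text in, text out": newline-delimited JSON-RPC, where the client writes one request per line to the server's stdin and reads one response per line from stdout. Here's a toy dispatcher sketching only that framing (the tell_joke tool is made up, and a real MCP server also does an initialize handshake first):

```python
import json

# Toy sketch of the stdio transport's framing: one JSON-RPC request in,
# one JSON-RPC response out. Real MCP servers speak full JSON-RPC 2.0
# with an initialize handshake; this only shows the text-in/text-out idea.
def handle_line(line: str) -> str:
    req = json.loads(line)
    if req.get("method") == "tools/call" and req["params"]["name"] == "tell_joke":
        result = "What's brown and sticky? A stick."
    else:
        result = "unsupported in this toy"
    resp = {"jsonrpc": "2.0", "id": req.get("id"), "result": result}
    return json.dumps(resp)

request = '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "tell_joke"}}'
print(handle_line(request))
```

This is also why stray print statements are dangerous in a stdio server, as we'll see with the Inspector: anything you write to stdout goes straight into the protocol stream.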

Any questions so far? Yeah?

So it's basically creating these kinds of things, either locally with five jokes, or it can go out to the world? When you're building it, you can have it do anything you want. You just write functions; each function does something and returns a value. So when you download somebody's server, it's like: hey, here are the things you can do with my server. And then you use it to do that stuff on their server. Pretty much.

This is how it looks, actually. If you want to see the actual jokes server, I think it's right over here. Hold on, I think I stuck it over here for convenience. It might be hard to see. This is my simple jokes server.

So I have a little more stuff in it here. I imported a couple of things, and I use the API URL over there. There are some variables and jokes, but the tools are all pretty simple.

There's that first tool. As simple as possible. It's just the same joke every time. And then that works, you know.

And then the next tool over here is get a random joke. It just calls the API's random-joke endpoint, the REST URL for getting a random joke. That's it. Then it returns it.

I've got a helper function that actually makes it look nicer to the LLM over here.

And then get_joke_by_id. Same idea: there's the annotation, so it knows there are restrictions, plus a description. And it gets the joke, once again, from the API.
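A rough sketch of what such a tool plus its formatting helper might look like. The HTTP call is stubbed out with a canned response so this runs offline, and the setup/punchline field names are my assumption about the jokes API, not its documented schema:

```python
# Sketch of a tool that fetches a joke and formats it for the LLM.
# The HTTP call is replaced by a canned response; in the real server
# this would be an HTTP GET (urllib.request, httpx, ...) against the
# API's random-joke endpoint.
def fetch_random_joke() -> dict:
    return {"id": 42, "type": "general",
            "setup": "What's brown and sticky?", "punchline": "A stick."}

def format_joke(joke: dict) -> str:
    """Helper: render the raw API response as readable text for the model."""
    return (f"{joke['setup']}\n{joke['punchline']}\n"
            f"(joke #{joke['id']}, type: {joke['type']})")

def get_random_joke() -> str:
    """Get a random joke from the jokes API."""
    return format_joke(fetch_random_joke())

print(get_random_joke())
```

The helper matters because the model reads the tool's return value as plain text; a tidy string is easier for it to use than a raw JSON blob.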

get_joke_by_type: doctor or programming, or, I guess not doctor: programming, knock-knock, and so on. Similar kind of idea. So yeah, it's just a list of tools, and at the end it has mcp.run(). So,

let's see, how does this work? When Claude Desktop starts up, for example, it starts all the servers it has access to. Then, when the model needs one, it outputs the right tokens, and these get intercepted by Claude Desktop itself. It calls the tools when necessary and puts the results back in. And you can see this yourself.

Testing and debugging: MCP Inspector

There's the MCP Inspector tool for debugging, which lets you actually try tools out yourself.

If you start printing your own stuff from these servers onto standard I/O, it goes straight into the model, which is kind of fun and weird to do.

So that is basically how the server works.

What’s new: where MCP is heading

I had more stuff, but we're kind of running out of time here; it's already 8:03. As I mentioned with the Inspector a minute ago, there's also new stuff in MCP, which I only just read about.

Do you want, like, five minutes on the newest developments, or do you want to wrap up? Yes? Okay, some people are nodding, some of you are already past this, you train models. What were the three options again? The distraction, the paper, the screen. Understandable.

So, new things that are happening, new developments: tool search, programmatic tool calling, and the last two we'll cover together very quickly.

Tool search to reduce context-window overhead

Tool search. Someone mentioned before, I think, how this normally works: for all these MCP servers, the client generates a big prompt saying, here's how you use them.

The problem with that is it all goes into the context window, and some servers can be pretty large. I think Anthropic claimed that Slack's alone took 21,000 tokens, for example. So you can eat up a context window pretty fast.

So they developed this thing called the tool search tool to solve this problem. Instead of loading all the tool definitions up front (you might have 77,000 tokens used up just by MCP tools), you have one tool called the tool search tool, which looks for tools: a tool to find tools.

So it's a nice idea: instead of knowing everything it can find up front, the model can ask the tool search tool whether a suitable tool exists and try that. Less context. Does this work? Maybe, maybe not.
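As a toy sketch of the idea (a naive keyword match over a small tool catalog, not Anthropic's actual implementation), the model would call something like this and only the matching definitions would enter the context:

```python
# Toy "tool search tool": instead of putting every tool definition in
# the context window, the model queries a catalog and gets back only
# the definitions that match. Naive substring match; the real feature
# presumably uses something smarter.
CATALOG = [
    {"name": "send_email", "description": "Send an email to a recipient."},
    {"name": "send_slack_message", "description": "Post a message to a Slack channel."},
    {"name": "schedule_meeting", "description": "Schedule a calendar meeting."},
    {"name": "get_random_joke", "description": "Get a random joke from the jokes API."},
]

def tool_search_tool(query: str, limit: int = 3) -> list:
    """Return tool definitions whose name or description mentions the query."""
    q = query.lower()
    hits = [t for t in CATALOG
            if q in t["name"].lower() or q in t["description"].lower()]
    return hits[:limit]

print([t["name"] for t in tool_search_tool("slack")])
```

The failure mode reported below follows directly from this shape: if the query and the tool's wording don't line up, the model simply never discovers that the tool exists.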

Someone tried this out, I found, in December. They gave it 4,000 tools and tried some straightforward tasks: basically, send an email, send a Slack message, schedule a meeting. It succeeded about half the time. The other half it failed; it couldn't find a lot of things. So there's some work to be done here.

Other companies are trying similar things to shrink this context window problem, because sometimes you have to have a lot of tools, and tools require long prompts. That can be a problem.

Programmatic tool calling (code sandboxes for heavy lifting)

So, problem number two. People use these MCP servers a lot for agentic things. The problem can be, let's

hold on over here, I've got to have some more water, sorry, I'm going to cough again,

say you want to read, like, 5,000 rows of data from somewhere, do a lookup, some sort of calculations. If you use an MCP server and tools for all of these things, the model reads the rows to get the data and then makes the chart.

There's a lot of back and forth, and you're asking the LLM to process 5,000 rows of data and do all this stuff it doesn't want to do.

It'll get bored after a while and just stop doing these things. And it's also very inefficient to use an LLM for things like math or processing data.

It's a bad idea in general. So there are a few different ways out there to fix this problem.

There's my image of a sad robot, because it can't do this. So there are various ways to fix it.

Anthropic's solution is called programmatic tool calling: you get a sandbox where Claude can write some code, some Python, and the sandbox has access to the tools.

Instead of Claude accessing the tools directly, it writes code in the sandbox. The tools live in the sandbox, and all the processing happens in Python.

Then it sends the result back to the LLM afterwards. More efficient. This has only just started rolling out; as far as I can tell it's still in beta and not being used very much.
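A toy contrast, with hypothetical tools rather than Anthropic's real sandbox API: the model writes one small script, the script does all the row crunching, and only the summary string ever reaches the LLM:

```python
# Toy sketch of programmatic tool calling. With classic tool use, every
# one of the 5,000 rows would pass through the model's context; here the
# model writes a small script that runs in a sandbox, and only the final
# summary goes back to the LLM. (read_rows is a stand-in for a real
# MCP tool, not an actual API.)
def read_rows() -> list:
    # Stand-in for an MCP tool that reads rows from a data source.
    return [{"region": "EU" if i % 2 else "US", "amount": i} for i in range(5000)]

def sandboxed_analysis() -> str:
    # The kind of code the model would write and run inside the sandbox:
    # all the heavy lifting happens here, in Python, not in the model.
    totals = {}
    for row in read_rows():
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return f"totals by region: {totals}"  # only this summary reaches the LLM

print(sandboxed_analysis())
```

Instead of 5,000 tool results in the context window, the model sees one short string, which is the whole point.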

Just recently, Cloudflare released a similar thing called Code Mode. It's much the same idea: the code has access to all these tools, it can do all the processing, and then it returns something.

Even DeepL, if I may plug us, has a similar thing called the programming toolkit, where the agent writes a bunch of code and does a bunch of stuff.

So MCP isn't always the most efficient way to do things. It's probably more efficient to have the LLM write its own code.

The problem there, of course, is you can't tell what's going on.

You lose a lot of control as the user, because it has all these tools and you just say: have a good time, and when you're done, tell me what you did. So it's a little scary. But OpenClaw is scary too. That's our world. Yeah.

Skills: a higher-level instruction layer on top of tools

And then, now there are skills. Skills plus MCP is a marriage made in chaos. And it's great. Yeah.

I've been following MCP for a long time, and the folks over at DataStax have been great at putting things together. I've been a member of the Cassandra community, an open-source database, for a very long time.

And the folks in Cassandra development, with DataStax, have been releasing these MCP servers for Astra and Cassandra, which has been awesome.

The day after Claude announced Skills, I took all the Cassandra docs, created Cassandra skills, and used them with the MCP server. And it was awesome.

All of a sudden, I could instruct my integration with the MCP server: how I want to structure input and queries, how I expect it to respond, the format it will respond in. All of that becomes a translation layer through the skill.

There's also a debate of skills versus MCP: which will end up more useful, writing things in human language and Markdown, or writing these tools? We'll see where this goes. It's a fascinating world to watch.

Just a question about DeepL and MCP. Would it be correct to say that DeepL's MCP server offering is the consumer-facing layer for us engineers and developers to use DeepL? As opposed to more production-ready applications: I would think that using the API would be a lot more stable, reliable, and deterministic, say for voice or translation in production, as opposed to putting an AI agent in production and using MCP to do everything.

People use MCP stuff more and more. I'm not quite sure why, actually. People talk about all the things they want from it, make feature requests, and use it in various places. I think people sometimes find the MCP server convenient. Personally, if I were making an agent, I'd probably still use the API, because I'd have more control that way. But people have different preferences.

So why would DeepL provide an MCP server at all? There was demand for it. Really? Yes. Usage is still very small, but it's growing very, very rapidly. It's oddly

popular. And sorry, about skills: with Claude Code, the world has changed so much. A year ago my engineer friends were telling me, I don't really use AI coding, I do it myself. Now they're like, I use Claude Code and all this stuff, and I did a PR, you know, in five seconds. It's kind of weird. Yes? Yeah. So if I had a SaaS company with a data resource, and my customers want access to that data, and I have a REST API that I've had for years, which is secure and rate-limited and has pagination and documentation, why should I build an MCP server that exposes that data? REST can already provide all of it. Hey, maybe you shouldn't. No, like, yeah. Is the main use so agents can get at it right now on your local machine, or on other devices or other applications? Yeah, I think the usual use is, like, you know, you

can use it, like, one of the reasons people like it is that they can't code. They can use servers and just ask questions in human language and do things like that. Another is that various frameworks support MCP in various ways. You could obviously tell the language model to call this REST API directly, and then maybe it doesn't need MCP.

Does it save context window because it just knows what question I'm going to ask with MCP? Because if you were writing your own thing with LangChain, you could use your own tools; you'd create them yourself with the APIs. So you wouldn't have to use MCP if you were doing a lot of programming.

I think it's honestly more for people who want to use something like an AI client in their work. It's agentic stuff, yeah. If you know what you're going to do on a regular basis, then hit the API. But if you're doing something ad hoc and you don't really know how the data is structured, you don't need all that.

It's like the folks who use our agent, for example. Usually they can't code. If you can code, you can write something like a workflow, an automated workflow. If you can't code at all, you use human language. And in that case, you want tools for your agent, and MCP is a pretty standard way to make tools. But it could be non-MCP-based, definitely.

Do you know of any MCP-based, or hooks- or skills-based, ecosystem that does everything entirely in Rust? I want to transition everything into Rust. Truly. I love it. We'll discuss that afterwards.

We have a question in the back, too. Just a comment: with all the talk about jokes, I couldn't help but observe the original meaning of the acronym MCP, as in the Master Control Program from Tron. That might be. I've never seen Tron, I have to admit, so I don't actually know.

One more thing, as they say. If that wasn't enough for you, one more thing MCP can do now. One final thing.

MCP Apps: bundling UI with tool-backed experiences

MCP Apps is a new thing, and I couldn't get it to work for me. I tried a couple of apps in my Claude and none of them actually worked. So it's obviously a new idea, but it's pretty crazy.

You know ChatGPT Apps? They released apps a few months ago. Do I have something on this over here? I don't. ChatGPT released Apps back in October or November.

I thought this was an ingenious play, because it's kind of like they're saying: we're like iOS, this is our App Store. It's a wonderful way to become the standard for everyone doing stuff in LLMs.

It turns out they actually use some stuff from MCP-UI, an open-source project that adds a UI layer to MCP. And now Anthropic, sorry, it's not really Anthropic anymore, because MCP now belongs to the Linux Foundation,

has MCP Apps, which takes MCP-UI, plus some ideas from OpenAI Apps, and lets you make apps with MCP. Basically, it's kind of complicated, different from what I just showed you before,

but you bundle up HTML, TypeScript, React, CSS, and so on. It runs in an iframe, and the client and the app throw events back and forth at each other.

So you could do things like interactive graphics, or ask your user questions about something and let them respond. There's a demo that generates and plays sheet music. Again, it didn't work on my computer over here, but in theory it could.

And people debate this: OpenAI's Apps are kind of similar to MCP Apps, and OpenAI now supports MCP Apps. Is it actually the same thing under the hood? I don't know for sure, and no one's saying online either, but they're kind of similar now. So this could become a big deal: actually running random apps right there in your favorite LLM.

It could be like the new mini-app kind of idea, the new, you know, everything app, like WeChat or something. But we will see. It's a fascinating development, a little harder to use, and we'll see how that goes.

Conclusion and resources

Anyway, if you're curious to see this deck again, it's at this address over here. And if you want to try our API out, there's DeepL's API. And there's our agent over there.

Great talk. Thanks a lot. Thank you.
