Building Guided AI Workflows: Designing with AI as a Co-Pilot

Introduction

Hi, I wanted to talk about MCP servers.

I'm Alex, and I've built a bunch of MCP servers.

And before I get started, though, I'm curious,

who here has connected their ChatGPT,

Claude, let's say, Claude Desktop, Claude Code,

to a third -party tool using either a connector or MCP server?

Anybody?

Okay, quite a few.

Has anybody actually built an MCP server?

Okay, a couple.

Background and Perspective

Well, I mean, this talk will be relevant for you, but I think it'll be relevant

for everybody, because my background is as an institutional investor.

So I recently joined a

firm called Ballyasney Asset Management, and I lead our internal build out of our software.

So

we build different tools for our investment team to help make them, basically help them with better

better informed investment decisions.

Before that, I worked for a technology company called

Ravenpack, where I was really working on AI tooling with many of the largest banks,

institutional investors.

And then before that, for the past decade, I've spent most of my career

actually as an investor.

From Finance to AI Tooling

And I think what's amazing nowadays is I didn't come from a technical

background.

I don't have an engineering degree.

But so long as you are sufficiently curious and

and persistent, you can climb the curve so much faster

than you ever were before.

Climbing the Technical Curve

I tried to kind of make a pivot into technology

maybe back in 2017 and was not successful with it.

And then, I mean, lately it's just been phenomenal

the types of things that you're building

or that you can build.

And so what I want to talk about now is

Why MCP Matters Now

we're at a really, really interesting point in history.

And it's interesting to think that language models

have been out now for three years.

I think ChatGPT launched November 2022.

But if we think about the basis of the internet that we are on today,

I mean, it's really the hypertext transfer protocol,

which was the groundwork for that.

And that actually took 10 years of iteration

and developing of that framework

for how do we pass information between us

before we really had a wide adoption of the internet that we know today.

And I think what's fascinating

A New Protocol Moment

is we're at a very similar moment in history right now.

Although ChatGPT and language models have now been out for three years, the introduction of MCP, which is model context protocol, we're only one year into that journey, and already in the past just one year you've seen a tremendous development in the protocol of how these systems interact with each other.

What Is the Model Context Protocol (MCP)?

So if you're not familiar what model context protocol is, I'll explain it a little further, but in short, it is how do we get agents, language models to interact with third -party tools, so that could be APIs, it could be interacting with the user in a structured way, or other resources that we might have.

And this is the basis for the Lennox level of intelligence that I think we're seeing in the world today

if you look at adoption of

Interoperable or agent frameworks MCP has gone absolutely parabolic in terms of the stars that you're seeing on github

Also, same thing with weekly downloads

So the big question is what is it and I kind of talked a little bit about that at a high level

but oftentimes when people you know are using and I often get you know, kind of

looks like this, where people kind of know what it is, but don't really know what it is.

So let's

get into it.

How MCP Works

Clients, Servers, and Data Providers

So in short, within the MCP protocol, you have effectively your agents or your servers,

which could be your chat GPT, could be Claude, could be Microsoft Copilot.

There's a whole range

of MCP clients.

Sorry, those are clients.

And then what you have on the other side of that is

is providers of data, resources, other things, so these are data providers, APIs, and essentially

in between you can build these things called MCP servers which will take the data from

MCP Servers as Translators

whatever API endpoint you could have, maybe it's Google Flights, maybe it's Airbnb, maybe

it's something like that, organize it and feed it to the language model in a way that

the language model can call it and get it back to them.

Live Demo: Generating an Investment Report with MCP

Now in terms of what this looks like, if you've ever used either ChatGPT or Claude, we can

can just take a look, and so I'm going to flip over,

because this demo, so what I'm going to do

is we're going to use an MCP server to produce

an investment grade report.

And in doing so, it takes about eight minutes to run,

so I'm going to get started on this early.

So if I come here, hit plus, there's

various different connectors.

You can connect it to Google Drive.

I've actually built another one for Google Sheets,

which is very interesting, but I'm not

going to have time to show it.

But this is one that I built for the company I previously

worked for.

Now, nothing here is anything that we

do within Belyazny.

None of this is investment advice from the AI or me.

But this is something

that I've done personally prior to joining and something that I was going to run through.

So there's different sort of prompts I've built in here that are sort of pre -can things,

reports that we could generate for any company we like.

Now, to keep it simple, and because I

Choosing the Company

know the timeframe of generating it, I'm going to go with an earnings preview.

And in the next

So for the next three days, I know the companies reporting earnings, at least the large U .S.

companies, include Micron, which is a large, let's say, memory service provider, Carnival Cruises, and Nike.

Can I just do a quick poll and see which one we would like to run?

So Micron, Nike, Carnival Cruises.

All right, we're going to go with Nike.

So I'm going to put NKE into the prompt.

Using Stored Prompts and Skills

And this is a prompt that I've built out in the previously and is stored in my MCP server

So I'll get into the how that works and so you can see here

There's like a longer prompt that's just being fed and then what I'm also going to do is I'm going to give it a little

more instruction, so I think I had a little

additional

Instruction on what I'd like it to do

Which is build a detailed report using the big data tools

tools, that is the server that I'm using, and then render the final report in a PDF

using the equity report builder skill.

Skills are really cool.

They only launched about a few, maybe two months ago within the Claude framework, and

it's effectively like MCP but only specific to Claude and has some advantages.

Great.

I'm going to hit run, and this is going to get started, and we're going to come back

to this in a little bit and see what it produces at the end of this.

Inside the Protocol: Core Components

In the meantime, let's talk about the protocol itself.

So the core parts of the MCP protocol consist of tools, resources, and prompts.

Prompts, Tools, and Resources

Now prompts are basically what I just showed.

You can have pre -existing prompts that you store within the server that whenever any

user interacts with your MCP server, they can just click that and run it without needing

to necessarily type it out.

Tools are how do we interact with third -party APIs and gather data and return that to the

agent in a way that's useful to the agent that can produce something worthwhile.

And then resources is effectively a remote file system where you can store other content

that might be useful for the agent to be able to crawl through and figure out.

So this could be instruction manuals, it could be guides, it could be other things that you

might want to have sitting on the back of your server that then your agent could interact

interact with, read, understand, and then do something a little bit more complex.

Early Servers I Built

Now in terms of the servers I built, the first one I built was just on resources.

So what that did was basically it went to our API documentation, took all the documentation,

and then helped coding agents understand context so they could, you know, essentially write

calls to our APIs more efficiently.

The second was just tool calling.

Most MCPs you see are primarily using tool calling.

And then the third basically includes all four of these things, which is then elicitations.

New Additions: Elicitations and Sampling

Now, elicitations is a very new part of the MCP framework.

It was only added in June this year.

And what it allows you to do, so the two additions were basically elicitations and sampling.

And so elicitations allows you to structure one of these servers that it will trigger to ask questions in a structured way back to the user.

And what that enables you to do is to build this sort of very human in the loop feedback

process where if, let's say, it wants to do a check, it will check with the user first,

and then the user can say yes, no, okay, I like that, I don't like that.

And if we have time, I'll show.

Designing Effective MCP Servers

So if you're building an MCP server, which at the moment is just one of us, but I hope

is more and more, because honestly, it's a lot easier than it sounds.

And actually, there are skills within Claw that allow you to do this in just a couple

clicks that you can spin up a server yourself.

But when thinking about how to spin up a server, there's some big considerations.

Pitfall: Too Many Single-Responsibility Tools

And so the first, and I think the most naive way that I often see people building these

things is single responsibility, which is, okay, for us, we have 30 endpoints, right,

30 API endpoints, and so we just build one tool on top of each endpoint, and that's easy,

right?

right?

It's actually, it's really not that simple.

The reason for that is because when you present

an agent with many tools, what happens is the performance degrades logarithmically.

And particularly after you get over 40 or 50 tools that your agent has access to,

it really struggles to pick the right tool.

And you see no improvement with iterations.

So if

an agent tries again and again, it'll still get it wrong.

And then also that you're seeing some

some systems begin to limit the number of tools that an agent might have access to,

to kind of offset or prevent from this particular issue.

The second approach, which is very novel, we've just seen this approach in the past

Layered MCP Approach

maybe two or three months, is what's called the layered MCP approach.

And effectively what this is, so this is a company called Block, and what they did is

they built a MCP server that only had three tool calls.

So the first tool call, basically the agent would be able to look over and read the documentation

documentation of the API for the block API documentation.

The second one is it would decide on the right tool

and then configure the parameters

for the particular API call.

And then the third, it would go call that.

And what this enabled them to do was basically

have over 200 endpoints that a single MCP server

with three tools could figure out how to call.

And I think that's a really, really interesting thing.

There's a couple of really good articles out there about it.

One is on Cloudflare.

If you are interested in this topic,

I'd say check out the blog post called Code Mode

on the Cloudflare's blog.

But the problem with that is essentially

for any call that the agent wants to make to the server,

you basically got to do three back and forth

to get to what you want.

And you're also having the agent then write code

that is kind of being thrown away in the moment.

And so it just is, it consumes a lot more tokens

tokens when you're building a layered MCP server as opposed to a single responsibility.

Striking a Practical Balance

And I think having built many of these things, what I find is it's important to strike a

balance between these two ends of the spectrum in terms of how you think about your MCP structure.

And I think one of the things that I found works best for us and what we build is instead

of as developers you often think about, okay, we have these endpoints, now we've got to

build MCP tools.

tools, but what I found works much better is actually think about the user experiences

that you want to have and build down to that, right?

Design for User Experiences, Not Endpoints

So instead of having a single API call that calls a single tool, you might have, okay,

for me, I want to get a comp sheet, you know, and that might require us to get the market

cap and the enterprise value and the EBITDA for five or six different companies.

That might require a lot of back and forth API calls for us to make.

But that is still a single user story, it's a single experience, so we would wrap that

into a single tool.

And you see this with how GitHub has structured their MCP server, which they have a single

tool that will create a branch, commit to the branch, and then push to that branch all

in a single tool.

So it's wrapped up an entire user experience that GitHub might want to have into one interaction

with the MCP.

be.

Demo Results and Trace Walkthrough

Now, let's check out how we did with our demo.

OK, so we're looking at a PDF of an earnings preview

that we've just generated in live time.

This PDF is, let's just download it

so it's a little easier to see.

And before we get to the actual report itself,

it's good to go back and check the trace of what actually

happened in this process from when we gave it the prompt

that we wanted to run.

And then what did the agent actually go do?

do.

So first, it understood the specs of the prompt.

Then, let's see, can everyone see that?

Should I zoom in a bit?

Yeah, great.

So then it ran.

So essentially, what I did here is this particular server

that I'm running this with, the one that I've built for it,

Three-Tool Architecture

consists of three tools.

And so as opposed to the whole thing I said like 30 endpoints.

So the three tools are effectively find companies.

So whenever I mention a company, it

It will go look into our knowledge graph

that we have in the big data server.

It'll find the ID of that company.

From there, we can use that ID

to then fetch other pieces of information.

So the first tool is just checking the knowledge graph

for the particular company.

The second one is then getting structured data, right?

Oftentimes in these reports, we want the stock price,

we want the market cap,

we want the recent balance sheet and income statement,

all the different little bits of information

that go into a relevant report that keep it grounded

it and keep the agent from hallucinating.

So the second is basically getting structured data.

And then the third is running search.

And so a core part of what we provided at Ravenpack was a search API for finance.

So we had taken hundreds of millions of documents across news and filings and transcripts, and

we had organized that into a vector store that we allowed institutional investors to

basically query and search for.

So then what you can do with search is you can have an agent that can say,

okay, I'm looking for the latest developments for Nike.

I'm looking for the latest developments for these things.

And so just using those three tools that were each built on many different endpoints,

we were able to put this together in this way.

So the first, we can see that if we check it out, okay, it found,

I found Nike, and the ID here is D64C6D,

which is the ID that they have on the back end of their system.

system.

Then once it found it, it decided, okay, I'm going to go get this data, right?

So it went

and got the current price, the market cap, all this other data that we might want to feed into

the report.

And if we jump to Google, just to check that this is right, let's just go Nike stock

price, 6686, 6678.

I mean, we're pretty much bang on, you know, maybe 10 cents off.

And in terms of

the range, we've got that as well, right?

And we know these are right because it's calling an API

as opposed to just generating it from within the model.

Then once it found that, it then went off

and did a number of steps here.

So it's put them all into the single, oh, this is just the

generating the report.

There should be some searches that it did.

Oh, eight steps are all

mixed in here as well.

Great.

Yeah, okay.

So get company tariff sheet.

This is the data call that

it did, then it ran a search, so let's see the search that it ran, okay, Nike earnings

guidance outlook CE turnaround strategy initiatives.

So clearly here there's something about a CEO turnaround that it found from reading

the documents, and then it went and got other pieces of information from the transcript,

from the filings, from other pieces of relevant information in our repository.

It then did another search, did another search, so it did a series of five or six different

different searches for different pieces of information, and these are all different chunks

that our search API returned back to the agent.

Once it got all that information, it organized it into a report and then used a final skill

to then build out the PDF.

What the Report Shows

So here we have an exact summary of what's going on ahead of earnings.

It's saying the turnaround under Hill is showing early signs of progress, right?

So I guess this is the new CEO.

Giving us detailed metrics for how it's performed in China.

Revenue estimates.

then it goes through forward estimates of what are the expectations that analysts are currently

modeling for the next few quarters then it gives you prior quarter performance of okay this is how

it performed versus expectations and what we saw in terms of the growth rate and then if we keep

going it'll give us latest developments so we got leadership restructuring they've got a new I guess

win now strategy there is big questions right now about the impact on tariffs on the company

And then key metrics to watch, which it talks about either the wholesale momentum, their margins, inventory levels, Q3 guidance.

And so this is basically touching upon a lot of the key issues here that a sell -side analyst or a buy -side analyst might want to look into.

And from my experience, I wouldn't say this is exactly at the level of a professional investor yet.

But this is also something that we built six months ago, and the tech has come a long way since.

Elicitations in Practice

So, with that, I can flip over and also show just a quick thing on how elicitations work,

which let's just do this.

So elicitations are exactly the same thing.

So this is just a different client, right?

So this is just cursor that I'm using instead of Claude.

And if I say find me the company ID for Nike.

Client Support and Flow Control

The reason I'm doing it in a different client is the new feature of elicitations within

the MCP protocol is only supported by some clients.

So Claude, ChatGPT, other largely available clients don't yet support the elicitations

feature, but GitHub Copilot, Cursor, and many others do.

And what that essentially does is so the same query that we ran here, the same tool call,

all.

What it'll do first is, great, I found Nike, this company in the consumer goods sector.

Here's

the ID.

Is this the right one?

And I can say, no, that's not the right one.

I actually meant

something different.

And then it'll say, okay, did you mean any of these other companies?

So here

it's talking about this NKE, which I guess from NKE, I could have typed that.

It's talking about

Anki Austria, and the industrials for Sir Bearings, etc.

So, right, so this is a simple demo,

Human-in-the-Loop Decision Trees

but the idea being that I've architected this

in a way that, okay, first check with the user

if this is the right piece of information before proceeding.

Once you proceed, okay, if they say no,

so you're kind of building a decision tree here.

If they say no, then give them some more information.

If they say yes, give them this information.

And you can keep building these layers upon layers upon layers,

and then when you put that in front of a user,

user, then you can build something where it's not just like the AI went off and built something

for you, it's that you built it in a very guided way with the AI as kind of a co -pilot

with you.

So now if I say, okay, yes, that's it, so great, then it continues.

Great.

Conclusion and Q&A

And that is the basis for the demo.

And yeah, with that, if you want to connect or chat about the topic, happy to take questions.

Finished reading?