Anatomy of an AI-first operating system

Introduction

So I'm going to try and do something very different. So I've never done this particular demo before, and I do need to get my notes. I wanted to do something a little bit different.

I was going to try and go a little bit deeper into what is possible with the technology today and what we would call an AI operating system.

So, you've seen kind of use cases that would be an AI system type use case, right? Claude and what we saw before. What I'm going to try and show you is what would be called an agent or more and more now an agentic operating system for a company and both how much it can do and how extremely simple the entire thing is.

So I would invite you to also stop me when it gets too complicated because my goal is to try and make this as simple as possible now I'm going to

Walk this through in our the our own a operating system, which we built this called rebel but the same principle is true for co -work for Open claw if you're familiar with open claw or Hermes kind of those are the four ones phones, but, yeah, Rubble is the one that we built, so it's the one I'm familiar with, and it's the one that gets me to show you what happens underneath, but it's the same principles.

You can see, generally, it's also my live environment, so we might have to strip some stuff out at some point, but on the left -hand side, you just have chats. You're all familiar with that. Basically, you're just having conversations with AI. There's nothing particular around that.

For those of you that have gone to our meet -ups for a few months in a row, you've seen me at least show this once.

From AI assistants to agents

Now, ultimately speaking, you're talking to AI in text, text comes back, and how many of you know or who knows the difference between an agent and an AI assistant?

Well, you've spoken a few times, maybe back. So the assistant answers, the agent does it for you.

Not a bad framework, yeah. It's very close, it's basically that. that.

An agent has tools to go outside of its own text box, so it doesn't just give you text back. It can go and reach into other systems and start to execute on the request that you have asked it.

I would say there's one more thing for it, and maybe it's captured in that definition, is that an assistant tends to be kind of turn by turn. You give it one thing, it comes back, you have to give it another thing, whilst an agent tends to break

it down into subtasks and continues executing on the task you give it until it decides that that it has now fully executed what you asked it to begin with, so it's a little bit of a combination.

Now, so Rebel is an agent and an agent platform or an AI operating system, so you've got that.

Under the hood of an AI operating system

You then have here the library, and this library is just folders, so this is where I was gonna go and show this.

And I haven't cleaned up my folder system.

As part of an AI operating system and an agent, you have skills. Skills actually is what Jason just showed one, which was about the ability to give you feedback on a LinkedIn post.

It's a series of steps that an AI will go through that you encapsulate into a skill, it's called, and it will, whenever you use that skill or whenever the agent uses that that skill, it goes through those steps over and over again, so it means that you can give it a procedure and it just keeps executing on it.

The same for us, so here you've got a whole bunch of skills that we built for Red Bull, the same is memories.

So memories again for an agent are nothing but text files, at least in our case. It's just a whole bunch

of text files in a different way, and I want this to come to life when I show you the actual file system.

So here, if I go into folders, you can think about an agent loop as nothing but your first message and then text files that automatically execute, that the AI agent looks at and continuously executes on.

So in our case, this is literally just a system that has a whole bunch of text files and folders. So if I go in here, I can go into memory,

memory, sources, it will go and find a file somewhere, which is a transcript of a conversation I've had. This was the Florida AI weekend that I had, what was it, four months ago now.

The core control file: agents.md and “finish lines”

The main thing here for us is agents .md. So I'll show this. Agents .md is basically the file that gets sent along with every single request that I would ask of the agent.

So just a text file. .md is called markdown. Markdown is as close as you can get to just

pure text with a little bit of formatting so that you can have titles like this, but nothing more. Now, the reason that agents use this is that it consumes less tokens.

So it's very interesting when you do the test yourself and you put a Word document into an agent, like a Word document of this size would be 100 times the actual size of the the text itself, just because it has a whole bunch of metadata, and so when you give that to an AI, it just runs out of context, on top of the fact that you're paying for all

the tokens it's trying to consume, so it's also extra expensive, so ultimately it's all text files, so here you go, this is, obviously it starts to get taken to an extreme, so our text file here, the system prompt is long.

There are a lot of instructions we give it, so you can say, here, a persona, so you are rebel, a capable, structured, diligent assistant. The goal is to execute the user's request.

The user has set the following criterion for this conversation, finish line, which is basically when you give it an actual finish line to... I'll show you here.

If I go back just to show in the conversation, when you start a conversation, you can set a finish line here, which means means that the agent will continue until it has hit that particular finish line.

So this is what's called the Ralph loop, but I'm going down in a little bit of a rabbit hole. I wanted to go into the agents, so let's go and do that again.

What else do we have? Safe mode, this is if everything doesn't work, like at least know this.

Memory as files: short-term vs long-term

Memory model, so high frequency fact, so this is where we explain to it that you have memory, and memory is nothing but files. You have short -term memory, and you have long -term memory.

Short -term memory lives in, and I think we even say that here, short -term memory for any particular space you're in lives in the spaces readme .md, which is another file.

And then long -term memory is in the memory folder. And that folder has a whole bunch of other folders.

And you just decide how to use this. Every time I talk to rebel, based on the conversation, it will decide

is there anything worth remembering from this conversation that I might need in the future. If that fact is more than 50 % likely to be useful in all future requests, it decides to store it in short -term memory.

If it is less than 50 % likely, or actually if it's between 25 and 50, it stores it as a reference in short -term memory, but with a file that lives in long -term memory, and if it's less than 25%, it only stores somewhere in long -term memory, and it's then indexed, and it can find it at a later date if it needs to.

Live demo: how an agent plans and executes

Before I go to the rest, I want to launch a conversation so that we actually can go and see this live, so I'm trying to figure out where do I go and give it here. I am at a Mindstone IAM meetup in London, and I want to demo everything you've got here.

Basically look at my emails, my slacks, my memory and everything you know about Mindstone, as well as maybe the public internet, and look at what the future of work looks like. And maybe

look at some stuff that is kind of more out there that is on the on the fringes really not not what you'd see in most kind of mainstream publications today so I'm just going to launch this and we're going to get back to this in a second but I wanted it to launch because it's going to take five minutes as I go

Spaces and plugins: shared drives and dashboards

through things now let's go back into library so in that same library we just have another view, which we would call plugins. Plugins are, in our case, so we use spaces.

Spaces can be shared drives. So we have a MindStone general space, a MindStone exec space.

I've got a Florida AI weekend space, which you all saw quickly flick on screen, which was a shared drive that I shared with everyone that came to our AI weekend in Florida. In those spaces, you then have all those folders and files.

Plugins is nothing but another folder. But if I wanted to look at... What do we have here?

Competitive analysis for Rebel. I now have an HTML file. So this lives as a file on my machine. I'll show you in a second.

But it allows me to switch it on as a part of a dashboard that I can now see that is shared between everyone that has access to the same folder. Yeah, to the same shared drive.

If I go back into the library, and I go into everything, and then folders, so I would, where would I find this? That was in work, MindStone, general, and then where do we have, plug -ins, rebel competitive analysis. analysis.

So you can see, this is nothing but three text files underneath. I didn't write them. Rebel just decided to write them.

The entire thing is compressed when you think about it. Everything that an AI operating system is, or that an agent is, is a starting point combined with go and read this file.

That file has a bunch of instructions on which other files to read and how to execute on that instruction it is so extremely simple when you think about it you've got a model and the model just follows instructions we can all write these instructions in different ways same

Operators and automations: reusable behaviors on demand or on a schedule

thing we have operators operators are nothing but text files if I go here hopefully it opens I literally just put this lab okay wait there's not there so So if I go in chief of staff, operators, investor view, operator.

Operators are, again, in the agents .md, which is the file that gets sent along every time that I talk to the agent. It says you have access to operators whenever you need them.

So whenever I execute a request that might need an investor lens, it knows I have an operator definition that lives in my operator folder. folder, go and have a view, and maybe I need an investor view on this particular question. And it draws it in, reads that particular investor view, which here, this is actually fairly new.

So who are you? I read decks for a living and read between the lines for a hobby. I've never actually read this, by the way, so it's pretty absurd. I know which slides are load -bearing, which numbers are vanity, which graphs designed to disguise a flat trend, and so on.

How do you think? This is built to try and make the AI now act as if they are an investor, right? And so again nothing, but a flat file that sits underneath

We have another thing automations automations are flat files So when I say flat file, sorry, I mean nothing but text right they are text

executed on a schedule so the only thing it does is basically it starts a conversation with a particular text file as the prompt at a particular cadence once an hour once a day once a week whatever the thing is but again just a conversation with a text file if I wanted to go look here I actually have a

LinkedIn topic finder that's I go into the instructions and it's not showing me me, so let me just... skills, automations, LinkedIn topic. You can see how there was

a bug here in the software, but I can literally just navigate in my file system and get the exact thing that I'm looking for. If I now look at open instructions, here you go. Nothing

but a text file that tells me run the LinkedIn topic research unattended and add the top Top three topic ideas into the Rebel inbox. Context, who I am, the process you go through.

Find all steps in the Find Great LinkedIn post topics, which is another file that it has access to. So you can actually click here. This goes to the wrong one.

Okay, something is wrong with the links. Let's see. Score all candidates. Select the top three for each of the top three.

Give me the breakdown. down and then what does it actually well I guess here follow all the steps so it actually does the research using beeper as well so in my case the way that I do

it is it tries and looks through my inbox and my messages to try and find things that are part of my world that I can then talk about I'm part of a whole bunch of whatsapp groups on AI so this is about finding the topic that works

Task decomposition and parallel work

for me right now let me go back to the conversation the other thing and this is where it all comes together. I gave the instruction here.

I'm in a Meetup. I want to demo everything you've got.

It broke this down into steps. So pull Joshua's recent email, pull Slack, memory, and then run fringe web research on future of work. It's doing all of this in parallel. So those four steps are running in parallel.

And then when it finishes, synthesize it into a tight narrative, render the demo artefact, and then deliver final chat response.

You can actually see here in detail what happens if I click here. It gives me a little bit more detail on what actually that particular step is.

For each of these, it has decided which model to go and use, because, again, it has a definition of which models the agent has available, and what it does, it first takes the request, takes in that massive system prompt, that agents .md file that we created. In that agents .md file, it has a paragraph which says, when you have a request, first break it down into a plan. Your plan should include which model is appropriate for which step. And then as the plan comes out, it then decides which model to go and use for each of those steps.

It will go through this, and And this is ultimately what an agent really does. This is how you go from an answer that you get within two seconds, ten seconds, a minute, a few minutes, to once you start pulling all of this together, Rebel will go on sometimes for an hour and a half on a single task.

From an AI operating system perspective, from a software engineering perspective, we now have have them running 8, 9, 10 hours. I think the other day we had one for 24 hours that was going through. Because it's text file after text file after text file going through different processes, which is what gets you to the final best result.

So here it's using all of the skill files that I will have in relation to demos. I've got I think two skills that are in relation to how do I do live demos. It will find them as it executes through this and then take that as additional instructions as it goes through what

else do we have so it has it has something on how I what my thought leadership is it has every single one of my LinkedIn posts obviously so it will probably draw on that when I said look at here my memory for Joshua's future of work point of view actually I didn't even ask it that right said look at your long -term memory it decided look at Joshua's future of work point of view it It does that.

It will find all of those and then combine it into the final result.

All of this, hopefully, I wonder if I missed anything. I think I'm going to continue. I'm going to let this execute so you all see the result.

But what I wanted to really show you is both how extremely powerful it is. This will come up with a really interesting result.

If I wanted to, I could have told it, OK, create a presentation out of it. It would have created a nice PDF at the end. end, like all of the stuff that I could have added.

But ultimately, it is nothing but a starting prompt, a file that explains how to execute on any prompt, and then a whole bunch of other files that get pulled in because the initial one says, well, if this is a text, go and look at this. Go and look at this other thing. Go and look at this other thing.

And then you continue to add to it, and you end up with a full -on operating system where, Where in our case, because everything is shared, it means that whenever I do a proposal, it automatically knows to go and look for transcripts of recent calls with the same client.

And that could be a call from me. It could be a call from anyone in the company that has had a call. Or it knows to go and always pull my email and our CRM for anything that is in relation to customer relationship management. It builds on top of each other.

And where this becomes really interesting at an enterprise level is that it compounds. Like you're now, the work at least that I'm doing, I'm spending a lot of my time constantly making those files better, because I only have to do it once, and every single time afterwards, that entire flow becomes better.

Does that make sense at all? Any questions?

Actually, we should use the app for questions. I'm going to use anyone that doesn't submit through the page. I'm not going to be able to answer the question, so I'm sorry.

By the way, if anything doesn't work on the live page, please let us know because it's the first time we're using this live. I'd love to get the feedback on how to make all of this easier.

Should you invest in agentic workflows?

Should we all be investing time into building out agentic AI to complete day -to -day personal and business tasks for us faster?

Yes. Absolutely. Absolutely.

At this point, I would not be able to do anywhere near my job without these types of operating systems.

To give you an idea, I actually went through an exercise to try and quantify how much time this is helping me win.

Software engineering always kind of lives the future a little bit. They're about a year or two ahead of everyone else from that perspective.

At a software engineering level, my output is about 20 times what it it was three years ago. 20 times. At a knowledge work level, we're closer to 10 times or so.

But that is absolutely crazy. There is no way that I would be able to do the work that I'm doing now.

Practical productivity gains: drafting emails and scaling output

I have got automations that, again, the system is really simple, but that just once a day, look at all of my start emails. It then looks at, from those start emails,

and everything I know about Josh, all of the previous emails that he's had, how he's answered in the past and how he thinks about work and MindStone. How should I write a reply?

Draft a reply. Saves it as a draft reply.

I wake up in the morning, every single one of my start emails has a draft reply. 60 % of them, the only thing I have to do is hit send because it knows.

It has all my transcripts. It knows how I write emails. It knows how I do proposals. It knows the situation of the client at that particular point because it has a transcript

of everybody else in the company and our CRM and everything else, and it's just able to draw on that.

Someone wants more about the LinkedIn posts. We will get to that in a sec.

If that post has scored 74, why didn't it perform better on LinkedIn? That was the question.

What the future looks like for software engineers

What does the future of software engineers look like, according to Josh? Well, definitely definitely not coding. If you're still manually coding at this point, don't. Genuinely.

If you're still looking at code, actually, that's kind of, yeah, it's gone at this point. It's only a question of time for those that are still holding out.

The entirety of Rebel, close to a million and a half lines of code, not a single one coded by hand. Not a single one to give you an idea.

Normally, that would have been a team of 50 for a few years. How

Trust, evaluation, and hallucinations

do you autonomously evaluate the output of your agentic AI to make sure it doesn't derail from expected goals or hallucinate?

Really good question, and I would say not one that is fully answered, just like you have the same problem with people.

We keep talking talking often about how actually conversation and using AI is very similar to having a team of people.

People hallucinate. People make stuff up. People make mistakes.

If you have a team of people, I can guarantee you if you've been a manager for a while, people mess up. And they do things that don't always work. That doesn't mean you stop hiring people.

Just like the same way that if AI sometimes hallucinates and sometimes gets it wrong, doesn't mean you stop using it. It means you work on the system. It means you try and get it to be better, and it means you accept that sometimes it's going to slip up.

If you're trying to get to a deterministic output by definition, you can't work with a large language model just because it's non -deterministic, but it also would mean you can't work with people because people are not deterministic. They will screw up.

So that's a big one there, and then a last one before we get to wrapping up.

Why Rebel vs other agent frameworks?

Did you use co -work before, and why do you like Rebel more?

Very good question to finish on. Yes.

Actually, so for those of you that will have come to this community for a while, I had a much more technical version of this stuff running like a year ago.

Actually, part of me feels really bad because it was way better than Open Claw before Open Claw was like, shit, I should have released the hell out of this thing. And we were just too late.

Enterprise compounding: shared memory and safety mechanisms

We're trying to make this accessible accessible for non -technical people, but the main thing really that we're trying to look at is one, make it extremely easy to use and two, really build for the enterprise use case,

the real work use case, so the biggest thing that I find value in is actually shared memories, is this idea that you have, everyone in the company has their own rebel, their own AI operating system, their own agent, whatever they do, those agents decide what of what they're doing on a daily basis is worth contributing back to the main company drive, and then everyone

has access to that in real time. We've got salespeople who can scale up within weeks on a B2B sales cycle that normally is multiple months. Why?

Because instantly they can just use the operating system and it just has everything they need, and it has the entire situation about the client, about wherever they are at, just at their fingertips. It's things

things like that, it's the shared memory that really compounds. There are a few other things

like safety mechanisms and stuff like that which, to the question of how can you trust your AI, well, on the other end of Open CLAW, Open CLAW has nice bits where it tries to escape its own environment and do some weird stuff sometimes. That doesn't really happen

with Rebel simply because we built in some safety mechanisms. It does do some stuff sometimes, sometimes, don't get me wrong, it gets stuff wrong, but it doesn't try to break out of its own container and it does try to escalate whenever it needs to get permission from a

user kind of thing and we did more work on that which is why I end up preferring it. Having said that, co -work is absolutely awesome, they're doing really crazy stuff, on some stuff they'll blow us out of the water, that's not a question.

Conclusion

However, what I wanted to show today was just how easy all this stuff is. 90 % of what I showed you, you can just give to co -work because it's just files on your machine nothing else you pointed

at co -work it can execute the same thing