AI Memory for multiple model uses

Introduction

My talk will be fairly basic. It will be tied to running.

Quick audience check: how people are using Claude

Now, how many people here in the room are Claude users? Put your hand up.

Everybody, that's your main thing? Most people, that's what you guys are going for.

How many of you are using Claude Cowork? Put your hand up.

What about Claude Code? Okay, okay.

Why choose Claude Code vs the desktop/chat experience?

Now, if I ask the room: why did you pick Claude Code and stick with it versus, let's say, just the desktop app on one of the surfaces?

The data is able to be locally stored? OK. So you're using an app or?

I don't touch terminal. OK. OK. So this might be interesting.

What about anybody else? Who's a terminal user here? OK.

Now, are these mostly business functions or technical sort of things? Business. Business mostly. OK.

Motivation: reduce costs and spend tokens on thinking

My take is this: I've been following what's going on, and I keep trying to buy into the Claude Code hype. I don't want to say the Claude Code hype, because I'm sure it's very useful.

But what I'm going to propose to you guys today is actually a different take. I'm sure everybody here hates their usage limits, so this might be for you. What I'm proposing is my personal setup, which actually runs from chat.

From what I've seen of chat and what it's able to do, chat actually runs about 95% of the functions that you need. As for my Claude Code usage, I actually started switching away from it because of, one, the token costs.

I thought I needed a new subscription, but I just didn't want to pay for it because it's too much. In the meantime, Codex has come out, but I'm also probably one of the few users of something else.

Anyone here use Antigravity at all? Tried the new version?

A local-first setup: skills, memory, and a lightweight CLAUDE.md

Two guys, okay. I actually have a setup for this. My approach is pretty similar to what most people here have, so you're not seeing anything crazy.

If you look at the actual memory and the execution, for local work, just for myself, I run a local file system with Claude. It's a system that is fully local, and it runs from chat.

You can access all this stuff. The local system is built entirely from Markdown files: memory, skills, everything. Most of you are probably used to putting that into Claude Code, into your CLAUDE.md file.

Am I correct here? Anybody have a different setup?

Mine actually runs from skills. So this may be a completely different setup. Maybe it's the first time you're seeing it.

Pointer-style bootstrapping with a skills file

The idea, and this is where things are heading, is a pointer-type system.

My CLAUDE.md file just functions as one sentence that says: go look at the skill file that I use as a boot file to create all these things. That runs in the file system where I do most of my work, which is in chat.
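To make the pointer idea concrete, here is a minimal Python sketch. It is not the speaker's actual setup: the file names (`skills/boot.md`, `memory/open_items.md`, `memory/goals.md`), the backtick convention for the pointer, and the bullet-list format of the boot file are all assumptions made up for illustration. The point is only that the instructions file stays one sentence long, and everything else is loaded by following it.

```python
from pathlib import Path
import tempfile


def resolve_boot_file(root: Path, pointer_name: str = "CLAUDE.md") -> Path:
    """Read the one-sentence pointer file and return the boot file it names.

    Assumed convention (hypothetical): the path is wrapped in backticks.
    """
    pointer_text = (root / pointer_name).read_text(encoding="utf-8")
    start = pointer_text.index("`") + 1
    end = pointer_text.index("`", start)
    return root / pointer_text[start:end]


def boot(root: Path) -> dict[str, str]:
    """Load every file the boot skill lists as a '- relative/path' bullet."""
    context: dict[str, str] = {}
    boot_file = resolve_boot_file(root)
    for line in boot_file.read_text(encoding="utf-8").splitlines():
        if line.startswith("- "):
            rel = line[2:].strip()
            context[rel] = (root / rel).read_text(encoding="utf-8")
    return context


def demo_boot() -> dict[str, str]:
    """Build a throwaway file tree and boot from it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "skills").mkdir()
        (root / "memory").mkdir()
        # The whole instructions file is a single pointer sentence.
        (root / "CLAUDE.md").write_text(
            "Go read `skills/boot.md` before doing anything.\n", encoding="utf-8"
        )
        (root / "skills" / "boot.md").write_text(
            "# Boot\n- memory/open_items.md\n- memory/goals.md\n", encoding="utf-8"
        )
        (root / "memory" / "open_items.md").write_text(
            "Follow up with the team.\n", encoding="utf-8"
        )
        (root / "memory" / "goals.md").write_text(
            "Spend tokens on thought.\n", encoding="utf-8"
        )
        return boot(root)


if __name__ == "__main__":
    for name, text in demo_boot().items():
        print(name, "->", text.strip())
```

The design choice this illustrates: the always-loaded file costs almost nothing, and the boot file decides what actually gets read.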

Separating tool environments to avoid messy writes

Antigravity is my substitute for Code. The two file systems can read each other's memory and check all the local files if they need to check anything, for skills or whatever it may be, but they can't write to each other. There's a very big wall in between.

When I was building it, I had the idea that they'd write to each other's files, but it started becoming really messy, so I dropped it.
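The "read each other, never write to each other" wall can be sketched in a few lines of Python. This is an illustration under assumptions, not how either tool enforces it: the `ToolWorkspace` class, the `chat` and `executor` folder names, and the sample memory text are all hypothetical.

```python
from pathlib import Path
import tempfile


class ToolWorkspace:
    """A tool that may read any known root but write only inside its own."""

    def __init__(self, own_root: Path, readable_roots: list[Path]):
        self.own_root = own_root.resolve()
        self.readable_roots = [self.own_root] + [r.resolve() for r in readable_roots]

    @staticmethod
    def _inside(path: Path, root: Path) -> bool:
        # True when path is the root itself or lives somewhere beneath it.
        return path == root or root in path.parents

    def read(self, path: Path) -> str:
        target = path.resolve()
        if not any(self._inside(target, r) for r in self.readable_roots):
            raise PermissionError(f"{target} is outside every readable root")
        return target.read_text(encoding="utf-8")

    def write(self, path: Path, text: str) -> None:
        target = path.resolve()
        if not self._inside(target, self.own_root):
            raise PermissionError(f"{target} is outside this tool's own root")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")


def demo_wall() -> tuple[str, bool]:
    """Show a cross-root read succeeding and a cross-root write being refused."""
    with tempfile.TemporaryDirectory() as tmp:
        chat_root = Path(tmp) / "chat"
        exec_root = Path(tmp) / "executor"
        chat_root.mkdir()
        exec_root.mkdir()
        chat = ToolWorkspace(chat_root, [exec_root])
        executor = ToolWorkspace(exec_root, [chat_root])

        chat.write(chat_root / "memory.md", "plan: draft campaign")
        seen = executor.read(chat_root / "memory.md")  # reading across is fine
        try:
            executor.write(chat_root / "memory.md", "overwritten")
            blocked = False
        except PermissionError:
            blocked = True  # writing across the wall is refused
        return seen, blocked


if __name__ == "__main__":
    print(demo_wall())
```

Keeping writes one-directional means each memory folder has exactly one author, which is the "messy" problem the wall avoids.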

The team-sharing problem and moving selected pieces to the cloud

On top of that, once you have things you need to share with your team, you hit the main limitation of the setup most people have today: it's local, and they find it a problem to share with other people, right? So you need to take it to the cloud in some way.

I enabled this with an agentic platform called Dust, which I've been using for a while. Essentially, it does the execution: if a team needs to build an agent or anything like that, it's done through that.

The main takeaway here is: spend your tokens on thought, not on memory or on, you know, "Hey Claude, do this or you will go to jail." Remember the old joke? Nothing like that.

How the workflow runs day to day (chat + local skills via MCP)

Let me show you how it actually works. This is just my chat; a lot of you guys might have seen this. What you see is the way the system works.

It accesses the library, as you can see here. It accesses it through MCP, which I tried to move away from. To my surprise, with MCP, all of the skills that I have built are locally accessible on my desktop. And I use the skill as a pointer: if there's anything to look for, memory, skills, anything, it's accessible on my hard drive.

It's got a series of steps. It will look at the memory, the Antigravity side, any cache gates, anything that it has, and it will run through the chats. I'm not going to go through all of them very deeply, but you'll see that there might be open items, and all of that stuff is accessible here.

So anything I do here now, for example, is... Does anyone have any questions so far? Yep.

Model selection: Claude for planning, faster/cheaper tools for execution

There's a couple of reasons for this. One is definitely the token cost. Yeah, that's the main thing.

With Antigravity, the thing is, Google is worse in some ways. As far as the models go, I will give it to them.

Opus is very good, or whatever the latest model is that they're going to shut down next week because of Donald Trump, whatever.

Google is a great model for execution because it's fast, but based on my testing of Flash 3.5, 3.1, and anything before that, Gemini tends to have hallucination problems, which Claude does not.

So this is the principle that has been coming up more and more: find a tool that will allow you to get the job done. Some AI models are better than others for certain things.

For planning and higher-level work that isn't repeatable, you want to go with something like Claude. That's what I would recommend, and that's been my finding. So that's why I set it up that way.
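As a rough sketch of that split, here is a tiny Python router. The `Task` fields and the `planner` and `executor` labels are hypothetical names chosen for illustration; the rule itself is just the one stated in the talk: non-repeatable, higher-level work goes to the planning model, and repeatable work goes to the faster, cheaper one.

```python
from dataclasses import dataclass

PLANNER = "planner"    # hypothetical label, e.g. a Claude chat session
EXECUTOR = "executor"  # hypothetical label, e.g. a faster, cheaper agent


@dataclass(frozen=True)
class Task:
    description: str
    repeatable: bool      # are the steps the same every time?
    needs_planning: bool  # is it open-ended, higher-level work?


def route(task: Task) -> str:
    """Send planning or one-off work to the planner, the rest to the executor."""
    if task.needs_planning or not task.repeatable:
        return PLANNER
    return EXECUTOR


if __name__ == "__main__":
    tasks = [
        Task("Draft a go-to-market strategy", repeatable=False, needs_planning=True),
        Task("Process this month's invoices", repeatable=True, needs_planning=False),
    ]
    for task in tasks:
        print(task.description, "->", route(task))
```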

Benchmarking: chat projects vs cowork/code usage

From my findings: I've run the same example twice, two exercises of the same length. I set up a project, same thing, and ran it locally using Claude Cowork versus Claude chat.

And the chat project, in the same session, doing the same amount of work, used 16% of my usage versus 60% for Claude Cowork.
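To put those two numbers side by side, here is the arithmetic in Python. The 16 and 60 come from the talk; the assumption that usage scales linearly with the number of sessions is mine, and it is only there to turn the percentages into "sessions per limit".

```python
chat_usage_pct = 16    # share of the usage limit consumed in chat (from the talk)
cowork_usage_pct = 60  # share consumed by the same work in Cowork (from the talk)

# How many times more usage the same work consumed in Cowork.
ratio = cowork_usage_pct / chat_usage_pct

# Fraction of usage saved by doing the work in chat instead.
saving = 1 - chat_usage_pct / cowork_usage_pct

# Assumption: usage is linear, so whole sessions per limit is a simple division.
sessions_in_chat = 100 // chat_usage_pct
sessions_in_cowork = 100 // cowork_usage_pct

if __name__ == "__main__":
    print(f"Cowork used {ratio:.2f}x the usage of chat")
    print(f"Chat saved {saving:.0%} of the usage")
    print(f"Sessions per limit: chat {sessions_in_chat}, Cowork {sessions_in_cowork}")
```

This is one run on one task, so treat the ratio as a prompt to measure your own usage rather than as a general result.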

Yeah, the surfaces are chat plus Antigravity. A lot of the things, chat was actually able to do.

Why “Antigravity” for execution: speed and agentic capabilities

The reason I implemented Antigravity, since 2.0 came out in May, is just that it's way faster. If you look at Claude, it has like 71 tokens per second. Antigravity with the new Flash model processes 289 per second. So it's absolutely way faster.
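Here is what that throughput gap means in wall-clock time, as a short Python calculation. The 71 and 289 tokens per second are the speaker's figures; the 2,000-token output size is an arbitrary example, and the calculation ignores latency before the first token.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response at a steady rate (ignores startup latency)."""
    return output_tokens / tokens_per_second


claude_tps = 71   # tokens per second quoted in the talk
flash_tps = 289   # tokens per second quoted in the talk

speedup = flash_tps / claude_tps  # a bit over 4x

if __name__ == "__main__":
    tokens = 2000  # arbitrary example size
    print(f"Speedup: {speedup:.2f}x")
    print(f"{tokens} tokens at {claude_tps}/s: {generation_seconds(tokens, claude_tps):.1f}s")
    print(f"{tokens} tokens at {flash_tps}/s: {generation_seconds(tokens, flash_tps):.1f}s")
```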

Right, so here I just gave an example. You see the equivalent of the two systems that I built in Antigravity, which I'll fire up in a sec. You can essentially do the hard work in Antigravity, so that's the Claude Code equivalent, if you want to look at it that way, or the Cowork equivalent.

For example, here you can see a couple of things related to the file system. It won't see everything, but the system is designed in such a way that it's able to read Gemini's files and vice versa.

Gemini also has a couple of really interesting things. Most people here are using Claude Code, but Gemini has a really cool feature: you can just set the goal, and it will run from the goal and keep working, spinning up as many agents as it needs.

The agentic possibilities are just much bigger with Antigravity than they are with Claude Code. And it's a lot cheaper, so that's always a bonus, right?

Demo walkthrough: boot file, goals, and running a task

So here, for example, I have a boot file, so I'll run it from here. This is the equivalent of what I just ran.

It's a skill. And what you'll see as it works through is that it's actually reading all the files. You see it's reading CLAUDE.md, right?

It checks that because all of the actual coordination, the deep thinking work, and the strategizing is done through Claude. It'll look at all the active files.

Another thing is that it uses Python, or rather it uses a lot more code, given the nature of the tools that are available. I'll go through.

So it's working through the goals. It's got the right boundary. It's got everything here that it might be able to do. And now it's booted up.

Goals. So I'm gonna try an exercise. It's a bit long. It might not work.

I won't say it will work 100%, but it's more to show you a demo of what's possible. I have more, but I don't want to just share data that I have on certain things.

"Can you find for me at least five people at the Mindstone meetup for June (I could just do this) who registered from Meetup, and tell me an interesting fact."

That being said, the nice thing about it is that it runs. The boot will bring up all the context that's necessary to understand what the task might be. And it will look through all the files it has that are associated with tasks that might be assigned.

And you can see it's working a lot faster than Claude would in this case. It may look at all the context because I have memory locally accessible, of course.

While that's running, I'll keep going, and I'll show you guys the results later.

Scaling to a team: shared agents and workflows with Dust

I also wanted to show you guys one important thing related to the third part of the setup. So I have Dust here. This is sort of where people collaborate. What's cool with Dust is that you're able to build agents from skills: if you have a skill file, you can upload it and build an agent from it. It's very simple to do.

The agent itself... one moment. Oh, it's the workspace. Okay, sorry, one moment. Here I'll have a full agent that people can build on. The information also gets gathered, so it can be captured and written in MD files to your hard drive. Anything that people do there, you're able to capture as well. There are things in there associated with tools and memory that come out of the box.

I guess it's down, but essentially you'd have an agent that anybody can build on a shared platform. It's a workspace that anybody can use, and people can build agents for the different workflows related to the job. It could be a developing agent, whatever it might be, based on best practices, with tools that are online for everybody who's able to use it. And that way you get the whole group working together as a team.

But you can also work locally in an efficient and budget-friendly manner, where you're going to get actual results. I've run this on go-to-market campaigns for myself, and I've been able to get response rates going from 3% to 10%. So it is meaningful in terms of what you're able to do with it. And that's basically it.

Let's see if Antigravity came up with anything here. It's still working through it, I guess. But it's having a look. As you can see, it's just gone to the site at meetup.com, and it's able to pull people's information. So that's it from my end. And I'm happy to answer any questions.

Tesery. Yeah, they do. They do. They are. The difference is that Antigravity is specifically an agentic platform. They do have the option to jump into an IDE, but I hate IDEs, so I work through this. Anybody else?

Where this helps: sales, marketing, and operations automation

You could use this not only in sales. Let's say you were creating ads for people: you could build marketing ads based on what you have on your ICP.

Gemini, for example, and Antigravity too, has access to things like world-class videos or ads, which it can write copy for, and it can show you a lot of things related to that. So marketing would be another one.

If you're doing operations stuff where you need to do, let's say, invoicing, you can process a lot through it, which makes it a lot simpler, just because of the mass amount of data that Antigravity can handle. It has a two-million-token window versus a one-million-token window for any Claude model. Sir?

Keeping token budgets under control with selective loading

Yeah, so that was one of the reasons why I went this way. You have two options. You have CLAUDE.md, if you use Code or Cowork. People don't keep track of that, and they quite often automate the process, so people's CLAUDE.md files can get really big and eat up just as many tokens. It can be up to 10,000 tokens, and you have to account for that in some way.

But I also don't load all the skills at once, right? There is an original load, but that's why I use the bootstrap file, as I call it, or the boot file: I run it whenever I think it's necessary, and I can read off of that. So it's very isolated, depending on what you're doing.
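Selective loading under a token budget can be sketched like this in Python. Everything named here is an assumption for illustration: the `SkillLoader` class, the skill names, the 700-token budget, and especially the four-characters-per-token estimate, which is a rough rule of thumb and not a real tokenizer.

```python
from dataclasses import dataclass, field

CHARS_PER_TOKEN = 4  # rough rule of thumb (assumption), not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Cheap token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


@dataclass
class SkillLoader:
    skills: dict[str, str]  # skill name -> file contents
    budget: int             # maximum tokens allowed in context
    loaded: dict[str, str] = field(default_factory=dict)
    used: int = 0

    def load(self, name: str) -> bool:
        """Load one skill on demand; refuse if it would exceed the budget."""
        if name in self.loaded:
            return True
        cost = estimate_tokens(self.skills[name])
        if self.used + cost > self.budget:
            return False
        self.loaded[name] = self.skills[name]
        self.used += cost
        return True


def demo_loader() -> tuple[list[bool], int]:
    """Load three hypothetical skills against a 700-token budget."""
    loader = SkillLoader(
        skills={
            "boot": "x" * 400,        # about 100 tokens
            "invoicing": "x" * 2000,  # about 500 tokens
            "ads": "x" * 4000,        # about 1000 tokens
        },
        budget=700,
    )
    results = [loader.load(name) for name in ("boot", "invoicing", "ads")]
    return results, loader.used


if __name__ == "__main__":
    print(demo_loader())
```

The contrast with a single large instructions file is that nothing here is paid for until a task actually asks for it.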

Conclusion: chat is underrated, measure usage, and protect your budget

But you still have full control over the token budget. That's really all from me. I hope you guys enjoyed it and that it gave you some inspiration in terms of trying new things. The main takeaway is: chat is very underrated.

A lot of people jump straight into the other stuff. Really, I started being suspicious when I did that test where I did the same task in Code, or in chat and Cowork, as services in Claude. The original intention was to use the different services and improve with each one.

It was shocking to me: same model, same reasoning level, 16% versus 60%. I'd say that's worth noting, and it might help you save your token budgets, so your company doesn't do what Uber did and burn through all its tokens by month four of the year. That's it. Thank you very much.