So normally, when I speak at conferences, all the slides, presentations are so high quality that when I come in, it's really difficult. But today, I think it's going to be quite easy.
Thank you. Appreciate it.
So how many of you are building an AI company? One, two, three? OK.
And how many of you are coding? One, two, three. OK.
So we are not going to go too much into the technical details. I'm the CTO, so we can go as much detail as you want. But maybe we keep it at a reasonable level.
Because we are a small group, very cozy, you can ask me questions if you want.
If I say something that doesn't make any sense or is not understandable, feel free to say, OK, can you clarify that?
Because I work with these all day long, every day, for the last two years. So for me, some things are very obvious.
So maybe they need a clearer explanation. I will try to keep it at that level.
So, Sensei. Sensei is the company that I've been working on.
Mid-July it's going to be two years that I've been working on this. I'm going to tell you how we started, because everything else will make more sense.
So my co-founder, Dan, he wrote two books about digital immortality, as in proper consciousness uploads, living forever, and stuff like that. The point is that he's an optimistic nihilist. So he thinks that the life of a human being doesn't matter in the big scheme of things. But he's optimistic because he just enjoys every single day of his life.
So it kind of balances out, right? He said, the only way I could kind of break this cycle is if I invent a way to live forever, because then you are changing the rules of the game, right?
So he said, Marco, come with me. We're going to get a flat in Madrid for a month and a half.
In August, July, August, so it's too hot to go out and enjoy Madrid. We're going to stay home in the AC and work, right? So that's what we did, like 16 hours per day, seven days per week.
We came up with the proof of concept of this. But I told him, look, no one knows how human consciousness works, no one knows how the brain works. It's a very complicated thing to try to emulate, so that you can kind of take your brain and put it somewhere else. I said, it would probably take us 30, 40 years and a trillion dollars, and we are probably going to fail.
So why don't we start with something more manageable? That's the faraway goal. Let's create some steps to it.
And so we said, well, we can work with patients with dementia. They know that they're going to forget things. So instead of taking the consciousness, we just take the memories, the knowledge, the ideas, the views, the voice, the image. You create something like a simulation.
It's not the real person, but it's something that looks and sounds and thinks like the real person. So that's what we call the replicas, right? So that's what we started building.
And I can tell you that selling to patients with cognitive decline is very hard. Making something that they can actually use is very hard.
And then we try to sell through retirement homes, very difficult. So now we realize that what we built is basically a technology that allows you to create your own second brains with your own knowledge and apply that to different use cases. So that's what we are building now.
I'm not selling you anything because we are kind of not agreeing on what we are selling to whom. So don't worry about that.
I'm just presenting all my learnings that I did in the last two years building this. I've been working in technology since 16. I sold my first software at 16. So I have been working a long time on tech.
So when you build a product in AI, you have LLMs, right? ChatGPT, Claude, whatever.
And back in the day, two years ago in this field, everybody was just a wrapper around ChatGPT. They would just create a prompt, make a website, and OK, I have a company. That was the hype back in the day.
But the reality is that building valuable and reliable products with LLMs takes more than just calling ChatGPT.
So today we're going to see a bit of a demo of what we're doing, and some key learnings from building Sensei.
Because it's such a quickly evolving world, what you build now is going to be obsolete very quickly. So how do you prevent that from happening to you?
And a bit of our architecture, but maybe we skip that if it's too difficult and technical.
So what I want to say is, at the beginning, everybody started with wrapping ChatGPT. Then some people progressed.
I would say, let's say, half of the AI companies do something called RAG search. Who ever heard of RAG?
So basically, the LLM knows things, because, you know, OpenAI decided it needs to scan the internet and learn those things, which is fair enough. There's a lot of knowledge there. But it can make things up.
Then if you want to be in, let's say, the top 25% of the industry, you need to, when a user asks you something... Let's say you have a bot that tells you about restaurants in the world, right? And you load information about restaurants in Marrakech and restaurants in London.
And a user comes in with a query: what are the similarities between, I don't know, the fusion cuisine of Marrakech and London, right? So if you take that big query and you try to search your systems with it, it's going to be very hard, right?
So you need to start breaking down the queries into their constituents. So the topics of this conversation are restaurants in London, restaurants in Marrakech, fusion cuisine. So you need to break it down.
And also when you process data, for example, if you load all the data of restaurants in London or restaurants in Marrakech, you need to process the data so it's easy to search. So you need to think about how people are going to look into your stuff and how you can then find it.
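Just to make the idea concrete, here is a minimal sketch of what breaking a query down before retrieval might look like. It assumes an OpenAI-style chat client; the model name and the prompt wording are illustrative only, not what Sensei actually runs.

```python
# Minimal sketch of query decomposition before retrieval.
# Assumes an OpenAI-style client; model name and prompt are illustrative only.
import json
from openai import OpenAI

client = OpenAI()

def decompose_query(user_query: str) -> list[str]:
    """Ask the LLM to split a broad question into narrow search topics."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Split the user question into short search topics. "
                        "Reply with a JSON array of strings only."},
            {"role": "user", "content": user_query},
        ],
    )
    return json.loads(response.choices[0].message.content)

# "What are the similarities between fusion cuisine in Marrakech and London?"
# might come back as ["fusion cuisine Marrakech", "fusion cuisine London",
# "restaurants Marrakech", "restaurants London"], and each topic is searched
# separately instead of the whole sentence.
```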
So I would say probably 25% of the companies do this nowadays. So we are kind of going to the top.
And I would say that the next level, the top 10% of AI companies, what they do is something called semantic compression. So LLMs, you know, if you've heard about tokens and context, there is only so much you can ask in one go, right? So how do you compress the less detailed information and keep only what really matters out of a lot of stuff, so you're not overwhelming the LLM? That's called semantic compression.
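A rough sketch of the idea, under the same OpenAI-style assumptions as above: many retrieved chunks get squeezed into a short, question-focused summary before the final prompt is built.

```python
# Sketch of semantic compression: squeeze many retrieved chunks into a short
# summary so the final prompt stays small. Client and model are assumptions.
from openai import OpenAI

client = OpenAI()

def compress_chunks(chunks: list[str], question: str, max_words: int = 150) -> str:
    """Keep only what matters for the question, dropping everything else."""
    joined = "\n\n".join(chunks)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": f"Summarise the passages in at most {max_words} words, "
                        "keeping only facts relevant to the question."},
            {"role": "user",
             "content": f"Question: {question}\n\nPassages:\n{joined}"},
        ],
    )
    return response.choices[0].message.content
```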
Self-awareness, not as in thinking, but as in: what do I know as an LLM?
We're going to look at the demo later.
And I saw things like knowledge graphs. So knowledge graphs is a way to create a connection between nodes, concepts.
So for example, I get five messages. So a few concepts could be the people in the messages. So you link a message to who are the people.
Another concept could be time. When was this message generated? So you can do that across all the messages, for example, or whatever topics or locations.
And then you can navigate this graph. So if I want to see all the messages related to Marco, I can go to the node Marco and then find all the connections.
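A toy version of that navigation, just to illustrate the structure; networkx is an arbitrary choice for the sketch, not necessarily what Sensei uses.

```python
# Toy knowledge graph: messages linked to people, locations and so on.
import networkx as nx

g = nx.Graph()
g.add_node("msg-1", kind="message", text="Lunch in Marrakech with Marco")
g.add_node("msg-2", kind="message", text="Marco's talk about RAG")
g.add_node("Marco", kind="person")
g.add_node("Marrakech", kind="location")

g.add_edge("msg-1", "Marco")
g.add_edge("msg-1", "Marrakech")
g.add_edge("msg-2", "Marco")

# "All messages related to Marco": start from the node and walk its edges.
related = [n for n in g.neighbors("Marco") if g.nodes[n]["kind"] == "message"]
print(related)  # ['msg-1', 'msg-2']
```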
So here we are talking about the top of the industry. So you can see, one day it was just a ChatGPT wrapper, and now we are... it's like when they invented fire or, rather, discovered fire. What are we going to do with this?
And then with the time, the industry evolves and more tools, more features, more ideas come together. And now in two years, a lot changed. And a lot is going to change in the next two years.
So we are at the beginning of this.
Good. Demo time.
So I'm not sure how we're going to do this, because this setup is not ideal. But let me try to hit this super small. Maybe this way?
Yeah. So. If we are lucky.
Yeah, so I'm going to just type it. So what is Sensei?
So this is my co-founder, Dan. Be patient. I just told you, Marco.
Sensei is the world's best wisdom engine that we built together. We create AI digital replicas like this one you're chatting with now. Our technology preserves and shares individual and collective knowledge, helping entrepreneurs and organizations retain expertise, scale knowledge sharing, and ensure business continuity.
The whole vision is about building a future where our... We understand, we need to go through the whole piece.
So this is the, imagine that we have this technology where you can capture the memory, you can capture the identity, you can capture the thoughts, the views, the ideas, and then how do you represent it? So this is the video representation, so his voice, his image, with the second brain that we created, right?
So this would be like a chat widget on a website, same brain connected through a different channel. So that was a video channel and audio. Here is like a chat widget on a website.
So for example, we have book authors. They want to capture their ideas, their personality, their experience, why they wrote the book, the concepts around it. Tack, they create that.
And then one of the channels is the chat widget on the website. Imagine this would be a website, right? What do I have here?
Then we have the chat version. So here, there is the CEO of a company. And for example, they ask, what is your experience?
So it's the same as you heard with the voice, just in text. But I guess the difference here is that we also show all the information about where it came from. This is the RAG we were talking about before, right?
So it's something called grounding. So the AI here doesn't talk about random stuff from the internet. It only talks about what you train it for, right?
So here it shows the sources. And the way you train it: we have the backend somewhere here where, for example, the customer added their website. So what we did: we loaded the website, we got all the text, we parsed it, optimized it, found all the other related stuff that could be useful, right?
So in this way you create your own kind of second brain, right? Does this make sense what I'm talking about?
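Here is roughly what that ingestion step could look like in a few lines: fetch the page, pull out the text, split it into searchable pieces. The library choices (requests, BeautifulSoup) and chunk sizes are assumptions for illustration, not Sensei's actual pipeline.

```python
# Rough sketch of ingesting a website for grounding: fetch, extract text, chunk.
import requests
from bs4 import BeautifulSoup

def ingest_website(url: str, chunk_size: int = 800) -> list[str]:
    html = requests.get(url, timeout=30).text
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)
    # Split into overlapping chunks so each piece is easy to search later.
    chunks, step = [], chunk_size - 100
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

# Each chunk would then be embedded and stored together with its source URL,
# so the replica can show where an answer came from.
```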
So imagine, so I loaded the website about my career or my expertise. So let's say on the website there was a career of Paul, the other person. But you ask, what is your expertise?
So this is where RAG comes from. RAG is Retrieval Augmented Generation.
Very fancy words, but basically, this is based on something called vector databases.
So the way this works is that it takes words and creates an array of numbers, like a matrix of numbers. And we use something called multi-language embeddings. So it transforms the human text in any language into numbers that are non-related to the original language.
So I can ask a question, say, in Italian, but because the concept of work or expertise is unrelated to the language spoken, and is instead related to numbers in a matrix, it can gather information this way.
And the way this works is that it's about proximity. For example, if there was information about career, and I ask a question about work, career and work are quite close in the language space, so it's going to find this correlation and find information about my career, even if I'm asked, what is your work, right?
So this is why it's very powerful, because it creates a human way to interact with the replicas, with the LLMs, without being word by word, right? Because with a normal database, if I ask about "work", it needs the exact word "work". Here, instead, it creates this proximity of concepts.
And the way we do it is multi-language, OK? It's pretty cool, actually.
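A small sketch of that cross-language proximity, assuming the sentence-transformers library and one of its public multilingual models (an illustration, not necessarily the embeddings Sensei uses): a question in Italian still lands closest to the English sentence about career.

```python
# Sketch of cross-language proximity search with multilingual embeddings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

stored = ["My career started in software engineering in 2005.",
          "I enjoy cooking fusion cuisine on weekends."]
stored_vecs = model.encode(stored)

# Question asked in Italian: "What is your work?"
query_vec = model.encode("Qual è il tuo lavoro?")

scores = util.cos_sim(query_vec, stored_vecs)[0]
best = int(scores.argmax())
print(stored[best])  # the career sentence wins, despite different words and language
```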
But this is the next problem. So if I'm a human, and I'm like this, and you ask me, what is it? As a human, I know that in the cultural context, if you see me with something in my hand in front of my face, and you ask me, what is it? I know that you're going to be referring to this.
AIs don't know this. So you need to build systems to try to simulate the human behavior. So like in this case.
This is the author of a book, and I ask it what it is. Imagine this is on their website on a chat widget. It's going to tell me about the book, right? Because the default topic of the conversations should be the book. Because that's very human.
But for the AIs, they don't understand any of the human behavior. So that's why I'm saying we are simulating all of these by trying to understand these cues that make humans more capable of having a natural conversation and trying to simulate it with a machine that doesn't absolutely understand anything about how humans would like to interact in the subcontext of conversations.
Yeah, I think we... Let's see. Sorry, I need to... Apologies, I need to show you my back. Where are we? Maybe here? Yeah.
And then, for example, I don't know if you can see, but here is what is the price of Sensei. Sensei is our crypto token. It doesn't matter. The point is that this question requires me to get some live information happening right now.
So... You know, LLMs, they train them once a year or so. So the price... like if you ask what is the weather in Geneva now,
that wouldn't work, because it doesn't know what "now" is, right? So actually, an AI doesn't know what now is. So I have to tell the AI, now is this time. So the way this works is something called tools.
You can create tools, which are like a tool set for the LLM. You describe, for example: this is a tool to get the weather, and you tell me the city and the time, maybe.
And then the LLM says, ah, OK, in this case, let me check the current price of that, or the current weather. Give me a second. OK, the current price is this, the current weather is that.
So you need to create systems for it to access specific information. This is great.
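To make the mechanism concrete, here is a minimal sketch of tool calling in the OpenAI-style API: the weather function itself is a stand-in, and the model name is only an example.

```python
# Sketch of the "tools" idea: describe a function to the LLM, let it ask for it.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return f"18°C and cloudy in {city}"  # stand-in for a real weather API call

messages = [{"role": "user", "content": "What is the weather in Geneva now?"}]
first = client.chat.completions.create(model="gpt-4o-mini",
                                        messages=messages, tools=tools)
call = first.choices[0].message.tool_calls[0]
result = get_weather(**json.loads(call.function.arguments))

# Feed the tool result back so the model can answer with live information.
messages += [first.choices[0].message,
             {"role": "tool", "tool_call_id": call.id, "content": result}]
final = client.chat.completions.create(model="gpt-4o-mini",
                                        messages=messages, tools=tools)
print(final.choices[0].message.content)
```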
For example, we had a customer asking, I would like to not have to use my humans, my people, to generate quotes for insuring financial stuff. So what can you make so that the artificial intelligence goes, talks to my systems, and then creates the quotes automatically. So that way you reduce 95% of the human time for those kind of trivial works, right?
So you can see you have a whole tool set of things. And this changes weekly, monthly. It's a constant evolution.
You see, we are all having gremlins here with the Ah!
There we go. Very good. Back on track. OK. So.
When I started to build the company, I had to come with some principles on how we can deal with all of this. So preservation first was one of the first things we agreed. So preservation first is that when you, for example, you load your thesis on my systems,
It needs to go through a whole series of steps. So I keep the original. I keep all the intermediate steps to the result because tomorrow the technology is going to change.
So I need to be able to reprocess the data. The team is going to become better with their skills on how to deal with this. The technology is going to be better. Or what happened to me?
We were using a technology that was one of the best in the industry. The company got acquired by another big company, and they shut it down. I had two weeks' notice, and I had to reprocess all the data of all the customers. If I didn't have the originals and all the intermediate steps, we would have been in very big trouble.
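In code, "preservation first" can be as simple as writing every artifact to disk alongside the original; the file layout below is made up for illustration.

```python
# Sketch of "preservation first": keep the original and every intermediate step
# so everything can be reprocessed when the tooling changes. Paths are made up.
import json
import shutil
from pathlib import Path

def preserve(source_file: str, parsed_text: str, chunks: list[str],
             root: str = "archive") -> None:
    doc_dir = Path(root) / Path(source_file).stem
    doc_dir.mkdir(parents=True, exist_ok=True)

    shutil.copy(source_file, doc_dir / ("original" + Path(source_file).suffix))
    (doc_dir / "01_parsed.txt").write_text(parsed_text)          # extraction output
    (doc_dir / "02_chunks.json").write_text(json.dumps(chunks))  # pre-embedding step
    # Only the final embeddings live in the vector store; if that provider
    # disappears, everything above is enough to rebuild from scratch.
```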
So adaptive architecture, the same. So as this changes, as we said, very, very regularly, you need to architect your software so you can throw away stuff, because they might become irrelevant or redundant or outdated very quickly.
And so two years ago, everybody was speaking about custom LLMs. But that lasted very little, because the industry is investing hundreds of billions. And things are changing so fast that if you invested your energy in making your own LLM, well, in like a month, two months, three months, it's going to be completely obsolete.
So huge waste of time. So luckily, we didn't do that, except for some experiments.
Pragmatic innovation. So because the industry changes all the time, if you focus and fossilize what you're doing now, you might be out of business very quickly.
So we do something called internal hackathons, every month, where people in the team can just come up with crazy ideas for anything.
Because in that way, it's not just me trying to think of how is the best way to do it. I have the whole team. We are about 50 people that can come up with the great ideas.
Because they see posts, new technologies, announcements, YouTube videos, whatever, and there's always something new coming up. But you cannot always embrace everything new, because it can be cool for a week. It could be cool for a month.
So that's why I call this pragmatic innovation. You really need to make sure it's actually worth taking on.
And then we also have an internal team called Labs where we experiment with proofs of concept. So we take new crazy ideas, and instead of distracting the main focus of the company, we have a team that is quickly iterating and making experiments to see if it actually works, because sometimes announcements are very grand but then the output is very little. So having a proof of concept that actually works makes a lot of sense.
Yeah, so don't just chase the hype, actually validate it.
Multi-model providers.
So, you know, at the beginning, we were only working with ChatGPT. And then we had some Llama, because it was the cool open-source kid on the block, which is another LLM.
But now we support everything: ChatGPT, Claude, DeepSeek, everything. So you just pick and use whatever you want.
Because, even if not officially, things change all the time. So sometimes you might be happy with how Claude represents you. Sometimes you might prefer how ChatGPT represents you.
So we had to be agnostic to that and just try to create tools so people can feel that they are happy with what is generated. And the same thing for the voice.
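Being agnostic at the LLM layer can be as simple as hiding each vendor behind one tiny interface; the class names and model identifiers below are illustrative assumptions, not Sensei's actual code.

```python
# Sketch of staying provider-agnostic: one small interface, many backends.
from typing import Protocol

class ChatProvider(Protocol):
    def complete(self, system: str, user: str) -> str: ...

class OpenAIProvider:
    def complete(self, system: str, user: str) -> str:
        from openai import OpenAI
        r = OpenAI().chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user}])
        return r.choices[0].message.content

class AnthropicProvider:
    def complete(self, system: str, user: str) -> str:
        import anthropic
        r = anthropic.Anthropic().messages.create(
            model="claude-3-5-sonnet-20241022", max_tokens=1024,
            system=system,
            messages=[{"role": "user", "content": user}])
        return r.content[0].text

def answer(provider: ChatProvider, question: str) -> str:
    # Swapping providers changes nothing for the caller.
    return provider.complete("You are a helpful replica.", question)
```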
I didn't show you the demo with the voice. I'm going to show you the demo with the video. But there are two companies for this. One is called HeyGen. One is called ElevenLabs. HeyGen does video. ElevenLabs does voice. And they're the best.
We looked into implementing it ourselves, but it would be, I don't know, billions of dollars of investment; it's just not a competition. So you need to embrace the industry where it's not your core value.
So infrastructure outsourcing, you can decide, do you want to spend the time of your people in building infrastructure, which is like servers to run your software, or do you want to focus on solving the business needs, customers' needs, create value for the users, right? So from the beginning, I did, we don't build infrastructure. We are so small as a startup at the beginning that it doesn't matter.
Paying one or two or three salaries for someone to run the servers would be so... a waste of opportunity. So we only use everything that is managed by someone else. So we don't have to do it ourselves.
We had servers in Germany because they were much cheaper. But paying the guy $10,000 per month to manage my servers just didn't make sense.
Yes.
And then there is this balance of open versus proprietary.
So I made a call.
So for example, all the technology that generates the second brain that we were talking about, the knowledge, we call it the wisdom engine. All the stuff around the knowledge management, the wisdom engine, that's really our strategic asset. So that is closed. We don't make it public.
Whereas, for example, the chat widget that goes on customers' websites, that should be open source, because I don't get much value in keeping that secret; it just uses my API, my technology, on the back end. But if the customer finds a bug and they want to have their team, or pay someone, fix it or change it or improve it, why not, right?
So: what really makes a difference, versus things that are just accessories that you can make open?
Plus, we have a strong community now: 30,000, 40,000 people that are going around our socials and engaging.
So we can do bounty prizes. For example, improve our chat widget, or come up with a new idea for the chat widget. If that is open source, then they can create new cool things.
So we started running a public hackathon yesterday or the day before. It's going to run for three weeks.
So people from around the world can come and create cool stuff on top of our technology. So in this way, we don't have to try to invent everything. We can just let other people come up with cool ideas.
And this is the last principle, API first. So I got inspired by this by Amazon, actually. They have this idea that APIs are ways for computers to talk to each other through standard contracts.
This is what we sell as well. So people can buy access to our API. So they don't have to build all what we built already. They can just focus on their product market fit and delegate the technology to us.
And so I said to the team, we should not have any secret ways of doing things. It's all go through API. So what we can build ourselves, any customer can build on top of us as well.
So these were the kind of seven principles. It happened by chance there were seven. I didn't mean to do that.
Any questions on this? Do you have, in any of these steps, security? I mean, some way to protect it?
Let me give an example that came to my mind. When you do the digital replicas, is it possible to distinguish a digital replica from the real human? Like watermarking the video?
It's a very good idea. We are not doing it. But the only way to create your video is if the human records their own video.
So only the real human can make their own video replica. So there are checks for that. The same thing for the voice.
A bit less so for the voice. The voice is a bit less strict, but the video goes through a process that ensures you are who you are representing with the video.
So there are no watermarks because you are the one making the video? Yeah, well, it was more in the idea of could this be abused, misused? Ah, for sure, yeah.
Pretty sure that there could be many ways, yes. But that's the interesting problem with LLMs, right? So what we say to the customers: don't put anything on the replica that you're not happy to be public, because it's not a way to store secrets or manage secrets, right?
That goes against, sorry, I don't want to take more time. That goes against the idea of your company, which is to make digital replicas of people who may be losing memory. So that means you go to something very personal.
No, so very good point, very good point. So for those replicas, we call them private replicas. And access is not to the public.
You only give to certain people. They need to give your email address. You have like a white list of people that can talk to you, like your family members. you wouldn't really want your own stuff to be in the internet.
So we have a very strict, it's either public or private. We don't do the halfway. We say, OK, I want to load my financial data, but only accessible to some people.
The LLM technology doesn't do that well. So it's either public, so you put public data. If you want a private, it needs to be private.
So it's very separated. We do it this way, because it's We cannot ensure security of private data otherwise.
I have a couple of questions.
One related to the time that it takes to train the model. For example, in that specific case for dementia patients, how long does it take?
And then the second is regarding the infrastructure. So just as a curiosity, you said that in Germany it was less expensive. Just curious.
Yes, so to run LLMs you need servers, and normally when you use hosted infrastructure like ChatGPT, you pay by, let's say, the number of words that they generate, while if you have your own servers you just pay a monthly fixed cost for the servers, right? And then, depending on how efficient you are with the servers, you get more or fewer words generated, let's say. That's why it was cheaper: servers in Germany that can run LLMs, from this company called Hetzner.
It's $250 per month, I think. You need at least two, so $500 per month.
But I would say that, yeah, you can run Llama, the latest Llama, so you can have quite reasonable stuff. But you need a human to manage it, make sure it's secure, make sure it's running, make sure it's updated.
It's very expensive. So if you're a big corporation, it makes sense. If you're a small company like us, 50 people, having two people dedicated to that is not worth it.
Yeah, the training. So I would say we don't have a good user experience for the training of the dementia patients.
We are working on it. We need to do voice-to-voice, so you can just speak; not really typing on the computer, you should just speak.
So we have this concept of Athena, which is a way, I didn't show you, but it's a way to have a conversation. You decide a topic, let's talk about my childhood, what were childhood friends. So you have this conversation, but now it's text.
It doesn't work for patients. So I don't have a time estimate, because they just didn't do it.
So what works really, what is very quick, for example, one of the person we work with is called David Orbach. He wrote a lot of books and he has a big presence on YouTube. So that's easy because I just loaded hundreds of YouTube videos, all the books, and then the replica knows everything, right?
So that took basically no time, right? An assistant of the individual can do it, not even the individual themselves.
But for patients, we stopped that part of the business because we couldn't figure it out. So now we're working with external companies that are focused more on the late life. We call it legacy tech. So they might try to figure out the right way to create it, but it was a big distraction for us and we couldn't make it work.
Transcripts, yeah. So now there are new technologies coming up that also do visual analysis of videos, but what we currently do is transcripts, which is generally good enough. But we only do YouTube now; you cannot upload video files yet. In the future we are going to do video files, but we are just doing YouTube because most of the people we work with had YouTube videos. So how do you generate the transcripts, some plugin on YouTube? No, YouTube generates the transcripts for you, for public videos. So we just use that feature of YouTube.
Now Gemini 2 and Gemini 2.5, because it's the same company, they have special access to YouTube for that stuff as well. So we're actually rolling out a new version of the YouTube scraper to use Gemini to get information out of it.
But YouTube is not very happy when you just load the transcript. So you need to find a sneaky way around the protections that YouTube puts in place. For example, if I do it from my server, after a few times they'll say, no more transcripts for you.
I just wanted to ask, what's the accuracy of the responses that you obtain? So when you compare it to the actual human in real life, so for example, I think you mentioned the kind of, yeah, so how does it compare? So that's a great question.
So what I got the team to work on in this month is something called, we call it the Q&A. But basically, it's acceptance criteria for replicas. So we ask the owner of the replica,
If someone asks you, tell me 20 questions and answers on how you expect your replica to behave. And we test that daily. So all the work we do, all the evolutions that happen, we need to make sure that the expected quality matches what the owner of the replica was expecting to begin with.
Because we had the problem that maybe they were happy today. In a month, we do some changes. Things change, and they are not happy anymore. So we created this system.
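A sketch of what that daily acceptance check could look like: the owner's question/answer pairs are replayed against the replica and an LLM judge scores each one. The judge prompt and the `replica.ask` call are assumptions for illustration, not Sensei's actual API.

```python
# Sketch of the Q&A acceptance check run daily against the owner's expectations.
from openai import OpenAI

client = OpenAI()

def judge(question: str, expected: str, actual: str) -> bool:
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system",
                   "content": "Answer only PASS or FAIL: does the actual answer "
                              "convey the same meaning as the expected one?"},
                  {"role": "user",
                   "content": f"Question: {question}\nExpected: {expected}\n"
                              f"Actual: {actual}"}],
    ).choices[0].message.content
    return "PASS" in verdict.upper()

def run_acceptance(replica, qa_pairs: list[tuple[str, str]]) -> float:
    results = [judge(q, a, replica.ask(q)) for q, a in qa_pairs]
    return sum(results) / len(results)  # share of owner expectations still met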
Second, I showed earlier all the numbers at the bottom where it gives the references, the sources. So we do something called hallucination checking. We score how correct the generated answer is based on the content of the source material that the customer provided. So we have a few ways to do it.
And by default, we don't use the LLM's knowledge, only the knowledge provided by the customers. Because we work, for example, with authorities in Dubai. So if we ask, how do I set up a business in your local authority, the answer needs to be based on the policies and practices of that local authority, not generic stuff from the internet.
So it depends on how creative you want the replica. So there is this parameter called temperature. When you talk to the LLMs, you decide higher temperature means that the replica is more creative in the answers. Lower temperature, the more it sticks to the rules that you put for the replica.
So if you want very low hallucinations, you need to set a very low temperature. If you want it more creative, you can have more hallucinations. And then we have something called guardrails, which is another LLM checking the response of the first LLM to make sure it stays on the track that was defined.
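One way such a guardrail pass could look, sketched under the same OpenAI-style assumptions: a second model scores how well the reply is backed by the retrieved sources before it reaches the user. The prompt and the threshold are illustrative.

```python
# Sketch of a guardrail pass: a judge LLM scores groundedness against sources.
from openai import OpenAI

client = OpenAI()

def groundedness(answer: str, sources: list[str]) -> float:
    """Return 0..1: how well the answer is backed by the source material."""
    score = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # low temperature: we want a consistent judgement, not creativity
        messages=[{"role": "system",
                   "content": "Rate from 0 to 1 how fully the answer is supported "
                              "by the sources. Reply with the number only."},
                  {"role": "user",
                   "content": "Sources:\n" + "\n".join(sources) +
                              "\n\nAnswer:\n" + answer}],
    ).choices[0].message.content
    return float(score)

def guarded_reply(answer: str, sources: list[str], threshold: float = 0.7) -> str:
    if groundedness(answer, sources) < threshold:
        return "I don't have enough information in my sources to answer that."
    return answer
```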
So we already talked about RAG, right? So maybe we don't need to go into too much detail.
But yeah, making this kind of proximity search of concepts to work well, you need to do a lot of trial and error for your specific use case.
Now there are some models, like Gemini 2.5. I think it has 2 million tokens. Normally, seven words is 10 tokens, approximately. So 2 million tokens is a lot of information.
But it can get lost. With that much information, it can get lost. There is this thing called the needle in the haystack: when you try to find some specific piece of information out of the two million things that you tell it, they struggle.
Because if it worked flawlessly, then you wouldn't need RAG, because you could just load everything. But that's going to be more expensive, because you pay per token.
So RAG is still a useful technology in 2025. We'll see in the future.
Does anybody have any more questions about RAG? It's a very technical topic, so I don't want to bore the audience.
No? OK.
So adaptability we talked about earlier. So things change weekly. So you need to figure out ways to do it.
For me, very important was the team structure. As I said, I have a team focused on the core work, the stuff maybe that we're going to build for six months a year. But then I have the labs team that is working on things that are experiments.
For some experiment, we hired agencies. For example, we did the Twitter integration, so your brain can be connected to Twitter, reply to people, send messages and so on. I hired an external company to do it.
It would have been a big distraction, so I was not confident that it's going to work. We're still not confident because sometimes they ban you if they see that kind of looks like AI is generating tweets. So I didn't want to distract my team from that, so we just hired an external agency.
Yeah, so preservation first, we discussed earlier: always keep the source material that you have so you can reprocess it. We already talked about the open-source part. In our case, the dilemma is what is worth keeping proprietary versus what is good to share with the world.
So for example, for us, the chat widget is a no-brainer. Everybody can look at it. The Telegram and Discord integrations, you know, your replicas can be connected to Telegram and Discord in the future.
Now it's on X, and we're going to roll out on Slack and so on, right? So in those integrations, there is not really a lot of proprietary intelligence, because it's just how you connect your second brain to a chat. There's not much value there, but people could participate and improve it.
So it makes sense to be open. Where I want to protect my core intellectual property is the API. So the API, we have documented how you interact with it, but no one sees how it works inside.
Of course, the database with all the customers' data, that needs to be super secure. And how we do all the RAG operations, how we ingest data, process data, clean up data, how we do this semantic compression of data and so on. So all of that is our secret sauce.
So I think this is how I evaluated what is really our company's moat, while the rest is just things that we have to do but that don't really create so much value, so it makes sense to keep them open.
Do you want to know about the architecture? It's a bit technical, maybe.
Yeah, the whole point here is that the end users can talk to a website, to a chat widget, to a Telegram integration, to the Discord, whatever, and all goes through the API. The API is the heart of the business. And then everything else also talks to the API.
So there is this single way, and our customers can also create technology on top of it, the same way we do. Here it's how we generate voice and video, how we generate text, and how we do all the processing for the RAG systems. But it all talks to the same API.
So what we can do, a customer can do. So everything we sell, video, text, image, Telegram, chat widget and so on, any customer can build on top of. So they can build it for their specific business needs.
OK.
Last slide, if you have any questions.