Pragmatic AI for SMEs

Good evening, everyone. I'm James McGill, co-founder of Kettle.

So you can see from the people in front of you that we are more nerds than businessmen and marketers. But tonight we'll avoid pitching our company; instead, we'll show you our products.

But we are kind of AI veterans here, because I trained my first neural network back in 2006. Nico studied it at EPFL when the field was already much more developed. And we knew large language models when they were still small, like that. So we've seen the full growth of AI.

So yeah, don't hesitate to interrupt us if we start rambling too much. I'm Nicolas, also a co-founder of Kettle. As James said, I have a computer science degree from EPFL, worked on AI starting with my master's thesis, and did a lot of data science because that was already quite hyped at the time.

But I always wanted to work in startups. So I worked in a startup that worked with big banks on big projects. And I felt that SMBs were maybe more interesting and a bit more flexible, but they didn't have the right tools to work with documents and AI in particular. And so we co-founded Kettle in 2018,

just after I got my master's degree. We worked on AI for SMBs, especially around documents, building a document solution. So we built a kind of GED, or DMS in English, a document management system, especially for lawyers, because that's where we started.

And they use quite a lot of documents. Our first product idea was: okay, you receive a lot of documents by mail.

And then what you want to do is scan them and send them to the right lawyer. That's what a lot of companies start doing.

So you can see we chose a completely different direction from the tool Alexis showed before. Instead of going for the big bucks with the big companies, we basically put on a hat and went to the lawyers, to companies that are generally very tech-averse, risk-averse, and paper-based. So yeah, a big challenge. And what we said first was kind of simple: you receive a document, like this email.

You say, OK, it's a new scan. We detect everything inside the document. Here there's a scan, so we extract its content, we understand the content, and we extract information.

And that's all based on technologies that aren't LLMs, because it was 2018 and they weren't around yet. What we offered was a pretty simple thing that looked at the data and said: OK, this is a judgment, so I'm going to move the file to the right place in your SharePoint. That's pretty much where we started, and that's what people call an agent nowadays. Then the file is saved, it kind of goes away, and you can continue your day and work on other stuff. Then we went a bit further and created this platform, which is based more on a search paradigm.
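As a rough illustration of that pre-LLM classify-and-route step (a minimal sketch, not Kettle's actual code; the keyword classifier, folder layout, and placeholder metadata below are assumptions for illustration):

```python
from datetime import date
from pathlib import Path

def classify_document(text: str) -> str:
    """Hypothetical classifier; the real system uses a trained model, not keywords."""
    if "jugement" in text.lower() or "judgment" in text.lower():
        return "judgment"
    return "other"

def extract_metadata(text: str) -> dict:
    """Placeholder extraction; the real system pulls the date, tribunal, etc. from the content."""
    return {"date": date.today(), "tribunal": "Tribunal cantonal"}

def route_document(text: str, sharepoint_root: Path) -> Path:
    doc_type = classify_document(text)
    meta = extract_metadata(text)
    # Naming convention from the demo: date first, then document type,
    # then the tribunal mentioned in the document.
    filename = f"{meta['date']:%Y-%m-%d}_{doc_type}_{meta['tribunal']}.pdf"
    target = sharepoint_root / doc_type / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    return target  # the caller would then copy the scanned file there
```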

And so you can see here that the judgment is stored in the right place, as I showed before, with a properly formed name: the date is the first part of the file name, then the type of document, then the actual tribunal mentioned in the document. In here you have all the information you need to work on the document; everything that's extracted is right there. So that's where we started. And then LLMs happened, and we said, okay, what can we do with LLMs? That's where you can see we went with the nice intelligence logo and said, okay, so now... oh, it's in French, whatever.

And then we said, okay, now what we can do, which is really smart, is let you ask questions about the documents without actually reading them, and still understand what they're about. So here, for example... And just for context, we are not targeting users who are already on ChatGPT. They need an environment that is safe and somewhat automated. Everything is plugged into their existing Outlook and their own file tree in SharePoint. And everything needs to be secure, preferably hosted in Switzerland, and not sending data to who knows what kind of API somewhere,

which is kind of a risk if you're using any tool built by startups that depend on other technologies. Here, everything was developed from scratch, at least the AI model that extracts the information. But for this part, the generative AI, we just use off-the-shelf LLMs provided by Infomaniak in Switzerland. And they don't keep anything, which is probably an issue for them at some point, but for us it's very convenient.

I guess these tools also get information from outside and are updated, even though they are locked somehow, right?

Kind of, because when you use an LLM through an API, if it's not ChatGPT, usually it's just an open-source model that is basically static. They get updated over time, so they get better and have more recent knowledge, but they are not tuned to your specific use case. Maybe I'll jump ahead a bit in the discussion, to why AI is so prevalent nowadays. In my view, there are two main factors, three, probably more. First, cost: now you have all the hardware to run it efficiently. In the past, like five years ago, you needed to build a proper data set before training the model to get good results, and those models were usually very specialized.

So once you trained a model, for a certain amount of dollars or francs, it would perform very poorly on data it had never seen. Nowadays you have ChatGPT: just ask whatever question and it will give you an answer that is at least more or less OK, and you can give it a variety of tasks that are actually useful. If you can train a model to recognize whether it's a cat or a dog, yeah, it's cool, but it's pretty useless. Being able to interact with the AI is what makes it so valuable. And that's the last point, I think, where it really makes sense.

When you have an interface like we showed before, where a cold AI extracts information and so on, it's very nerdy. You show it to 60-year-old people who handle paper all day and they're like, hmm, I don't trust it. It's very difficult to make them believe it's useful. And it looks nerdy as well: text boxes with data. We love it, but for most people, the simpler the better. But if you can interact with a system just by chatting, anybody can use it.

So that's why I think ChatGPT took off the way it did. By itself it's not that useful if you want to optimize processes, but you can already play with it, and anyone can use it. The real value, and actually that's why I think we're still here, is in integrating AI systems together. That's why people are now talking about agents and so on. Yeah, it's cool if you have a very nice LLM, but it runs in its own bubble. So our goal here is not a crazy system with thousands of agents that can do anything.

It's really specialized systems that we built to help specific types of users do their job. And to add to that, one of the interesting parts is that asking questions like these is actually a lot of work. We've tried a few things to optimize how users interact. Here, for example, if you're looking for a specific client, you can search and use all these filters, but it's too much work. And writing a full sentence to get an answer is sometimes great, because you're doing research and it's intensive work, but sometimes you just want the answer right away and you don't want that extra work. So we're trying to find the balance between something that's really thorough and gives you the answer you need, and something that gets you your answer very quickly. And that's, I think, where the challenge lies.

For example, you could ask the LLM here to translate the document. But that's annoying, right? It's quite a long phrase to type. So what you have here instead is a way to translate it into German with two clicks, and then you have it in German. That's much easier to use. And that's where I think the real question with this advance in computer science, now that we have LLMs, is: how do you make it useful for users in all these different situations? Those are the kinds of challenges we have.
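As a sketch of how such two-click actions can sit on top of an LLM (purely illustrative; the action names and prompt wording are assumptions, and `complete` is a stand-in for whatever LLM client is actually used):

```python
def complete(prompt: str) -> str:
    """Placeholder for the actual LLM call (e.g. a locally hosted model)."""
    raise NotImplementedError("wire this up to your LLM provider")

# Map UI quick actions to prompt templates so users never have to type
# the full instruction themselves.
QUICK_ACTIONS = {
    "translate_de": "Translate the following document into German:\n\n{document}",
    "summarize": "Summarize the following document in a few sentences:\n\n{document}",
    "ask": "{question}\n\nAnswer using only the document below:\n\n{document}",
}

def run_quick_action(action: str, document: str, question: str = "") -> str:
    prompt = QUICK_ACTIONS[action].format(document=document, question=question)
    return complete(prompt)
```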

It always reminds me of a previous job in a bank, where I was working with people who entered data into the core banking system. It was a system probably designed in the 70s or 80s. There was a lady close to retirement who was extremely fast with it. It was not a chat, that would have been too many characters; she knew all the functions that let you do the thing in like four characters. There were shortcuts like F081, boom, the data was entered, it worked.

And I think this is why it's important to have both, because you have power users who will be faster than the AI, and users who don't want to deal with technology and just want to ask a question, or even better, talk to the computer. I think this is the direction we'll be going: simple tools for casual users and efficient tools for power users. A last feature I can show in the application is automatic transcription.

What's nice is that you get the automatic transcription, but then you can also take the transcription and translate it, ask questions about it, or summarize it. So you have all these steps that can happen. We're doing it this way right now because it's a first version, but the idea is to really make it a full-featured application that's fast and very easy to use. And what's important is that it's not a one-shot thing: the text is indexed into the search engine, so you can find your MP3s and everything else after they've been transcribed.
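A minimal sketch of that transcribe-then-index flow (illustrative only; `transcribe_audio` and the toy in-memory index stand in for whatever speech-to-text engine and search backend are actually used):

```python
from pathlib import Path

# Toy inverted index standing in for the real search engine.
search_index: dict[str, set[Path]] = {}

def transcribe_audio(path: Path) -> str:
    """Placeholder for the real speech-to-text step."""
    raise NotImplementedError("plug in your transcription engine here")

def ingest_recording(path: Path) -> str:
    text = transcribe_audio(path)
    for token in set(text.lower().split()):
        search_index.setdefault(token, set()).add(path)
    return text  # the text can then be translated, summarized, or queried

def search(term: str) -> set[Path]:
    # Later, searching a word brings back the MP3s that mention it.
    return search_index.get(term.lower(), set())
```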

One of the last things we can still show, maybe, is the PDF tool. Try typing merge. I think I have one here. Oh no, this one's already split. Yeah, tap Merge there.

It should be fine. It's not in here anymore. Yeah, one last thing. There are also simple tasks that can be improved with AI if you have the right interface. One of the cool things we built here is a system where you open a PDF.

So this is a scan of five documents. The people doing the scanning in the morning just need to scan one block of documents. That was real time, by the way. We analyze all the pages and try to find the transitions between documents, and we highlight them with colors, so it's really, really easy to understand and use. You can just move the pages around if you want to reorder them. It's super simple. And once you're happy with the result, you just click Create, and it creates all the documents and runs them through the pipeline. So every document gets tagged with the client, the project, the document type, the date, and so on. So it's pretty sweet.
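A minimal sketch of that split-a-scanned-batch idea (the page-similarity scoring here is a naive stand-in; the real system uses a trained model on the page content):

```python
# Split one scanned batch into separate documents by scoring how likely
# each page is to start a new document.
def page_similarity(prev_page_text: str, page_text: str) -> float:
    prev_words, words = set(prev_page_text.split()), set(page_text.split())
    if not prev_words or not words:
        return 0.0
    return len(prev_words & words) / len(prev_words | words)

def split_batch(pages: list[str], threshold: float = 0.1) -> list[list[str]]:
    documents: list[list[str]] = [[pages[0]]] if pages else []
    for prev, page in zip(pages, pages[1:]):
        if page_similarity(prev, page) < threshold:
            documents.append([page])       # low similarity: a new document starts
        else:
            documents[-1].append(page)     # same document continues
    return documents  # each document then goes through the tagging pipeline
```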

Any questions or remarks about what we showed or said?

How do you handle hallucinations, especially with large context windows?

Very technical question. We would say we haven't really had that issue so far, because we send the document and you generally have one question; you don't have a chat or anything, and that already prevents quite a few of the associations you could get. You also have prompting techniques you can use to limit hallucinations as much as possible. But our main approach is to give the model as much context and as much information as it can ingest to produce an answer. That pretty much eliminates hallucination, because the model has the information right there in its context window.

What we've seen is that you mainly get hallucinations when the model doesn't have enough context and tries to invent things because it doesn't have the information. There's more technical depth we can go into afterwards if you want, but in most cases, that's how we tackle the problem today. We also avoid it by not using LLMs when they are not needed. Lots of things we run are just basic neural networks, very old school, very cheap.

They can run on CPUs, and they are reliable. And the non-LLM models are trained online on the client's data, so whatever bias they have is actually welcomed by the users, because the output is exactly what they expect: it reflects what they inputted before.
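A minimal sketch of the grounded, single-question approach described above (illustrative; the prompt wording is an assumption, and the endpoint below is a hypothetical OpenAI-compatible, Swiss-hosted provider, not a real URL):

```python
import requests

# Hypothetical endpoint and model name; substitute whichever provider is used.
API_URL = "https://llm.example.ch/v1/chat/completions"
MODEL = "some-open-weights-model"

def answer_from_document(document: str, question: str) -> str:
    # One document, one question, no chat history: the full document goes
    # into the context so the model never has to invent missing facts.
    messages = [
        {"role": "system",
         "content": "Answer using only the document provided. "
                    "If the answer is not in the document, say you don't know."},
        {"role": "user", "content": f"Document:\n{document}\n\nQuestion: {question}"},
    ]
    resp = requests.post(API_URL, json={"model": MODEL, "messages": messages}, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```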

Quick question: I think maybe one of your main challenges is around privacy. How do you convince lawyers and law firms in particular that the data you're going to use stays only theirs and isn't used for anything else?

Yeah, so we never start an answer with "trust us." Now we have contracts in place that we've paid other lawyers to review, and so on.

Internally, we're also very careful about how we handle the data. Basically, only two people have access, and they're both standing here.

In case of a plane crash, we're screwed. We both worked in Swiss banks before, so it's like, OK, we keep our eyes closed.

We don't look at the data unless a client asks us to fix something, that kind of thing. But also, yeah, the Swiss brand helps a lot, I think. And the fact that we're not a startup that has existed for

only a couple of months or weeks. We had to gain the trust of our clients, and after that it was fine. But it was hard in the beginning, also because the general public was not aware of the risks.

And they didn't understand anything about data storage and so on. At the time, in 2018 or 2019, Microsoft did not keep data in Switzerland, or at least not always.

So it was already kind of a difficult discussion. We lost a few leads by saying that we worked with Microsoft, for example. Now it's a bit better, but with the trend with America and stuff, I'm guessing we're going to store more and more data in Switzerland and Europe and use different services as well.

What about the stack the client is using? If they're using Microsoft and hosting everything in the US, how are you going to be able to deal with that?

Actually, it's not really our problem. It's a bold answer.

Because our platform just indexes the documents and handles them. If the client wants to have their data with Microsoft, it's their data; we don't have any control over it. If they want to plug in something Swiss or European, we just need to develop the connectors for them. Any other questions?

I think... do we have some more time?

You have until the pizza shows up outside.

OK. But you don't have battery.

Yeah, it's going to shut down soon.

Exactly. I was quite curious as well. So I guess most of you have used ChatGPT and so on. Has anyone used an AI tool that is not ChatGPT at work? One, two, three, four.

OK, lots. OK. What kind of tools were there?

Me, personally? Yeah. Cursor.io. OK, a coding platform. Yeah. Cool. And you? Me, I use different LLMs, depending on the use case.

And also things like Sumo or ElevenLabs for creative content. OK. For example, to make presentations. Okay, I think we saw that one. Yeah, we saw that one. DeepSeek. DeepSeek, yeah. I'm not sure, it's a fairly small piece of software: Hex, which is a BI tool.

I think I saw that one, yeah. Basically, you have the big ones, Looker, Tableau, that whole world, and then you have Hex, which has a super nice UI but started as kind of a concept: the product is basically that you can prompt the data you have. The data needs to be modeled, obviously, but you can prompt it, and it helps in the transition toward data self-service. And I think I can see the same business case and the same business issues they might have: just two days ago I sent an email about the terms and conditions, especially around the data they're hosting, because they need access to the data we own. So that's one of the tools I use, and I see the comparison.

But it's funny, because all of your use cases are very much centered on generative AI. It's about creating stuff, helping you do stuff, very much about creation: code, content, and so on. Not really process automation.

And I think this is where the next thing will happen. I've used u.com, which is for sales managers and lead generation and all that, but without any integration, and that's the issue: at some point you have to copy-paste the stuff into an Excel, and then maybe into a database or something. I'm not that old, but work hasn't changed since the early 2000s. Email, PDFs, all that stuff. You have lots of things around telling you it's the future of work and so on. It's still very much the beginning.

It will take, I don't know, maybe 20 years before we get away from paper. But maybe I'm not the most optimistic about that. We are biased; we're working with lawyers and those kinds of people. I hope there are no lawyers here.

It's more that we started there, so right now we're still there. But we've also started working with insurance brokers, fiduciaries, which is kind of like banking but a bit different, and asset managers. So every company that's very paper-based or document-based is where we have most of our value today. If you're a software development company, well, you're working with code, so it's not really something we could apply directly.

So that's not where we're going to go. But everything that uses documents is where we can help. Yeah, understanding documents, really.

Is there any other business case that is not about paper, physical paper, but more about the process, like a small shop with their own booking system or something like that?

So what we've seen is that for smaller companies, having a product like this one, with a web app and something that is quite easy to use, is interesting, but it requires reaching a certain scale. For more specific tools, there's also a huge market using basically the same technologies. So we have banks, administrations, and cantons reaching out because they have issues: they're dealing with paper, and they have old IT systems.

At some point, everybody tells them, ah, we should have digital documents and so on, and they've been trying to do it for 10 or 15 years. And they realize that if there's a little brick on the market that you can put somewhere to bridge the gap between having paper and having properly organized metadata in a database somewhere, that has a lot of value. So currently we are prototyping with a canton, where they just need a system that processes the scans, detects some information, and attributes it to a taxpayer, so very sensitive data here, and then it's done.

That's simpler for us, because they don't need an interface; they just need something they can easily plug into their existing systems. So it's a very different business, actually. Yep?

You talked the whole time about scanning data.

It's an obsession of ours. Sorry about that.

Does the OCR really work reliably enough, all the time, that you can actually rely on it? Or is that an issue, that the AI is still not good enough?

So let's say OCR works very well. Actually, it's probably only this year that LLMs have started matching proper OCR systems.

Maybe fight me if you have a different opinion. But at least the old-school systems don't hallucinate: they might miss a letter, but they won't invent a word, which is quite important. And we have tricks as well, because the models we use are not that sensitive to the input data.

So, secret sauce. One of the tests we did involved those cards you get for your car, the vehicle registration card, la carte grise. These documents are hugely complicated for OCR because, first of all, they're gray with black text, so that's already quite an annoyance.

But then you have text that's scattered in many different places, and sometimes you have two letters and then a number, so you don't know whether it's real text or not. When you run OCR on these, even old OCR is actually quite good at detecting letters and putting spaces where they belong, so the layout kind of remains structured. And before, what was difficult is that you then needed regexes: OK, I want two letters followed by a number, and that's a license plate.

And that's super difficult to do with regexes. But LLMs actually understand what the spacing means and can infer where the license plate is. So even if the OCR is not perfect, the combination of OCR on the images and an LLM to interpret that OCR output gives you a very, very good result. That's one of the interesting things we discovered. Our guess is that the LLMs were trained on plenty of poorly extracted OCR text, so they learned to interpret the spacing and the structure. It's pretty funny; that's why if you paste a badly formatted table into ChatGPT or something, it works pretty well. It's impressive. We don't do that here, though.
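A sketch of that regex-versus-LLM contrast (illustrative; the prompt and the `ask_llm` helper are assumptions, and the regex is only the naive baseline described above):

```python
import re

# Naive baseline: a regex for "two letters then a number" breaks down as
# soon as the OCR output scatters similar-looking tokens across the page.
PLATE_RE = re.compile(r"\b[A-Z]{2}\s?\d{1,6}\b")

def find_plate_with_regex(ocr_text: str) -> str | None:
    match = PLATE_RE.search(ocr_text)
    return match.group(0) if match else None

def find_plate_with_llm(ocr_text: str, ask_llm) -> str:
    # ask_llm is a placeholder for the actual LLM call; the model reads the
    # raw OCR output, spacing included, and infers which token is the plate.
    prompt = (
        "Below is raw OCR output from a vehicle registration card. "
        "Return only the license plate number, or 'none' if absent.\n\n"
        + ocr_text
    )
    return ask_llm(prompt)
```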

Yeah, that's a good point. We don't handle those very well. I think the one thing we do is try to detect where there is handwritten text, because it's probably a signature or something like that, so it's useful to know whether the document is signed or not. And that's where multimodal LLMs will also be quite good, because you can send an image and the model actually understands the image.

But right now, OCR technology on handwritten text is definitely not as good as print OCR, that's for sure.

The way it does work is when it's done live, for example on those reMarkable tablets or something, because they also have the context: they know how much pressure you apply, the speed, and so on. So they have more data to properly recognize the letters.

All right. Right? Well, I mean, if there are more questions, they can keep going. I'm just saying.

We can also take them over a beer. Exactly. They're still there, I suppose.

So actually, first, I want to thank Nicolas and James. I think it was a great talk and a great Q&A session, so let's give them a round of applause. Thank you.
