What I Learned From Building Ezri AI

Introduction

So my name is Gustav Grimberg, and I will give the first presentation, as Joshua said. And thanks a lot to Frank and Joshua for putting this on. It's great to see that so many people crossed the terrain to come here.

Presentation Overview

The talk that I'll give is called "What I Learned From Building With LLMs." I will speak from an engineering perspective, so how to actually build these applications software-wise.

And if you have any questions during the presentation, please let me know, and I will be very happy to address them.

Speaker Background

So just first a bit about me, so who I am, what my background is, and why you should listen to me.

I have a background in human-computer interaction, and I specialize in education. I studied here in Paris, and I did research here in Paris too, at École Normale Supérieure and at INRIA, which is the leading French national research institute in computer science.

And there I had the great fortune to interact with a lot of very good researchers in machine learning and human-computer interaction, and I ended up in a group that did human-computer interaction research in education. This was many years ago, and I've been very involved with these kinds of developments for a long time.

Professional Journey

And then about two years ago, there came this big breakthrough in how people perceived interacting with computers, and this was a very good point for me to try and launch some of the applications that I had developed, which have now turned into Ezri, which I've been working on for one year. It's a company where we build AI tutors: we develop the technology that powers them.

We usually collaborate with established learning platforms to deliver the interactional technologies for them to integrate into their platforms.

So these platforms that we develop with our partners, they have a lot of videos, a lot of exercises, a lot of articles, and the students can then interact with our AI tutors as they go through the content of the platform.

We have launched our first tutors in Germany, and we're very excited to see how it develops and how it unfolds.

Live Demonstration

So I'll actually just jump straight into the demo. I'll do it with one hand, I guess, because I will have the microphone in the other hand.

So the first platform that we have collaborated with is a platform called Do The Learner Tech. It's a German platform, and they specialize in K-12 education for German students.

They have a lot of content in biology and mathematics, as you can see, that consists of videos, exercises, and so on. This is an example of the biology display.

So what the students see here is our first generation of tutors. There are a lot of things that are still very interesting to work on, but I think it's a very good introduction and case study to look at to actually understand a bit about the mechanics of these applications.

So the first thing the student sees when they log in is the AI tutor, down here in the bottom right corner. It's in German, but I will translate as we go. And it says, "Hi, I'm Kim, is there a topic I can help you with?"

This has been running since March. It's a very early productionized application in the EdTech AI industry. We were some of the first to do this in Germany. And since we launched it, we have learned a lot about how to develop these applications and how to best cater to the needs of these students.

Building an AI Tutor

Yeah, so just to give a brief overview of what it actually takes to build an application like this: there are a lot of different components that go into it.

Language Models

So first you have a language model. And many of you probably have some kind of familiarity with what a language model is.

You know, many of you have probably used something like Copilot. And there are different providers of these language models.

There's OpenAI, a very famous one. There's Anthropic, another very famous one.

But also here in France, for instance, Mistral is a very good technical company that is doing a lot of interesting things. These language models have different properties.

Some are good at some things. Some are good at others. Some are small. Some are big.

But at some point in these applications, there's a language model doing a lot of the work.

Embedding Models

Then you have embedding models.

This is something that's very useful for searching through content fast. So for instance, when we search through content to answer a question in mathematics or in physics, we use embedding models.

Then you need some kind of specialized databases for storing these embeddings. There are a lot of different options, open source, closed source, depending on your needs.
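To make the idea of searching content with embeddings concrete, here is a minimal sketch in Python. It is not the system described in the talk: `toy_embed` is a hypothetical stand-in that just counts words, whereas a real application would call an embedding model and store the vectors in a vector database. The lesson texts are made up for illustration. The part that carries over is the ranking step, where the query and each document are turned into vectors and compared by cosine similarity.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_vec = toy_embed(query)
    scored = [(cosine(query_vec, toy_embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


# Made-up lesson snippets, standing in for the platform's content.
LESSONS = [
    "photosynthesis converts light into chemical energy in plants",
    "the pythagorean theorem relates the sides of a right triangle",
    "newton's second law says force equals mass times acceleration",
]

best = search("how does photosynthesis work in plants", LESSONS)
print(best[0])
```

In a production setting the document vectors would be computed once and stored, so that only the query needs to be embedded at question time.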

Monitoring and Evaluation

Then a very important feature, which I'll also talk a bit about after this, is monitoring and evaluation. So actually understanding how these language models behave in practice, and how people are interacting with them, is something that's really important for ensuring that you're deploying something that is responsible and useful.

Key Takeaways

OK, and then just to wrap up and summarize some of the experience I've gained from deploying language model applications: I think the most important thing to keep in mind when building these applications is the user experience. There's a completely new paradigm in how you interact with computers, and I think it's very important to think about how exactly to design these applications so that they are as frictionless as possible to use.

When we deployed this the first time in March, it gave very long answers that were quite complicated. Language-wise, they were way too complicated for the kids who were using it. We saw that this was not a good user experience.

The answers needed to be much simpler and much shorter. What also drove the design of our WhatsApp application was that we saw that users really asked for audio and image capabilities too, and that they are not really that willing to change their workflows. So how to integrate these AI applications into users' existing workflows is something that is very important, and I think it is crucial to keep in mind when deploying language model applications.

Monitoring

Secondly, there's monitoring: seeing what kind of usage patterns the students, or the users, have. In this case, that means how they engage with the AI tutor: what kind of questions they ask, when they ask them, and what they usually have difficulty understanding.

And this will then drive continuous improvement. So you can see that a lot of students probably only ask two questions in the beginning because they don't know what to ask.

And then, okay, you should provide a suggested-question feature that helps them actually ask the right questions.
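As a rough illustration of this kind of monitoring, the sketch below aggregates a small interaction log in Python. The `Interaction` record, its fields, and the sample data are all assumptions made for the example, not the logging schema used in the talk. It shows the questions being counted per session, the share of sessions that stop after two questions or fewer, and the most frequently asked topics.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Interaction:
    """Hypothetical log record: one question asked to the tutor."""
    session_id: str
    hour: int   # hour of day the question was asked
    topic: str  # topic label attached to the question


def questions_per_session(log: list[Interaction]) -> Counter:
    """Count how many questions each session contains."""
    return Counter(item.session_id for item in log)


def share_of_short_sessions(log: list[Interaction], max_questions: int = 2) -> float:
    """Fraction of sessions with at most max_questions questions."""
    counts = questions_per_session(log)
    if not counts:
        return 0.0
    short = sum(1 for n in counts.values() if n <= max_questions)
    return short / len(counts)


def most_asked_topics(log: list[Interaction], top_k: int = 3) -> list[tuple[str, int]]:
    """Topics ranked by how often they were asked about."""
    return Counter(item.topic for item in log).most_common(top_k)


# Made-up sample data for illustration only.
SAMPLE_LOG = [
    Interaction("s1", 16, "fractions"),
    Interaction("s1", 16, "fractions"),
    Interaction("s2", 17, "photosynthesis"),
    Interaction("s3", 18, "cells"),
    Interaction("s3", 18, "cells"),
    Interaction("s3", 18, "cells"),
    Interaction("s3", 19, "cells"),
]

print(share_of_short_sessions(SAMPLE_LOG))
print(most_asked_topics(SAMPLE_LOG, top_k=1))
```

A high share of short sessions is the kind of signal that would motivate a suggested-question feature.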

Evaluation

So evaluating the applications before they go into production is something that can really help you catch a lot of issues beforehand. So if some of you work in engineering or in software technology companies already, you know that you don't just write code and push it to your production server. You have a lot of testing beforehand.

You have unit tests. You have quality assurance. And you should have a similar mindset when it comes to deploying language model applications.

So you should figure out how to actually test the language models as well as possible. And this is something that can be done with test sets.

So you can come up with a lot of questions that you think that students would ask. You can come up with what answers you believe the language model should give. And then you can run these evaluations before you deploy them.

And this is something that we always update when we see how the users actually use the system. So we model the end application as well as possible before we deploy it.
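A test set like this can be sketched in a few lines of Python. Everything here is illustrative: `fake_tutor` is a hypothetical stub standing in for the real language model call, and checking for expected keywords plus a word limit is one simple, assumed scoring rule, not the evaluation method used in the talk. The word limit is included because of the earlier lesson that answers should be short.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One test-set entry: a likely student question and what a good answer contains."""
    question: str
    expected_keywords: list[str]
    max_words: int = 60  # assumed limit, to keep answers short for kids


def fake_tutor(question: str) -> str:
    """Hypothetical stub; a real run would call the language model here."""
    canned = {
        "What is 1/2 + 1/4?": "1/2 is the same as 2/4, so 2/4 + 1/4 = 3/4.",
        "What do plants need for photosynthesis?":
            "Plants need light, water and carbon dioxide.",
    }
    return canned.get(question, "I am not sure, could you rephrase?")


def passes(case: EvalCase, answer: str) -> bool:
    """An answer passes if it mentions every keyword and stays within the word limit."""
    text = answer.lower()
    has_keywords = all(k.lower() in text for k in case.expected_keywords)
    short_enough = len(answer.split()) <= case.max_words
    return has_keywords and short_enough


def run_eval(tutor: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Run every case through the tutor and collect the questions that failed."""
    failed = [c.question for c in cases if not passes(c, tutor(c.question))]
    return {"total": len(cases), "failed": failed}


CASES = [
    EvalCase("What is 1/2 + 1/4?", ["3/4"]),
    EvalCase("What do plants need for photosynthesis?", ["light", "water"]),
    EvalCase("What is a cell?", ["cell"]),
]

report = run_eval(fake_tutor, CASES)
print(report)
```

New cases can be appended to `CASES` whenever monitoring shows a question type the test set did not cover.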

Conclusion

Okay, yeah, so that was a brief overview of what I learned from building language model applications. And feel free to, yeah, send me an email or hit me up on LinkedIn if you have any questions. Thank you.
