Thank you.
So a lot of familiar faces, but for those who don't know me, my name is Yannis Rizos. I'm the chief software architect here at Epignosis. This is actually where I work, where I spend my days.
Thank you, Stefano, for the opportunity to talk about the lessons we learned building AI features for TalentLMS.
And although this is a meetup that has practical AI in the title, unfortunately, and apologies about that, my talk is not going to be very practical.
What I'm going to do instead is talk from the town planner's perspective. This is an archetype, an analogy I'm borrowing from Simon Wardley, for the more technical people: we have explorers, we have villagers, and then we have town planners.
My role at Epignosis is a town planner, so what I'm going to share is mostly things from a high-level architectural perspective, and a little bit from the product perspective.
So, because I'm going to be talking a lot about scale: for those unfamiliar with TalentLMS, TalentLMS is a learning platform that serves roughly 25 million users globally. That poses unique challenges in, I'm going to say, normal development, and those challenges become even more pronounced when it comes to LLMs.
So I'm going to start with a spoiler. The first slide is actually everything you need to know about the presentation.
So the key lesson that we've learned is that AI itself is not a product, at least for us. It's just one component of a much, much larger and much more complicated system.
I'm mentioning this as the first thing and it's up in front because AI is magic, right? It's very alluring, it's new, it's exciting.
But when you're building a product like TalentLMS, it's just another component of the system. It's not the system itself. And well, let me make that a little bit more specific.
Our users come to us to interact with an LMS and AI for us is just the cherry on top.
So the second key insight for us is that working with LLMs is not really something that special. There's the 5% that is actually model behavior, and it's very exciting and very, very bizarre at times.
But 95% of the work is actually good old school engineering. There's nothing magic about it.
The same rules that apply to all distributed systems apply to LLMs as well. And well, that sounds like a slogan, but it's actually very, very true. At our scale, when you're serving 25 million users, you cannot just experiment and hope for the best; you need to build robust solutions.
So, just a very quick overview of what we have actually built, what we have in production right now. A lot of people here already know about all the features. And of course, most of these features are not just in TalentLMS; they are also in TalentCraft, they are also in eFront, there are other products, but my focus is TalentLMS.
So, we have a completely AI-powered authoring tool. We call it TalentCraft. When we first saw it internally, it did feel like magic.
You give it a prompt, and I'm going to showcase it later on, and you get a full unit in seconds, right?
That's not enough for us. We have built several AI tools into our more standard, let's say more basic course creation and editing flows.
As you can imagine, we can generate course titles, we can generate course descriptions, we can generate images, we can generate everything, right? Dimitris mentioned ChatGPT; everything you use ChatGPT for, we use in a much more controlled environment for our instructors, to save them time.
One of our most impressive features, at least to me as an architect, is the course translation. It's live on production: you have a course, you click a button, you select the languages, and a couple of minutes later you have the same course in every language you selected.
Right, and we do keep, and you can test it yourself, the exact same educational value of the original. It's not just a translation; we don't just pass it to Google Translate and give whatever comes out to the users.
We also have skills, which is also a very interesting feature because we define skills with AI. And not only that, but we match skills to courses using AI, which means that we can personalize learning based on skills.
It's a feature that enhances internal mobility, and enhances a lot of things inside the organizations that choose TalentLMS to train their people.
And then there are the next couple of features, which are coming very, very soon. I'm talking months, not years.
The first is our AI assistant, the learning coach. This one truly is going to feel like magic from the learner's perspective, because it's going to help our learners with everything they actually need: answer questions, give them feedback, organize practice activities for them, everything.
And the last one is not as fancy as the rest, I think, but its value at scale is going to be extreme. It's learning path personalization. What we're going to do is use AI to craft a learning path for each one of our learners. So this takes personalization to the max for us.
You are a learner, you have certain skills, you have selected some courses, and then we combine all the data to create a sequence of courses specifically for you.
That's going to be out by the end of the year.
So, what have we learned building all these features, and more importantly, not building them, because trust me, that's the easy part, but maintaining them in production? We've learned that LLMs are a very, very different beast.
They are very different. A lot of the things here are very classic issues with remote systems. A couple of them are not.
When you're building with LLMs, and I practiced this word a number of times, I'm not sure I'm going to get it right: probabilistic. They are probabilistic, which means that you cannot expect the same output when you give them the same input.
Right? I don't know if everyone here is technical, so let's just say that you ask the same question and you get different answers every time, right?
And they can be vastly different. That's also the other part, non-deterministic.
Probabilistic means based on probability: 70% of the time you get one answer, 30% of the time you get a different answer, and you don't know which one you'll get. Think of things like A/B tests.
Non-deterministic is completely bizarre: with the smallest change, you may get a vastly different answer, right?
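As a minimal sketch of what "same input, different output" looks like in code; the SDK, model name, and prompt here are illustrative assumptions, not what runs in TalentLMS:

```python
# Minimal sketch: the same prompt, sampled twice, usually gives
# different answers. SDK and model are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model, for illustration only
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,      # default-ish sampling: expect variance
    )
    return resp.choices[0].message.content

question = "Summarize the P versus NP problem in one sentence."
print(ask(question))  # ask twice with identical input...
print(ask(question))  # ...and the two answers will almost surely differ
```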
So those are the main challenges for us. And of course, there's one last challenge.
LLMs are very, very expensive, right?
I'm not going to share exact numbers, but at some point in early development, when we hadn't really optimized, generating a course would cost us somewhere between five and ten cents. That's very far from the number we're at now, because we have optimized the hell out of it.
But imagine 25 million people generating courses and all of these costing 5 to 10 cents. That's a huge number. It adds up.
And of course, the last point, because that's the most painful one for an architect: latency is really, really horrible, right?
That's a very common problem with any distributed system, and it gets a lot worse with LLMs, because just adjusting the complexity of the prompt a little bit may take you from seconds to minutes. I'm exaggerating a bit, but it's another thing that's completely unpredictable.
You really don't know how much time it's going to take to get an answer. And when you're building a product, you cannot have your user wait forever to get the simplest answer.
Right, so what does this actually mean? This is turning into a course on distributed systems, sorry about that. It means that a lot of our effort went into classic, old-school engineering: timeouts, retries, isolation, and fallbacks, right? But we must also handle the bizarre nature of LLMs, right?
The probabilistic nature and hallucinations. One thing that people who actually work with AI every day told me is that hallucinations are a fact of life with LLMs. There is absolutely no way to get away from them.
How do we deal with that? Because I think that's the most interesting part. The probabilistic nature and the hallucinations mean that, for example, at times you really need to sacrifice creativity, and I'm using the word very loosely, for precision. In other terms, that means keeping the temperature very, very low, right? But that's not enough. You really need to validate the outputs.
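A minimal sketch of low temperature plus output validation, assuming a JSON contract we define ourselves; the SDK, model, and field names are illustrative, not TalentLMS internals:

```python
# Minimal sketch: keep temperature low, then validate the output
# before trusting it. SDK and model are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

def generate_course_title(topic: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # assumed model, for illustration
        temperature=0.2,       # precision over creativity
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": f'Return a JSON object {{"title": "..."}} '
                       f'with a course title about {topic}.',
        }],
    )
    data = json.loads(resp.choices[0].message.content)
    title = data.get("title")
    # Never trust the model blindly: check structure, type, and bounds.
    if not isinstance(title, str) or not 3 <= len(title) <= 120:
        raise ValueError("LLM output failed validation")
    return title
```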
And of course, I've already mentioned latency. The point here is that reasoning-heavy models are going to completely throw off your response times. Your times are really going to suffer if you go with reasoning-heavy models.
And of course, if you do anything that's multi-step, every step introduces another chance of failure. Right, so I know some people are really not going to like this, but let's say our golden rule here is that you really need to avoid LLMs in the critical path, right?
So what we do is very, very simple. We cache very aggressively, so even if the LLM is not there, or let's say it's not feeling well, you have something to return to the user.
We batch jobs to cut down costs because the cost is really significant. And of course we defer everything that's low priority to off-peak times because we really don't want to overload the systems.
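A minimal sketch of the cache-first approach, where the in-memory dict and the call_llm() helper are stand-ins for illustration, not real TalentLMS internals:

```python
# Minimal sketch of "cache aggressively and degrade": a cache hit means
# no LLM in the critical path at all.
import hashlib

CACHE: dict[str, str] = {}  # stand-in for a real cache like Redis

def call_llm(prompt: str) -> str:
    # Placeholder for the real, slow, occasionally-failing provider call.
    raise TimeoutError("the LLM is not feeling well today")

def generate(prompt: str, fallback: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in CACHE:
        return CACHE[key]      # served instantly, no LLM involved
    try:
        result = call_llm(prompt)
        CACHE[key] = result    # next identical request skips the LLM
        return result
    except Exception:
        return fallback        # something to return to the user, always

print(generate("Suggest a course title about SQL.", fallback="Untitled course"))
```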
When we started realizing the complexity, very early on, we made a key structural decision: we created a team dedicated to working with everything AI. This team, which is responsible for most of the wonderful things I'm going to show you later in TalentLMS,
owns everything in the AI part of the equation: prompt handling, API services, infrastructure, logging, recovery, and of course cost. They own everything.
On the other hand, we have the product teams, which have absolutely no interaction with models. What they do interact with is very stable APIs provided by the AI team. So they just get an API that hides from them all the complexity I've talked about.
They have absolutely no care in the world about latency, about hallucinations, about all these things that are very, very complex. They have a stable API that's either up or down, and this enables them to focus on providing actual value to the customers. That was a key organizational decision for us, and so far it seems to be working.
This organization also enabled us to treat prompts like real engineering assets. I'm not saying that prompts are engineering, that's a bit controversial, but they are also not strings that you can scatter across your code base. And the fact that we have an AI team dedicated to safeguarding them means that they have full ownership and can do all these fancy things, like centralize them, version them, and test them.
Right, because just the smallest change in a prompt can break all your features. One of the lessons we learned that is specific to prompts is that you never put logic in your prompt, right? Logic belongs in your application layer; that's the product teams' job. You never put in things like conditional flows, right? The prompt does not decide "if this, then that". That's a no-no for us.
The prompts are purely content-generating inputs: something we feed into the LLM to get a specific output, nothing else. If you put anything else in the prompt, congratulations, you have a maintenance nightmare.
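A minimal sketch of what a centralized, versioned prompt registry with no logic inside the prompts can look like; the registry layout and all names here are illustrative assumptions:

```python
# Minimal sketch of prompts as centralized, versioned assets.
PROMPTS = {
    ("course_description", "v3"):
        "Write a description for a course titled '{title}'. "
        "Audience: {audience}. Keep it under 80 words.",
}

def render_prompt(name: str, version: str, **params: str) -> str:
    # Pure content-generating input: parameters get substituted,
    # but no conditional flow ever lives in the prompt text itself.
    return PROMPTS[(name, version)].format(**params)

# The application layer owns the branching, not the prompt:
learner_is_new = True  # hypothetical flag, for illustration
audience = "complete beginners" if learner_is_new else "practitioners"
print(render_prompt("course_description", "v3",
                    title="Intro to SQL", audience=audience))
```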
Another key thing for us, and that's true for every type of software engineering, whether you do LLMs or not, is that failure is expected. You have to stop hoping that things will work all the time and design your systems around failure. That's true for every system, but it becomes a very critical point when working with LLMs.
So what does it mean to design for failure? It means some very, very well-known patterns: retry logic, circuit breakers, and graceful degradation.
Okay, those are common patterns that have nothing to do with AI or LLMs. Human-in-the-loop mechanisms: sometimes you need a human to check things. That's okay.
And offline modes are required, which means that the product must work even when AI doesn't. Right? I'm not sure that's true for every product, but it's certainly true for an LMS.
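A minimal sketch of the circuit-breaker pattern with graceful degradation around a flaky LLM call; the thresholds, helper names, and fallback content are assumptions for illustration:

```python
# Minimal sketch of a circuit breaker that stops hammering a failing
# LLM and degrades gracefully instead of crashing the product.
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 60.0):
        self.max_failures, self.reset_after = max_failures, reset_after
        self.failures, self.opened_at = 0, 0.0

    def call(self, fn, fallback):
        if self.failures >= self.max_failures:
            if time.time() - self.opened_at < self.reset_after:
                return fallback        # circuit open: skip the LLM entirely
            self.failures = 0          # half-open: let one probe through
        try:
            result = fn()
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            self.opened_at = time.time()
            return fallback            # degrade gracefully, never crash

def flaky_llm_call() -> str:
    raise TimeoutError("provider timed out")  # stand-in for a real call

breaker = CircuitBreaker()
for _ in range(5):  # after 3 failures, the LLM is no longer even attempted
    print(breaker.call(flaky_llm_call, fallback="AI suggestions unavailable"))
```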
The core value we give to our customers is not AI. So, the AI team asked me to highlight the point that good UX mitigates bad AI, right?
If your user experience is great and the AI fails to generate a title or a course image, it's okay; the user can upload an image, whatever. They can still get the work done without AI, because AI, and I think I've mentioned this five or six times, is really not predictable. Okay, so you really need to build your UX around that.
And of course, okay, that's a bit of a slogan again, but do try to measure everything from the start. Again, the main guard against unpredictability is solid metrics. Okay, I'm not going to lie, we didn't do it from day one, but we started doing it on day two or three, it didn't take us long. We really log everything that we can log, and we measure everything, right?
We have another team, a wonderful team, that has been introducing event-driven architecture into our product. So we've combined the work of the AI team and the EDA team, and we measure everything based on the events our AI services emit.
This can be anything. What we call an AI job, the thing that we measure, is any interaction our users have with our AI services. It can be something like generating an image, translating a course, whatever.
So we take these events, and we don't only measure things to learn more about our systems; we also visualize everything and provide it to our product teams to help them make decisions. And at the same time, we use them for one more thing, so the same infrastructure serves three purposes.
The third purpose is rate limiting and burst limiting, which is also very, very important when it comes to AI, because, as I've said two or three times, AI is costly. So rate limiting should be built in, maybe not from day one, but certainly from day two.
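A minimal sketch of how the same "AI job" events can double as a rate limiter with a per-account sliding window; the event shape, account names, and limits are illustrative assumptions:

```python
# Minimal sketch: every AI job emits an event, and the recent events
# per account drive a sliding-window rate/burst limit.
import time
from collections import defaultdict, deque

EVENTS: list[dict] = []                    # stand-in for the real event bus
WINDOWS: dict[str, deque] = defaultdict(deque)

def allow(account: str, limit: int = 100, window_s: float = 60.0) -> bool:
    q, now = WINDOWS[account], time.time()
    while q and now - q[0] > window_s:     # evict timestamps outside the window
        q.popleft()
    if len(q) >= limit:
        return False                       # burst limit hit for this account
    q.append(now)
    return True

def run_ai_job(account: str, job_type: str) -> None:
    if not allow(account):
        EVENTS.append({"type": "ai_job_throttled", "account": account})
        return
    EVENTS.append({"type": "ai_job_started", "account": account,
                   "job": job_type, "at": time.time()})
    # ... do the work, then emit ai_job_completed with duration and cost,
    # which feeds both the dashboards and the product teams' decisions ...

run_ai_job("acme-corp", "translate_course")  # hypothetical account and job
```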
Right, so this is what happens when you invite an architect to give a presentation. Architecture drives all the decisions.
This is a list of what we call architecturally significant requirements. They're very, very common. It's not my list.
This applies to every system out there.
One thing that I need to stress is that testability is a nightmare. The unpredictable behavior of LLMs means that testing them, and guarding against hallucinations and stuff like that, is not really an easy task, okay?
So everything else is classic software engineering. Testability in the age of AI is a new thing.
It's not something we have a lot of tooling for right now, so we really need to invent things. There are some tools around, but they're not mature enough yet, at least compared to our classic test automation suites. Right, so, as I said, architects show two slides about architecture.
Another key point is that the model choice must be deliberate. This is a trap that I've fallen into. You always go for the fanciest and the coolest model around. This may not be the best choice for the product.
Sometimes the smaller model that may not produce perfect output is the perfect choice, because it will answer in seconds and not in minutes. So, going back to an earlier point, UX is more important than AI. If you can do the work with the smallest, fastest, and cheapest model, do that.
Don't fall into the trap of picking the reasoning-heavy LLM and doing things that may look great when you actually get the output, but take minutes to produce anything decent. So the last point is very important.
We've built all our AI services so that prompts and even providers are interchangeable. Our product teams have absolutely no care in the world about what the model is, what the prompt is, or even who the provider is. The AI team can change them at will, and sometimes we just route between different providers because that's convenient.
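A minimal sketch of interchangeable providers behind one stable interface; the Protocol, the provider classes, and the routing rule are illustrative assumptions, not the TalentLMS design:

```python
# Minimal sketch: one stable interface, swappable providers behind it.
from typing import Protocol

class LLMProvider(Protocol):
    def complete(self, prompt: str) -> str: ...

class ProviderA:
    def complete(self, prompt: str) -> str:
        return "answer from provider A"   # stand-in for a real SDK call

class ProviderB:
    def complete(self, prompt: str) -> str:
        return "answer from provider B"

def route(job_type: str) -> LLMProvider:
    # Only the AI team touches this; product teams never see which
    # provider, model, or prompt actually serves their request.
    return ProviderA() if job_type == "translation" else ProviderB()

print(route("translation").complete("Translate this course to Spanish."))
```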
So, last slide.
It's just a recap of everything I've said.
Focus on the system and not the model; it's the system that matters, not the AI. AI is magic, I know, I like it.
I use ChatGPT every day. Except yesterday, there was an outage, so I wasn't working during that outage. You got me there.
Engineering discipline matters much, much, much more than AI experimentation. I'm sorry, apologies to all the AI enthusiasts, but it's good old-fashioned engineering at the end.
And design for failure, right? Make decisions based on metrics. The organizational boundaries really, really matter, right? Ownership is a key thing for every complex system, right?
And the last point is, of course, that the path from prototype to production is that 95% of good old engineering. I've stressed that point like five or six times. So that's it.
And I do have an instance of TalentLMS where I can showcase some of our features. It's on production, so if anything goes wrong, apologies.
Right, so this is TalentLMS. This is a portal I set up a couple of hours ago.
Let's create a course. I'm going to showcase TalentCraft. I'm going to start from scratch because it's the most impressive.
Does anyone care to give me a prompt? What do you want the course to be about? No one?
Imagine if it does better than . So let's see what it's going to do.
OK, that's not the worst image ever. Still generating. And I think that's it.
Why not? That's published now.
Oh, I never gave it a title, so I opened the wrong one.
Let's just open one that I've already created. Yeah, it has a unit.
So let's do Spanish. Academic. Done.
That was a course in English originally. It's not the course I generated.
It was a course on the P versus NP problem, for the nerds among us. And in seconds, I have it available in Spanish. It's not a perfect translation, but imagine yourself in the position of an instructor who has the original English content and now has a very good Spanish translation, in seconds.
And that's it for me.