Thank you for listening to my talk about large language models and what they specifically are. We'll also loop back a bit to the earlier conversation on the democratization of education. I hope this is going to be interesting for you. So without further ado, because I know everyone's waiting for pizza and it's late in the evening, let's get started.
So essentially, we're starting with a bit of history.
LLMs appeared on the scene last year, or maybe the year before, with ChatGPT, but they've been a long time coming. The concept of a large language model, a conversational chatbot, goes back to a program called ELIZA in the 1960s. ELIZA was a little computer psychotherapist, and it was really just based on clever rules.
And that idea evolved into the concept of neural networks, designing computer systems modeled on how the human brain works. Then at some point people said, okay, if we represent speech as mathematical vectors, we can do some clever things with that, notably introducing a concept called tokenization, which we'll come to a bit later as well.
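To make the tokenization idea a bit more concrete, here is a minimal sketch using the tiktoken library; the encoding name is one of OpenAI's published encodings, and the example sentence is made up.

```python
# Minimal tokenization sketch using the tiktoken library (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one of OpenAI's published encodings

text = "Large language models have been a long time coming."
tokens = enc.encode(text)          # text -> list of integer token IDs
print(tokens)                      # the numeric IDs the model actually sees
print(enc.decode(tokens) == text)  # decoding the IDs gives back the original text
```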
And then, eventually, another type of neural network was introduced, a recurrent architecture called the LSTM, which stands for long short-term memory. It's essentially a mechanism, a clever pattern for holding on to relevant context over long sequences, which also underpins summarization, another feature we're looking at.
And lastly, the folks at Google Brain, the super nerdy AI outfit at Google, came up with a concept called Transformers, which is at the core of what we call generative AI nowadays.
Essentially, transformers are really interesting because they focus on one thing: attention. And that is also a very human occupation, because when we have a conversation, how much we capture from that conversation depends on the attention we give it. So again, we're mapping the way humans think, operate, and communicate back onto computer systems.
And transformers are really clever because they encode data as vectors, going back to the tokenization stuff, and decode it with a self-learning neural network. The attention approach is about focusing on the relevant bits of that input, of the incoming tokens, and saying, okay, this is really what I want to keep, building on the ideas behind long short-term memory to capture the interesting tokens that are relevant for, lastly, predicting what the likely output is.
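To give the attention idea some shape, here is a minimal numpy sketch of scaled dot-product attention, the core operation inside a transformer; the token vectors are random toy values, not anything from a real model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # how relevant each token is to each other token
    weights = softmax(scores, axis=-1)    # attention weights sum to 1 per query token
    return weights @ V                    # weighted mix of the value vectors

# Toy example: 3 tokens with embedding dimension 4, filled with random values.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(tokens, tokens, tokens)  # self-attention
print(out.shape)  # (3, 4)
```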
So when everybody says, oh, this is AI, artificial intelligence, well, it's not really that. That's why there is a separate concept of general AI, meaning systems with something like conscious thinking.
What we've got now with large language models specifically is a very clever talk-back mechanism, essentially like a management consultant: very clever at speaking back the words that you want to hear.
And the way they do that is by training on the text itself. They take some text, predict the continuation, and then compare the predicted text with the actual text. That's how they learn.
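As a toy illustration of that self-supervised idea, here is a tiny sketch in Python where the "label" is simply the next word in the text itself; the counting model is a stand-in for the neural network, not how real LLMs are implemented.

```python
from collections import defaultdict, Counter

# Toy illustration of self-supervised training: the "labels" are just
# the next token in the text itself, so no human annotation is needed.
text = "the cat sat on the mat the cat ate".split()

# "Train": count which token follows which (a tiny bigram model).
counts = defaultdict(Counter)
for current, nxt in zip(text, text[1:]):
    counts[current][nxt] += 1

def predict_next(token):
    # Predict the most frequently observed continuation.
    return counts[token].most_common(1)[0][0]

# "Evaluate": compare predictions against what actually comes next.
correct = sum(predict_next(cur) == nxt for cur, nxt in zip(text, text[1:]))
print(f"{correct}/{len(text) - 1} next tokens predicted correctly")
```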
And one other aspect that makes LLMs feasible nowadays is that we also have massive amounts of computing power to actually train these models, and they need a lot of that. A lot, right?
So most large language models are trained on enormous data sets. So one of them is Common Crawl, which is essentially a collection of curated web pages, 50 billion pages of them. We've got Wikipedia, kind of obvious, mostly smart and mostly correct stuff in there.
And for the coding side, because we also use large language models to generate code, we use GitHub repositories and Jupyter Notebooks as a training basis. So generally, we feed all these large language models with massive amounts of human-created data.
And we obviously also have a few well-known applications. Google Translate is one of the first ones using a transformer-type model to translate German to French, Swahili to Mandarin, and whatever. The interesting bit is that inside the model there is no direct translation from one language to the other; there is a sort of intermediate representation, a meta-language, which the model uses to translate between languages that are syntactically completely different.
Let's say in English we have distinct notions of past, present, and future tense. Mandarin doesn't mark tense on the verb like that, so instead of "I'm going to the train station" a speaker effectively says something closer to "I go station." The language is just structured completely differently.
OpenAI, obviously, we've talked about that a lot; the biggest and most famous large language model is ChatGPT. And besides ChatGPT, we also have a lot of open-source large language models, and those are really relevant for the enterprise use cases we're going to come to, because we want to run them privately.
So, basic ingredients. I know I'm going to speed up a bit because we're on the clock and I want to give you the chance to ask questions.
The basic ingredient is a large pre-training data set. Access to large amounts of data is really key; that's make or break, actually.
Validating and filtering the data is super important. You need to deduplicate the contents to reduce the problem of false predictions and hallucinations.
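As a rough sketch of the deduplication step, here is a minimal exact-dedup pass based on content hashes; real training pipelines usually add near-duplicate detection such as MinHash on top, which this skips.

```python
import hashlib

# Minimal exact deduplication by content hash.
documents = [
    "The quick brown fox jumps over the lazy dog.",
    "An entirely different sentence.",
    "The quick brown fox jumps over the lazy dog.",  # exact duplicate
]

seen = set()
deduplicated = []
for doc in documents:
    digest = hashlib.sha256(doc.strip().lower().encode("utf-8")).hexdigest()
    if digest not in seen:
        seen.add(digest)
        deduplicated.append(doc)

print(f"kept {len(deduplicated)} of {len(documents)} documents")
```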
And parameter sizes are also very important, because the number of parameters in a model largely dictates how good it is at responding. If you want specialized models, you need to take that into account as well, whereas general models like ChatGPT use very generic data sets to get there.
Lastly, there's another technique that is quite relevant to large language models: fine-tuning. In fine-tuning, we take a general, foundational large language model and continue training it on a bespoke data set. So essentially, we say, I've got a model trained on general data like Wikipedia and Common Crawl, and I want to add specific content, say the collection of a university library. We can fine-tune the model to specifically emphasize that data set; we bake it into the model.
It's a useful technique to improve outputs, to specialize models, and also to get away with smaller models, because, again, size matters. ChatGPT, for instance, is a network of very large models with over 200 billion parameters; it is extremely expensive to train and extremely expensive to run. For most use cases we don't need a big bazooka to shoot a pigeon, a pellet gun will do: take a small model trained on a smaller data set, fine-tune it to do the job, and it can run on less computing power and takes less money and fewer resources to train.
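To give a feel for what fine-tuning can look like in code, here is a hedged sketch using the Hugging Face transformers Trainer; the base model, the corpus file name, and the hyperparameters are placeholders for illustration, not recommendations.

```python
# Hedged sketch of supervised fine-tuning with the Hugging Face stack.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "gpt2"  # stand-in for whatever foundation model you start from
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical domain corpus, e.g. exported library catalogue text.
data = load_dataset("text", data_files={"train": "library_corpus.txt"})
tokenized = data["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```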
And specifically, when we're looking at large language models, we have two modes.
The first mode everybody's familiar with: you open OpenAI's ChatGPT, you ask a question, you get an answer, you ask another question, et cetera. That is chat mode; essentially, you're having a conversation with the model. It keeps your questions and everything else in a session context and conditions its answers on top of that.
For enterprise use cases, though, the other one is actually more interesting: instruction mode, where you give machine-readable instructions to the model to make it behave in a specific way. What that looks like in practice is prompt engineering, an instruction that contains the question and asks for a parsable response. So essentially, you can interact with a large language model from code.
And that is quite powerful because large language models are clever. They can do very good things such as classification, summarization, et cetera. And you want to leverage that in a computer program without having human interaction.
Because most of us here probably have experience with large language models in a human-to-machine context. What we're looking at in enterprise use cases, and sorry about this, I know HR is going to kill me for it, is dropping the human out of the equation. That's the instruction mode in large language models.
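Here is a minimal sketch of what instruction mode can look like from code, using the OpenAI Python client as one example; the model name and the customer message are placeholders, and a production version would add error handling and validation of the returned JSON.

```python
# Instruction-mode sketch: prompt the model from code and require a parsable
# (JSON) answer so the rest of the program can act on it without a human.
# Assumes an OpenAI API key in the environment; the model name is a placeholder.
import json
from openai import OpenAI

client = OpenAI()

def triage(message: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": ("You triage customer messages. Reply with JSON only, "
                         'in the form {"category": "...", "summary": "..."}.')},
            {"role": "user", "content": message},
        ],
    )
    return json.loads(response.choices[0].message.content)

result = triage("My washing machine arrived with a cracked door and I want a refund.")
print(result["category"], "-", result["summary"])
```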
And specifically, one of the tasks we want to do with large language models is classification.
Given a set of classification classes, we ask the large language model: what is this? Let's say you have a policy document from an insurer, there's text in it about cars, and you ask the large language model what kind of insurance this is. The model will say, well, there's a lot here about vehicles, damages, fenders, wheels, litigation, and so on: this is car insurance. And it can do that. You don't need a human to read the document and decide whether it's car insurance or health insurance or whatever; the large language model can do it.
And if you think about other types of classification, such as customer support requests, we can use large language models for that too. Then summarization: most of you have used Teams, Zoom, and the like, with those clever AI assistants that create a transcript from the meeting and then summarize what happened and what was discussed. Very useful, also a large language model. And lastly, entity extraction: you ask a large language model to recognize a pattern in a text, or in a picture, and say, look, what is this?
Can you extract an address from a large document, for instance? The model will parse the text and say, this looks like an address: it has a zip code, a house number, et cetera. I'm going to extract it.
We used to do that by hand as software developers, and it was a pain in the arse, sorry, because you had to build parsers and combinators to find the address in the text, with all sorts of different rules. Large language models can do that quite naturally, even with the limited understanding they have compared to human beings, obviously.
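The same instruction-mode pattern covers the address-extraction case; in this sketch the model name, the prompt wording, and the document snippet are all illustrative.

```python
# Entity-extraction sketch: ask the model to pull an address out of free text
# and return it as structured JSON instead of hand-writing parsers.
import json
from openai import OpenAI

client = OpenAI()

document = ("... the insured party, residing at Musterstrasse 12, 10115 Berlin, "
            "hereby declares that ...")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system",
         "content": ('Extract the postal address from the text. Reply with JSON only: '
                     '{"street": "...", "house_number": "...", "zip": "...", "city": "..."}')},
        {"role": "user", "content": document},
    ],
)
address = json.loads(response.choices[0].message.content)
print(address["zip"], address["city"])
```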
So when we look at the impact of large language models, at what we can actually do with them, there's one thing that is quite significant, and that is the democratization of machine learning. Until recently, machine learning was about collecting a lot of historical data points from an enterprise, picking a machine learning model, logistic regression, k-nearest neighbors, whatnot, and training it on that data set to make predictions on similar data.
The great thing about LLMs is that, while that traditional machine learning toolbox continues to exist and has its place, it becomes much easier to build predictive features, because a large language model is specifically a prediction machine. It is really good at that, and it makes things easier.
So you can take a foundation model off the shelf, like a Lego brick, ask it to do a task, and it can actually produce a prediction that is reasonable. It doesn't break the bank to do predictive analytics.
So it is really a game changer there. And it also becomes less hard to integrate machine learning use cases, because they're largely pluggable and we can interact with large language models through instruction mode and prompt engineering.
So obviously, this comes at a price.
We have big issues.
Everybody knows hallucination, or confabulation, where the model just starts inventing stuff. It looks right, but it's wrong, especially when you try generating code. And that is a problem precisely because it looks right: it's a prediction, but it isn't correct. So we need a way to control that.
You say, generate me a little web server using this library or whatnot, and sometimes it generates absolute nonsense. And that's where the similarity with management consultants comes in. I'm a consultant, so I can say that.
Another problem is a really difficult one: sometimes you run into legal issues, copyright. Maybe everybody has heard of the Samsung case, where some of Samsung's intellectual property leaked into ChatGPT, a real problem. And privacy, same thing.
So there is one way to address that, and it's called RAG. RAG stands for retrieval-augmented generation, and it's a very clever yet simple process.
You say: large language models are really good at doing one thing, predicting stuff. But instead of baking all the information we want them to use into the model, we externalize it into an external database. In that case we say, look, large language model, you are going to make your predictions against a vector database, because obviously we're always talking about vectors and tokens, and that's where you're going to retrieve your data.
There are two benefits to this approach. First, you control the output, because the large language model is instructed to use only the data in that database and not to invent things from whatever it has baked in. It doesn't eradicate the problem of confabulation, but it greatly reduces it.
And second, you don't have to retrain the model just because you have new data. You can update the vector database that contains the data at any point, continuously, and each generation cycle will use the updated data without you having to touch the model. That makes it a very powerful yet super simple way of solving the problem.
And honestly, nearly every use case we do in the enterprise uses the RAG pattern, with some exceptions such as expert systems for code generation. But generally, RAG is the way to go. The advantages: we separate out the relevant input, we reduce hallucinations, and operating costs are much smaller, because you only have a foundation model plus your vector database, no training, and you only need enough of those very expensive GPUs that everybody wants to run your model, and that's it.
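Here is a minimal sketch of the RAG pattern with a tiny in-memory "vector database"; the model and embedding names are placeholders, and a real system would use a proper vector database and chunking strategy instead of a Python list.

```python
# RAG sketch: keep the knowledge in a small in-memory "vector database" and
# instruct the model to answer only from the retrieved chunks.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    emb = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(emb.data[0].embedding)

documents = [
    "Policy 123 covers fire damage up to 50,000 EUR.",
    "Policy 456 covers vehicle theft, excluding rental cars.",
    "The cafeteria serves soup on Tuesdays.",
]
index = [(doc, embed(doc)) for doc in documents]  # the "vector database"

def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    def score(vec):  # cosine similarity between question and document
        return float(q @ vec) / (np.linalg.norm(q) * np.linalg.norm(vec))
    return [doc for doc, vec in sorted(index, key=lambda p: -score(p[1]))[:k]]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": ("Answer using only the provided context. "
                         "If the context does not contain the answer, say you don't know.")},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("What does policy 123 cover?"))
```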
So, looking at use cases nowadays, especially the things we're working on at Hivemind: recruiting. Everybody knows the problem. You put a role out there and you get 4,000 applications on LinkedIn. It's insane.
How on earth are you going to review all of that? Roll a dice and look at every twentieth one? That doesn't work, and it's also not fair, because you want to make sure you really find the genuinely good candidates.
The way to deal with that is automation, and one way to automate it is to use a model that does that work for you: it summarizes the CVs you're getting, extracts the relevant skills, and ranks the CVs according to your needs.
Then you take the role you're looking to fill and let the large language model do the matching. It compares the summarized, normalized CVs with the role and gives you an explanation of why it thinks a candidate is appropriate. And then, obviously, a human being has to review everything, because obviously it can be wrong.
But this is one way to integrate large language models into an otherwise very manual process, and to make it fairer. One of the major problems in recruiting is that, because of the enormous number of CVs, you might overlook someone who is absolutely qualified for the job, simply because they drowned among the 600 other CVs that landed in the inbox, and at some point you give up and want to go home. So that's one way to use large language models to make things fairer and people's lives easier.
Another use case is document processing. We're all in the business of digital transformation, especially in this country, and boy, does this country love paper. You have some business with the Finanzamt or the city of Berlin, and you have to fill in yet another piece of paper and sign somewhere, et cetera.
One way to use large language models is for document processing. They're very good at OCR and pattern recognition, so whether you're digitizing old paper or new paper, you can bridge the gap between the paper world and the digital world, continuously update databases, and expedite the processing of digital assets.
We also see that problem in insurance. A lot of insurers, especially the legacy ones, have millions of old contracts that go back 40 or 50 years, and these documents are on paper and need to be verified, especially life insurance policies. Sometimes you have to go back and actually read that paper.
What were the conditions? How are we going to pay out? What do we need to observe? Large language models can do the extraction and the summarization, and specifically also convert the old policy document into the structure of the newest system the insurance company is using.
Customer support, kind of obvious; we talked about that earlier. Classification of requests: people come in with an insurance claim or some other complaint, it can be classified and routed to the right person, which largely expedites the process and improves customer satisfaction, without anyone losing their job.
And lastly, image processing. I love this example: you show ChatGPT a CAPTCHA and ask, what's this? And it tells you that it's a CAPTCHA meant to check whether you're human. More specifically, the siblings of large language models, which usually work on text, are large vision models, and they are capable of recognizing patterns in images.
Images work pretty much the same way, and we use LVMs to recognize text in them, for instance when we're helping to expedite meter readings.
Not everybody has a digital meter; many still have one of the old ones with the little clock face and turning dial. You snap a picture, and an LVM can do an accurate reading of the number on that meter, which expedites the process of creating a utility bill and reduces cost for everyone.
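A hedged sketch of how such a meter-reading call might look with a vision-capable model via the OpenAI chat API; the model name and image URL are placeholders, and a real deployment would validate the returned reading against plausibility checks.

```python
# Vision-model sketch: send a photo of a meter and ask for the reading as JSON.
# Model name and image URL are placeholders; assumes a vision-capable model.
import json
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; must be a vision-capable model
    messages=[
        {"role": "user",
         "content": [
             {"type": "text",
              "text": ('Read the electricity meter in this photo. Reply with JSON only: '
                       '{"reading_kwh": <number>, "confident": true|false}')},
             {"type": "image_url",
              "image_url": {"url": "https://example.com/meter-photo.jpg"}},  # placeholder URL
         ]},
    ],
)
print(json.loads(response.choices[0].message.content))
```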
And lastly, legal, and that's a big one, because if there's another group of people who love their paper besides the German government, it's lawyers. It's really surprising how much manual work there is in that profession, and not necessarily for the lawyers themselves, but mostly for the poor folks who work as paralegals.
These are the people who actually do the work of researching cases, comparing documents, and sifting through prior precedents and commentaries. Large language models can help with that, because you can set one up, using RAG or fine-tuning, to surface precedents and similar cases that match the classification, helping lawyers write accurate legal briefs without losing time and without incurring enormous bills for everyone who has to pay these people.
The other aspect that is quite relevant in the legal profession is so-called data rooms. When two companies merge, the first thing they do is say, we want to merge, we're going to sign an LOI, a letter of intent, and then they have their teams exchange documents.
And that's horrible: there are massive amounts of data exchanged between the two companies, numbers, contracts, whatever, and all of it has to be sifted through manually. Nowadays, with large language models, we can put a conversational interface on top of a data room, where you basically ask:
Can you give me the supplier contract this company has for fleet management, or whatever? The large language model can point you to the relevant section, which massively expedites the process of creating and sifting through data rooms. And lastly, kind of obvious, there is the legal text copilot, which gives you the ability to write a brief with text generation.
That on its own is not a huge use case, because you can do it relatively easily with ChatGPT as well; the game changer is doing it on data from the legal practice itself, using their own documents rather than generic documents sourced elsewhere. So that was a run through some of the use cases and how large language models work. I hope you enjoyed it, and I'm happy to take your questions.