Matching and ranking with natural language understanding for process automation

Introduction

Today I'm going to talk about practical use cases for large language models and how to use them in your enterprise applications. Because, as one of the sponsors said about paying the bills, using large language models is not an end in itself. We want to actually make money with them, and they have to be put to use in the application landscapes of different companies.

Practical Use of Large Language Models

So if we look at the practical usage of large language models, there's a pretty common pattern right now. Just look at all the AI projects that are starting: everybody starts with pretty much the same idea.

Understanding Inference Endpoints

OK, there are a few providers, the large players in the field: OpenAI, Google, Anthropic, et cetera. They give us the ability to use ChatGPT in the browser, but they also have what we call inference endpoints. Generally, that's an API that you can reach from your application to send in prompts and get responses.

And that is pretty straightforward and has one huge benefit: it does not require any significant implementation effort beyond calling that inference endpoint. It's usually just bog-standard HTTP. So that's really great.
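To make that concrete, here is a minimal sketch of calling such an endpoint over plain HTTP. It uses OpenAI's chat completions API as the example; the model name and prompt are placeholders, and any provider with an HTTP endpoint works the same way.

```python
# Minimal sketch: calling a hosted inference endpoint over plain HTTP.
# Uses OpenAI's chat completions API as an example; model and prompt
# are placeholders.
import os
import requests

API_URL = "https://api.openai.com/v1/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "user", "content": "Summarize these meeting minutes: ..."}
    ],
}

response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```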

Challenges with Cost and Latency

But there are some drawbacks, right? The first drawback is cost: it can cost a lot of money.

Why? Because the business model of these large players is to charge you by the amount of tokens that you send. You basically have a meter, just like your utility company: the more you use, the more you pay. So there's a cost risk.

The other problem is latency. These very large language models are enormous. As of now, the largest ones, like Gemini and GPT-4, are reported to have on the order of a trillion parameters and to have been trained on trillions of tokens, so really enormous amounts of data. They're really big, so they cost a lot to operate, which is why these companies have raised substantial amounts of funding and are slurping up any kind of GPU they can find to put in their data centers.

And that comes at a price for us application developers: because these models are very big, they take a lot of time to compute, and the answers are not instantaneous. So there is latency, and it needs to be factored into your application design.

Security and Privacy Concerns

And lastly, everybody has heard about the Samsung case, where proprietary source code ended up in ChatGPT sessions. That's one concern. PII leakage is another one. Everything that you send to those public inference endpoints can be used as training data. So there are security, governance, and privacy concerns associated with using public inference endpoints.

Exploring Alternative Approaches

So what if there was another way?

Usage Patterns for Large Language Models

So now let's look at the usage patterns for large language models. Large language models are used for specific things that we could not do properly before.

If you're an application developer and you need to do some of these tasks, you used to have to do it with a rule-based approach: you had some rules, you applied those rules, did some parsing, et cetera. That was cumbersome and difficult.

Natural Language Understanding and Its Applications

Large language models bring in a new capability that we generally call natural language understanding: the ability to actually process text and have a basic grasp of what it says. That's great, because it covers things that always required humans, like classification of text. What is this? Oh, OK. That's that. That goes here. That used to need clerical work.

Large language models can do that now. Or summarization. Everybody's using that nowadays.

I've got this long text or the meeting minutes. I'm just going to put it in a large language model and do summarization on that. Great.

Or entity extraction. And entity extraction is a funny one, because if you think about what I just said about PII and the leakage problem, well, guess what? You can address the PII leakage problem, and the privacy risk that comes with it, very well by using large language models to recognize what is private data and what is not. So you're basically using the same technology to solve the problem it introduced in the first place.

That's one major advantage of large language models: that understanding is baked into the model, so it can say, oh, this is an address, this is a name. I'm going to extract that from a long flow of text and mask it if necessary.
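As an illustration, here is a sketch of prompt-based PII extraction and masking. The call_llm helper is hypothetical; wire it to whichever endpoint or local model you use. The point is the pattern, not the client.

```python
# Sketch: prompt-based PII extraction and masking.
# call_llm is a hypothetical helper that sends a prompt to any LLM and
# returns its text response; in practice you'd also validate the JSON.
import json

PII_PROMPT = """Extract all personally identifiable information from the text
below. Return a JSON list of objects with "type" (name, address, email, phone)
and "value". Return [] if none is found.

Text:
{text}"""

def mask_pii(text: str, call_llm) -> str:
    entities = json.loads(call_llm(PII_PROMPT.format(text=text)))
    for entity in entities:
        # Replace each detected value with a placeholder like [NAME].
        text = text.replace(entity["value"], f"[{entity['type'].upper()}]")
    return text
```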

Programmatic Approach to Large Language Models

That all leads to one thing that is quite important in application design: the ability to think programmatically about large language models. Instead of going to an inference endpoint and saying, OK, dear GPT or Gemini, here is my long prompt, please do this, this, and this for me, a programmatic approach lets you dissect your problem into several units and several steps, and solve it one step at a time.

LangChain: A Framework for Composable Architectures

In comes a framework that has been around a bit longer, but has emerged over the last year or so as the main player for composable architectures in large language model design: LangChain. LangChain gives us the ability to create a chain of execution in a generative AI application. It basically says: if I'm trying to solve a problem, I'm going to do it in several steps, right? Step one, step two, step three. And I'm going to compose that into a chain of execution that can do different things.

And these different things can be, for example, taking user input and applying it to a prompt template. Say a user wants to do a lookup or a query on something; instead of having the user compose the query, you can use a prompt template and have a large language model do it for them. Or basic calculation: take these two or three inputs, and have that processed by the large language model and executed.
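A minimal sketch of that prompt-template idea, assuming the classic LangChain API (the imports have moved around in newer releases); the template and request are illustrative placeholders:

```python
# Sketch: a prompt template in a chain, using the classic LangChain API.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["request"],
    template="Rewrite this user request as a concise database search query:\n{request}",
)

chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
print(chain.run(request="show me last month's invoices for ACME"))
```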

LangChain also gives us the ability to combine the outputs of large language models with programming code. LangChain is a Python framework, so you can use plain Python and plug different functions into that chain of execution. And it can also combine inference endpoints, so you can still use ChatGPT when necessary, but embedded within a chain of execution inside an application. This makes the way you operate and think about applications with large language models far more flexible.

Facilitating Composition with LangChain

And the most important bit is that it facilitates composition. There are different models out there, and you can say: I'm going to use one specialized model for a specific task, let's say keyword extraction, another one for composing queries against databases, and yet another one for general-purpose answers or summarization. That is a great asset.
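For instance, here is a sketch of chaining two different models, again assuming the classic LangChain API; the model choices are just examples of a smaller, faster model and a stronger one.

```python
# Sketch: composing two models in one pipeline, each picked for its task.
# Classic LangChain API; model names are illustrative.
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

small_llm = OpenAI(temperature=0)            # fast, cheap model for extraction
large_llm = ChatOpenAI(model_name="gpt-4")   # stronger model for composition

extract = LLMChain(llm=small_llm, prompt=PromptTemplate(
    input_variables=["text"],
    template="List the key skills mentioned in this text:\n{text}"))

profile = LLMChain(llm=large_llm, prompt=PromptTemplate(
    input_variables=["text"],
    template="Write a two-sentence candidate profile from these skills:\n{text}"))

pipeline = SimpleSequentialChain(chains=[extract, profile])
print(pipeline.run("Ten years of Python, Spark, and data pipeline work ..."))
```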

Introducing the LangChain Agent

But one of the coolest things about LangChain is the agent. An agent is essentially one component within LangChain that wraps an embedded large language model. And you can basically tell that model: you've got tools, I'm going to give you tools, and you execute the prompt that we give you and use the tools to solve the problem.

So let's say for a moment that you want this bit of code to answer the following query: I've got three apples and two bananas; how many fruit do I have? Most general-purpose models struggle with this.

They give you hallucinated answers. In the agent model, the model can understand that there are two integers, three apples and two bananas, and that both of those are fruit.

And so it knows: I need to strip out those numbers and put them into a bit of code that is one of my tools, a calculator. That calculator is just a bit of Python code that takes two numbers and adds them up, for instance, or multiplies them, or whatever. The large language model understands that it can execute that tool, and it will return five: you've got five fruit.
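Here's what that might look like, assuming the classic initialize_agent API; the calculator tool is our own toy implementation.

```python
# Sketch: a LangChain agent with a toy calculator tool (classic API).
from langchain.llms import OpenAI
from langchain.agents import initialize_agent, Tool, AgentType

def add(expression: str) -> str:
    """Add a comma-separated list of numbers, e.g. '3, 2' -> '5.0'."""
    return str(sum(float(x) for x in expression.split(",")))

tools = [Tool(
    name="Calculator",
    func=add,
    description="Adds numbers. Input: comma-separated numbers, e.g. '3, 2'.",
)]

agent = initialize_agent(
    tools,
    OpenAI(temperature=0),
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,  # print the agent's tool-use trace
)
print(agent.run("I've got three apples and two bananas. How many fruit do I have?"))
```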

So that's a cool thing. The other thing agents can do is execute queries against databases, which has always been a bit of a problem because, as some of you may know, a large language model can normally only answer from what it was trained on. But with agents, it can execute a query against an enterprise database, so other information can be mixed in in real time. It can also do lookups on the web, for instance, or call other endpoints.

Anything that can be programmatically expressed as a tool can be combined in an agent, and the agent then has the ability to decide, to a certain extent, what to do. Because, contrary to what the hype says, a large language model does not have consciousness, nor is it capable of reasoning; it's just very good at expressing probable answers. But the agent gives it more features and more abilities to execute things.
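As an example of "anything programmatically expressed as a tool", here is a sketch of a database lookup wrapped as a tool; the database file and schema are hypothetical placeholders.

```python
# Sketch: exposing a database lookup as an agent tool.
# The database file and schema are hypothetical placeholders.
import sqlite3
from langchain.agents import Tool

def lookup_customer(name: str) -> str:
    """Look up a customer by exact name and return the matching row."""
    conn = sqlite3.connect("crm.db")
    row = conn.execute(
        "SELECT name, city, segment FROM customers WHERE name = ?",
        (name.strip(),),
    ).fetchone()
    conn.close()
    return str(row) if row else "No such customer."

db_tool = Tool(
    name="CustomerLookup",
    func=lookup_customer,
    description="Looks up a customer by exact name in the CRM database.",
)
# db_tool can be passed to initialize_agent alongside the calculator above.
```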

Retrieval Augmented Generation (RAG)

And then another component, another aspect, is what we call RAG, or retrieval-augmented generation. That is an extremely potent architectural pattern for reducing hallucinations, which we talked about a bit earlier.

A large language model is generally only trained on its own data set. That's the reason why, if you use, say, ChatGPT, you'll often get the answer: I was only trained on data up to November 2022 or something; I cannot answer anything beyond that point.

That points to the fact that the information baked into a large language model is finite and basically cannot be expanded unless you do something about it. RAG is one way to instruct a model: you can look beyond your own training data set, into a store which contains additional information. That store is called a vector store, and it has a lot to do with how large language models chunk and process information and do lookups.

But essentially, the RAG architecture gives you the ability to continuously ingest new data into the store and have the large language model access that information and do lookups. This is a very, very important pattern, because you can also instruct the large language model to only give answers based on the information contained in the vector store, in that RAG database if you want, which also gives us the ability to reduce hallucinations.
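A minimal RAG sketch, assuming classic LangChain with a FAISS vector store (which needs faiss-cpu installed); the documents stand in for your continuously ingested data.

```python
# Sketch: ingest documents into a vector store and answer from them.
# Classic LangChain API with FAISS; documents are illustrative placeholders.
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

docs = [
    "Our returns policy changed in January: items can be returned within 60 days.",
    "Shipping to the UK takes 3 to 5 business days.",
]

store = FAISS.from_texts(docs, OpenAIEmbeddings())  # embed and index the texts

qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    retriever=store.as_retriever(),  # answers are grounded in retrieved docs
)
print(qa.run("What is the current returns policy?"))
```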

Reducing Hallucinations in Model Responses

Talking of hallucinations: why do they occur? Because a large language model does not have the ability to reason. It cannot check its answers. It does not really understand its answers.

It just generates whatever is most likely to sound true, which is why, when we use a large language model for coding, it sometimes generates completely nonsensical answers, just because they sound right. If you want, a large language model is basically like a management consultant: it generates the stuff you want to hear, the stuff that sounds about right. And I can say that because I am a consultant.

Leveraging LangChain for Application Structure

But if you understand that, then combining these techniques with things like LangChain gives us incredible power over the way we structure applications and the way we surgically use the great features of large language models within the context of a traditional, deterministic application.

Practical Use Case: Matching with Large Language Models

[An audience member points out an error on the slide.] Yes, correct, good point. And that's kind of funny, thank you very much, because I've used this slide before and no one discovered it. Kudos to you. Thank you.

The Challenge of Matching Candidates to Roles

All right, so now the practical use case that we're talking about today is matching. Just for demonstration purposes, we asked: what's the usual problem? Well, you have lots of applications from different candidates, and you have a role, and you want to find the best candidate for that role, rather than just looking for the best candidate among the first 50, because you may have 200, 300, 400 applicants, right?

And all of them want that job, but as a recruiter, you can only look at so many CVs. So we wanted to build something using large language models that gives us a level playing field in that recruiting use case.

Normalization and Anonymization of CVs

So how does this work? Well, we have a bunch of CVs, and these CVs come in all different forms and formats: Word documents, PDFs. Some of them are fancy, with pictures; others are just text. Some people put their skills forward, others their work experience or education. It's completely heterogeneous.

And that makes them really difficult to process. One thing that large language models are really good at is normalization: just read that stuff, then structure, reparse, and reformat the CV into a single format that is much easier to process.

At the same time, we want to match those CVs to roles. So generally, the approach here is: we take the roles that we're looking to fill and the CVs of candidates, put them both in a vector store, and use large language models to find matches.
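In sketch form, that matching step could look like this, reusing the classic LangChain FAISS API from the RAG example; the CV texts and role description are made up.

```python
# Sketch: matching a role description against CVs in a vector store.
# Classic LangChain FAISS API; all texts are made-up examples.
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

normalized_cvs = [
    "Data engineer. Skills: Python, Spark, Airflow, SQL. 8 years experience.",
    "Social media manager. Skills: content strategy, analytics, campaigns.",
]

cv_store = FAISS.from_texts(normalized_cvs, OpenAIEmbeddings())

role = "Data engineer with Spark, Airflow, and Python experience."
for cv, score in cv_store.similarity_search_with_score(role, k=2):
    print(f"{score:.3f}  {cv.page_content[:60]}")  # lower distance = closer match
```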

Classification and Skills Extraction

So here's what the application, built with LangChain, does. It obviously does normalization, as discussed, and then it anonymizes the CV: it removes all names, references, private addresses, et cetera, because we want an unbiased view of the candidate. The second step is classification, so it finds out: is this an accountant, a software developer, a business manager, someone in sales, or some other kind of role?

It extracts the skills. It filters. One thing we can do in the application is ask questions: Is this applicant a resident of the UK? Do they have a work permit? What time zone do they live in? And so on. You can ask questions, and the large language model will try to answer them based on the CV.

It's really that simple. In the piece of code, you have a prompt that says: this is my question; what is the answer to this question if you look at this CV? And the large language model will look at the CV and say yes, no, or I don't know. That's also possible.
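That filter can be as simple as the following sketch; ask_llm is a hypothetical helper wired to whichever model you use.

```python
# Sketch: the yes/no/unknown CV filter as a single prompt.
# ask_llm is a hypothetical helper that returns an LLM's text response.
FILTER_PROMPT = """You are screening a CV. Based only on the CV below, answer
the question with exactly one word: YES, NO, or UNKNOWN.

Question: {question}

CV:
{cv}"""

def qa_filter(cv_text: str, question: str, ask_llm) -> str:
    answer = ask_llm(FILTER_PROMPT.format(question=question, cv=cv_text))
    return answer.strip().upper()

# Example: qa_filter(cv_text, "Does the applicant have a UK work permit?", ask_llm)
```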

And lastly, it gives us a summarization, and then we can match that to the role. To do all of that in the LangChain process, we use different, bespoke models. We use an LLM called Bison, which is very good at text wrangling and normalization. We use another model called KeyBERT, and as the name slightly indicates, it's very good at keyword extraction. And for everything else, we use a smaller large language model, Meta's Llama. We choose those models based on the merits of what they do specifically. And that's another aspect of this composable architecture: you can use smaller models, models that execute faster, rather than a huge model with more than a trillion parameters.
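KeyBERT, for what it's worth, is available as an off-the-shelf Python library (pip install keybert); here's a sketch of the keyword-extraction step, with a made-up CV snippet.

```python
# Sketch: keyword extraction with KeyBERT; the CV text is made up.
from keybert import KeyBERT

kw_model = KeyBERT()  # loads a default sentence-transformers model

cv_text = ("Security officer with ten years of experience in CCTV monitoring, "
           "safety enforcement, and incident reporting.")

keywords = kw_model.extract_keywords(
    cv_text,
    keyphrase_ngram_range=(1, 2),  # allow one- and two-word keyphrases
    top_n=5,
)
print(keywords)  # list of (phrase, relevance score) pairs
```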

Demo: LangChain Application in Action

So, cognizant of the time, I'm going to show a bit of what that looks like in an application that we've built as a demo. So there we go. OK, so this is basically one step of that application.

Uploading and Processing CVs

We're just going to upload a few CVs, right? These are bogus CVs. I'm going to use a data engineer, and I'm just going to upload them. Let's hope the Wi-Fi holds; I'm tethering off my phone, but let's hope it's not going to be too slow.

Yeah, like that. Let's take a look. Add another one here. Just adding a few CVs.

Yeah, and now it's processing. As you can see those little bubbles: those are different processes in the chain. There's enrichment, there's extracting skills, education, and so on.

Summarizing and Embedding in RAG

And now it gives us a summary, right? It basically processed two documents, the security officer and the social media manager. Just for demonstration purposes, we used a CV-generating website and put some bogus things in here. These are not real people.

And then it gives us a summary of that person: keywords extracted from the CV, what they do, like safety and monitoring, enforcement, et cetera, and other aspects of their skill profile. Same thing for the social media manager.

We basically have keywords, extractions, and so on. And now we can embed all these documents in our RAG store. Those are just technical settings. And now we can go and upload a job offer.

Matching Service and Results

So we have one here. Let's take the data engineer. And now it's generating another summary, extracting skills, et cetera, all of these processes. And it says: OK, these are the detected skills we're looking for, the detected keywords, some content-specific things.

We can apply a QA filter as well. Let's skip that one and go to the matching service. The matching service basically uses the RAG store, compares the texts, and says: OK, great, this candidate is a good match.

They have the right skills and keywords, and we can take a look at the original CV, et cetera. As a demo, this gives us an indication of how we can compose a specific architecture: extract skills and create an unbiased view of candidates, one that is not based on a human looking at a CV and liking or disliking a candidate depending on the layout, but on just the core facts that we extract using a large language model and match to the role.

Benefits of Automation with Large Language Models

So this also gives us a different, programmatic approach to the problem. This is just an example, but it demonstrates how you can use large language models, especially for automating document processing, clerical work, and other, let's say, more difficult tasks that traditionally could not be solved with rule-based approaches and deterministic systems. That's a great advantage of large language models: they enable and assist us to do better work with that kind of assistant.

Conclusion and Q&A

So I hope you like my demonstration and I'm happy to take questions.
