So today I'm going to talk about building AI vertical search and the lessons we learned while building a variety of those systems. Just a little bit about me: my name is Alex Wasserman, and this is the email where you can reach me. I have over 20 years of experience in search and ML engineering at places like Google and Etsy, and currently I'm at Serenity GPT, where we build custom search engines and GPT-like solutions for businesses of all sizes. So that's my shameless plug. Now that that's done, let me talk a little bit
about what vertical search is and how it differs from web search. I want to start with web search because I think it connects everybody's experience to what I'm talking about. On the web you're searching over public data: there are huge amounts of information, but also a huge audience, lots of signals, and well-developed solutions. You can Google things, and these days you can also ask ChatGPT or any other AI engine and get the results you want. On the other hand, if you have proprietary data or a very specific domain, things are a bit more complicated. Say you have your own lesson plans that you don't want to expose to the public, plus a large repository of tutorials, assignments, and so on. A lot of tools, development platforms, and solutions have been appearing recently, but most of them don't solve things perfectly. I'm not going to recommend any particular one; I'm just going to talk about some of the features and some of the issues we've run into while developing these custom search engines, and hopefully that will help you pick a solution when you want to build something yourself, or figure out who to work with when you need
something like this. So let me start by looking at web search. If you didn't know, today is apparently the Day of the Dead, which is celebrated in Mexico; Google very helpfully told me about this, and that page happened to present quite a few of the capabilities I wanted to talk about, so I used it as a reference. The first capability is retrieval augmented generation, which is something we focus on; it's the part at the top of the page. It's a new, somewhat experimental feature for Google, and I don't think it's available everywhere yet, but it's one of the things I'm going to concentrate on. Then of course there are the basic search results. That top part was the retrieval augmented generation, or RAG, results; on Google the regular search results ended up at the very bottom of the screen. They are nonetheless a very important part of any page, and something we'll focus on a lot. The search results are based on keyword search, the traditional approach, but also on machine learning and AI models. There are also image results, so there are multiple corpora, multiple types of results, and that's something we're going to talk about as well. And there are facets, which switch between different types of results, as well as curated content, meaning content that someone essentially assigned to this query. Now let's switch to vertical search proper. As I said, for proprietary data we've built several different environments, but they all have very similar parts: there's question answering with a smart response, which is the retrieval augmented generation; there are search results; and there are different filters and different facets.
Okay, so let's talk a little more about retrieval augmented generation and why it matters. When you're looking for information, what you really want is the answer to your question, not just search results, and that's something we've come to expect now that ChatGPT is available. But again, if your data is proprietary, you can't get the answer from OpenAI directly. You also can't realistically tune a large language model to your domain; that's very expensive and it doesn't work that well. So retrieval augmented generation is really the way to go. How does it work? Basically, we run a search and put the search results into the context of a large language model, something like OpenAI's ChatGPT, and we do a bit of prompt engineering. The prompt engineering makes sure that we only use your data, the data from that specific domain, to generate the response, so that there are no hallucinations: the information in the answer actually comes from the data. It also provides links to where the information came from, which is important, and it lets the model say something like "I don't know" when there isn't enough information.
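To make that concrete, here is a rough sketch of the flow: retrieve, put the results into the prompt, and constrain the answer. The corpus, the toy keyword retriever, and the prompt wording are all made up for illustration; a real system would query an actual search index and then send the prompt to an LLM API, which is omitted here.

```python
def retrieve(query, corpus, k=2):
    """Naive keyword retrieval: rank documents by term overlap with the query."""
    terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(terms & set(doc["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query, results):
    """Put retrieved passages into the context and constrain the answer to it,
    with citations and an explicit 'I don't know' escape hatch."""
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in results)
    return (
        "Answer the question using ONLY the context below.\n"
        "Cite the [id] of every passage you use.\n"
        "If the context is not sufficient, answer: I don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

corpus = [
    {"id": "doc1", "text": "Reset your password from the account settings page."},
    {"id": "doc2", "text": "Billing invoices are emailed on the first of the month."},
]
query = "how do I reset my password"
prompt = build_rag_prompt(query, retrieve(query, corpus))
```

The `prompt` string is what would be sent to the model; the retrieval step is what grounds the generated answer in your own data.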
All of these things are part of the generation side, but of course search and retrieval are the more important and more difficult parts, so that's what I want to talk about next. As I said, traditional keyword search has been used for a long time. It mostly finds the exact words you typed in the documents, and it's fast, cheap, and domain independent.
Nowadays most places are switching to semantic search. It's meaning based: it doesn't just look for the words. But it can go astray for very specialized domains, because the models are trained on general English, on public data, so what they consider similar, and what they think you mean, can be different in a specific domain. One solution to this problem is hybrid search: you submit the query to both types of systems and then merge the results together, and we've found that works much better than either one alone. Let me say a little more about semantic search. At its core is a language model, a transformer, similar in architecture to the models used by OpenAI's ChatGPT and other AI systems, but much smaller. It converts text into a vector that we call an embedding, and search then works by finding the vectors that are close to the vector associated with the query. Because the model is much smaller than the large language models OpenAI uses, it's not as expensive to run, which matters especially for search. So it's affordable and scalable, and many of the technologies I showed on the first slide are vector databases, which let you find the vectors associated with your documents very quickly.
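Here is a minimal sketch of that nearest-vector idea, with hand-made 3-dimensional vectors standing in for real embeddings from a small transformer encoder (real embeddings have hundreds of dimensions, and a vector database replaces the brute-force scan).

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors: higher means closer in meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Invented document embeddings for illustration.
doc_vectors = {
    "refund-policy":  [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-quickstart": [0.0, 0.2, 0.9],
}

def semantic_search(query_vector, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    ranked = sorted(doc_vectors,
                    key=lambda d: cosine(query_vector, doc_vectors[d]),
                    reverse=True)
    return ranked[:k]

# A query embedded near the "refund-policy" direction should rank it first.
results = semantic_search([0.8, 0.2, 0.0])
```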
As I said, semantic search can be not so great for specialized domains, but you can fine-tune it. Fine-tuning means taking the model and giving it more examples from your specialized domain so that it works well there, and for these small models, unlike for LLMs, that can be inexpensive; we've had a lot of success doing this. One place where large language models do help is in generating queries for your documents, which we can use as additional data for fine-tuning these models. But even though fine-tuning really helps, we've still found that the hybrid approach, using both keyword search and semantic search, is the best combination.
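One common way to do the hybrid merge (a standard choice, though not necessarily the exact one any given system uses) is reciprocal rank fusion: each document is scored by its rank in each list, so items that appear high in both lists rise to the top. The result lists below are invented for illustration; the same merge also works for combining results from different corpora.

```python
def rrf_merge(*ranked_lists, k=60):
    """Reciprocal rank fusion: score each doc by the sum of 1/(k + rank)
    over every ranked list it appears in, then sort by total score."""
    scores = {}
    for results in ranked_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results  = ["doc_a", "doc_b", "doc_c"]   # exact-match ranking
semantic_results = ["doc_b", "doc_d", "doc_a"]   # embedding-based ranking

merged = rrf_merge(keyword_results, semantic_results)
```

Here `doc_b` wins because it ranks well in both lists, even though neither system ranked it first by a wide margin.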
In a lot of cases you have multiple types of results, for example documentation, tutorials, interactive communications, and so on, and it's actually difficult to make a single system work well for all those different kinds of information. If you have large amounts of data and large amounts of interaction, it's possible to tune the models so that the different types of information don't get confused, but in most practical cases we've dealt with, there aren't that many people using the system, or the amount of data isn't that huge. The solution, again, is to retrieve the different types of documents separately and then merge the results post-retrieval. Another, kind of similar, solution is
using facets. As you saw on the Google page, there are different facets at the top of the page, or additional filters in our system: basically a way to restrict the search to a particular document type, or to apply other filters. This is particularly important if you want to restrict access to certain types of documents for certain people, or to provide a different view of the content to different users. It's relatively easy to build for keyword search and for small semantic search indices, but it requires fairly advanced solutions for larger indices.
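A small sketch of facets doubling as access control: filter the candidate set by document type and by the user's role before ranking. The schema and role names here are invented for illustration.

```python
# Invented document records with a facet field ("type") and an access field.
documents = [
    {"id": 1, "type": "tutorial", "allowed_roles": {"student", "teacher"}},
    {"id": 2, "type": "lesson",   "allowed_roles": {"teacher"}},
    {"id": 3, "type": "tutorial", "allowed_roles": {"teacher"}},
]

def filter_docs(docs, doc_type=None, role=None):
    """Apply a facet filter (document type) and an access filter (user role)."""
    out = []
    for doc in docs:
        if doc_type is not None and doc["type"] != doc_type:
            continue
        if role is not None and role not in doc["allowed_roles"]:
            continue
        out.append(doc)
    return out

# A student browsing the "tutorial" facet sees only what they may access.
student_tutorials = filter_docs(documents, doc_type="tutorial", role="student")
```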
Finally, curated content is the last type of content I want to talk about. It's particularly important for smaller search engines, because no matter what technology you use there will be gaps and difficult cases, and it's always best to involve your users and get them to produce content, whether that means specific documents created just for inclusion in the search results, or recommending a particular search result for a specific query. You usually provide the query for that content so we know where it should show up, but matching only an exact query is not very scalable. So one way to surface curated content is to look for similar queries: whenever a search for a similar query occurs, you also expose the curated content. The embeddings we discussed for semantic search can also be used to measure query similarity.
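A tiny sketch of that idea: the curator's query and the incoming query are compared as embedding vectors, and the curated content fires whenever they are similar enough. The 2-dimensional vectors and the threshold below are illustrative stand-ins for real query embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# A curator attached this answer to the query "reset password",
# whose (invented) embedding is stored alongside the content.
curated = {"query_vec": [1.0, 0.0], "content": "See the password-reset guide."}

def curated_match(incoming_vec, threshold=0.9):
    """Return the curated content when the incoming query is similar enough."""
    if cosine(incoming_vec, curated["query_vec"]) >= threshold:
        return curated["content"]
    return None

hit  = curated_match([0.95, 0.1])   # near-duplicate query: curated content shows
miss = curated_match([0.0, 1.0])    # unrelated query: no match
```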
So that's really everything I wanted to present: some of the techniques we've found useful in building custom search engines, search engines outside the public domain. Thank you, and if you have any questions I'll be happy to answer them.