So today I'm going to talk about building AI vertical search and the lessons we learned while building a variety of those systems. Just a little bit about me: my name is Alex Wasserman, and this is the email where you can reach me. I have over 20 years of experience in search and ML engineering at places like Google and Etsy, and currently I'm at Serenity GPT, where we build custom search engines and GPT-like solutions for businesses of all sizes. So that's my shameless plug. Now that that's done, let me talk a little bit
about what vertical search is and how it differs from web search. I want to start with web search because I think it connects everybody's experience to what I'm talking about. On the web you're searching over public data: there are huge amounts of information, but also a huge audience, lots of signals, and well-developed solutions. You can Google things, and these days you can also ask ChatGPT or any other AI engine and get the results you want. On the other hand, if you have proprietary data or a very specific domain, things are a bit more complicated. Say you have your own lesson plans that you don't want to expose to the public, plus a large repository of tutorials, assignments, and so on. A lot of tools, development platforms, and solutions have been appearing recently, but most of them don't solve things perfectly. I'm not going to recommend any particular one; I'm just going to talk about some of the features and some of the issues we've run into while developing these custom search engines, and hopefully that will help you pick a solution when you want to build something yourself, or figure out who to work with when you need
something like this. So let me start by looking at web search. If you didn't know, today is apparently the Day of the Dead, which is celebrated in Mexico; Google very helpfully told me about this, and that page happened to present quite a few of the capabilities I wanted to talk about, so I used it as a reference. The first capability is retrieval augmented generation, which is something we focus on; it's the part at the top of the page. It's a new, somewhat experimental feature for Google, and I don't think it's available everywhere yet, but it's one of the things I'm going to concentrate on. Then of course there are the basic search results. That top part was the retrieval augmented generation, or RAG, results; on Google the regular search results ended up at the very bottom of the screen. They are nonetheless a very important part of any page, and something we'll focus on a lot. The search results are based on keyword search, the traditional approach, but also on machine learning and AI models. There are also image results, so there are multiple corpora, multiple types of results, and that's something we're going to talk about as well. And there are facets, which switch between different types of results, as well as curated content, meaning content that someone essentially assigned to this query. Now let's switch to vertical search proper. As I said, for proprietary data we've built several different environments, but they all have very similar parts: there's question answering with a smart response, which is the retrieval augmented generation; there are search results; and there are different filters and different facets.
Okay, so let's talk a little more about retrieval augmented generation and why it matters. When you're looking for information, what you really want is the answer to your question, not just search results, and that's something we've come to expect now that ChatGPT is available. But again, if your data is proprietary, you can't get the answer from OpenAI directly. You also can't realistically tune a large language model to your domain; that's very expensive and it doesn't work that well. So retrieval augmented generation is really the way to go. How does it work? Basically, we run a search and put the search results into the context of a large language model, something like OpenAI's ChatGPT, and we do a bit of prompt engineering. The prompt engineering makes sure that we only use your data, the data from that specific domain, to generate the response, so that there are no hallucinations: the information in the answer actually comes from the data. It also provides links to where the information came from, which is important, and it lets the model say something like "I don't know" when there isn't enough information.
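To make that concrete, here is a rough sketch of the flow: retrieve, put the results into the prompt, and constrain the answer. The corpus, the toy keyword retriever, and the prompt wording are all made up for illustration; a real system would query an actual search index and then send the prompt to an LLM API, which is omitted here.

```python
def retrieve(query, corpus, k=2):
    """Naive keyword retrieval: rank documents by term overlap with the query."""
    terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(terms & set(doc["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query, results):
    """Put retrieved passages into the context and constrain the answer to it,
    with citations and an explicit 'I don't know' escape hatch."""
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in results)
    return (
        "Answer the question using ONLY the context below.\n"
        "Cite the [id] of every passage you use.\n"
        "If the context is not sufficient, answer: I don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

corpus = [
    {"id": "doc1", "text": "Reset your password from the account settings page."},
    {"id": "doc2", "text": "Billing invoices are emailed on the first of the month."},
]
query = "how do I reset my password"
prompt = build_rag_prompt(query, retrieve(query, corpus))
```

The `prompt` string is what would be sent to the model; the retrieval step is what grounds the generated answer in your own data.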
All of these things are part of the generation side, but of course search and retrieval are the more important and more difficult parts, so that's what I want to talk about next. As I said, traditional keyword search has been used for a long time. It mostly finds the exact words you typed in the documents, and it's fast, cheap, and domain independent.
Nowadays most places are switching to semantic search. It's meaning based: it doesn't just look for the words. But it can go astray for very specialized domains, because the models are trained on general English, on public data, so what they consider similar, and what they think you mean, can be different in a specific domain. One solution to this problem is hybrid search: you submit the query to both types of systems and then merge the results together, and we've found that works much better than either one alone. Let me say a little more about semantic search. At its core is a language model, a transformer, similar in architecture to the models used by OpenAI's ChatGPT and other AI systems, but much smaller. It converts text into a vector that we call an embedding, and search then works by finding the vectors that are close to the vector associated with the query. Because the model is much smaller than the large language models OpenAI uses, it's not as expensive to run, which matters especially for search. So it's affordable and scalable, and many of the technologies I showed on the first slide are vector databases, which let you find the vectors associated with your documents very quickly.
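Here is a minimal sketch of that nearest-vector idea, with hand-made 3-dimensional vectors standing in for real embeddings from a small transformer encoder (real embeddings have hundreds of dimensions, and a vector database replaces the brute-force scan).

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors: higher means closer in meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Invented document embeddings for illustration.
doc_vectors = {
    "refund-policy":  [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-quickstart": [0.0, 0.2, 0.9],
}

def semantic_search(query_vector, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    ranked = sorted(doc_vectors,
                    key=lambda d: cosine(query_vector, doc_vectors[d]),
                    reverse=True)
    return ranked[:k]

# A query embedded near the "refund-policy" direction should rank it first.
results = semantic_search([0.8, 0.2, 0.0])
```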
As I said, semantic search can be not so great for specialized domains, but you can fine-tune it. Fine-tuning means taking the model and giving it more examples from your specialized domain so that it works well there, and for these small models, unlike for LLMs, that can be inexpensive; we've had a lot of success doing this. One place where large language models do help is in generating queries for your documents, which we can use as additional data for fine-tuning these models. But even though fine-tuning really helps, we've still found that the hybrid approach, using both keyword search and semantic search, is the best combination.
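One common way to do the hybrid merge (a standard choice, though not necessarily the exact one any given system uses) is reciprocal rank fusion: each document is scored by its rank in each list, so items that appear high in both lists rise to the top. The result lists below are invented for illustration; the same merge also works for combining results from different corpora.

```python
def rrf_merge(*ranked_lists, k=60):
    """Reciprocal rank fusion: score each doc by the sum of 1/(k + rank)
    over every ranked list it appears in, then sort by total score."""
    scores = {}
    for results in ranked_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results  = ["doc_a", "doc_b", "doc_c"]   # exact-match ranking
semantic_results = ["doc_b", "doc_d", "doc_a"]   # embedding-based ranking

merged = rrf_merge(keyword_results, semantic_results)
```

Here `doc_b` wins because it ranks well in both lists, even though neither system ranked it first by a wide margin.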
In a lot of cases you have multiple types of results, for example documentation, tutorials, interactive communications, and so on, and it's actually difficult to make a single system work well for all those different kinds of information. If you have large amounts of data and large amounts of interaction, it's possible to tune the models so that the different types of information don't get confused, but in most practical cases we've dealt with, there aren't that many people using the system, or the amount of data isn't that huge. The solution, again, is to retrieve the different types of documents separately and then merge the results post-retrieval. Another, kind of similar, solution is
using facets. As you saw on the Google page, there are different facets at the top of the page, or additional filters in our system: basically a way to restrict the search to a particular document type, or to apply other filters. This is particularly important if you want to restrict access to certain types of documents for certain people, or to provide a different view of the content to different users. It's relatively easy to build for keyword search and for small semantic search indices, but it requires fairly advanced solutions for larger indices.
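A small sketch of facets doubling as access control: filter the candidate set by document type and by the user's role before ranking. The schema and role names here are invented for illustration.

```python
# Invented document records with a facet field ("type") and an access field.
documents = [
    {"id": 1, "type": "tutorial", "allowed_roles": {"student", "teacher"}},
    {"id": 2, "type": "lesson",   "allowed_roles": {"teacher"}},
    {"id": 3, "type": "tutorial", "allowed_roles": {"teacher"}},
]

def filter_docs(docs, doc_type=None, role=None):
    """Apply a facet filter (document type) and an access filter (user role)."""
    out = []
    for doc in docs:
        if doc_type is not None and doc["type"] != doc_type:
            continue
        if role is not None and role not in doc["allowed_roles"]:
            continue
        out.append(doc)
    return out

# A student browsing the "tutorial" facet sees only what they may access.
student_tutorials = filter_docs(documents, doc_type="tutorial", role="student")
```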
Finally, curated content is the last type of content I want to talk about. It's particularly important for smaller search engines, because no matter what technology you use there will be gaps and difficult cases, and it's always best to involve your users and get them to produce content, whether that means specific documents created just for inclusion in the search results, or recommending a particular search result for a specific query. You usually provide the query for that content so we know where it should show up, but matching only an exact query is not very scalable. So one way to surface curated content is to look for similar queries: whenever a search for a similar query occurs, you also expose the curated content. The embeddings we discussed for semantic search can also be used to measure query similarity.
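A tiny sketch of that idea: the curator's query and the incoming query are compared as embedding vectors, and the curated content fires whenever they are similar enough. The 2-dimensional vectors and the threshold below are illustrative stand-ins for real query embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# A curator attached this answer to the query "reset password",
# whose (invented) embedding is stored alongside the content.
curated = {"query_vec": [1.0, 0.0], "content": "See the password-reset guide."}

def curated_match(incoming_vec, threshold=0.9):
    """Return the curated content when the incoming query is similar enough."""
    if cosine(incoming_vec, curated["query_vec"]) >= threshold:
        return curated["content"]
    return None

hit  = curated_match([0.95, 0.1])   # near-duplicate query: curated content shows
miss = curated_match([0.0, 1.0])    # unrelated query: no match
```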
So that's really everything I wanted to present: some of the techniques we've found useful in building custom search engines, search engines outside the public domain. Thank you, and if you have any questions I'll be happy to answer them.