So, I was originally going to give a talk on using language models to evaluate businesses, but it got a bit too complex once I got into the ins and outs of SEC statements and filing lengths.
I actually gave a talk at the MindStone event a couple of months ago where I talked about some of the different tools that I've been building and using in my life with no-code solutions. So, different things that I've built just out of personal interest and curiosity.
The first was a proofreader GPT that knows my writing style and can respond the way I would normally talk.
The second is an email tool that takes an incoming email and prepares a draft reply in my writing style.
The third is a content filter that takes in many different newsletters, organizes them, and pulls out the information that's relevant.
The fourth is a language tutor: I'm actively learning Chinese, and I've got a tool that takes the messages I've sent through the day, teaches me how to say them in Chinese, and pulls out the key terms.
And then there's one that tracks my workout schedule and gives me suggestions on what to do next, as well as another bot that organizes my calendar.
And in the previous talk, I really dove into the first two, the proofreader GPT and the email manager.
For the sake of this talk, I want to look at how we can use language models to curate the content that comes into us, because I get massively frustrated with useless content, which I feel like we're inundated with all the time.
To build this stuff, I mainly use no-code tools.
So in the previous talk, I relied a lot on a chat interface, ChatGPT, and Google Sheets. If you're not using ChatGPT within Google Sheets, I'd highly recommend you check it out. There are two different plugins you can use: one's called SheetGPT, and I think the other is called GPT for Work, or something like that.
And then the last bit is Make.com. If you're not familiar with Make.com, it's a no-code tool that allows you to plug different applications together so they work with each other.
Now, those are all the no-code solutions, and I'm gradually getting more technical.
And so in this talk, I'm going to be using an RSS reader (my favorite is a thing called Feedly), as well as Airtable, which is essentially a database, and a little bit of Python.
So as I get into it, I want to focus this talk a bit more on the practical side of what we do in the markets day-to-day.
And really, it breaks down to four things.
One, we learn how companies, industries, and economies work: the cause-and-effect relationships and the mechanics of them.
Two, we stay updated on what's happening in the world and feed that information into the way we think.
Three, we rationalize: we use logic and scenario analysis to evaluate the risks, understand valuation, and take a view on certain circumstances.
And four, we learn lessons day-to-day, whether through observation, reading, or a lot of trial and error.
But the challenge you run into, and if you talk to anybody in the markets, the challenge everyone will mention, is that there's just not enough time in the day. There are too many situations. There's too much content coming in. If you sit in front of six screens, you've basically got news alerts, stock tickers, SEC filings, research reports, and calls coming in.
It's basically a lot like this guy Billy drinking from the fire hose.
And it feels like this every day.
And so my question was, and I think if you step out of the finance world and just think about our world in general: we hear a lot about how we are the product for a lot of these free tools, whether that's Meta, YouTube, or whatever it is. And as you engage with a lot of these tools, if we're the product, then the UX is designed to optimize for getting us to click on more stuff. It's not a UX designed to give us the most value, or the content we want in the way that we want it.
And stuff like this really frustrates me.
So I've been trying to figure out ways to circumvent that and build tools that can basically
give me the information I want in the way that I want it.
So here: as Herbert Simon put it, a wealth of information creates a poverty of attention.
So really, as I thought about this and about the use cases of language models: we've gone from counting the number of positive and negative words, to understanding the similarities of words, to understanding words in context, to now being able to do much more conceptual analysis with language models. You can break things apart in very new and novel ways.
And so I set out to do essentially three things. One was to take in the content, organize it, and put it in these different places in the way that I like, so it's accessible to me. Two, I wanted a way to be alerted to content that's relevant for me and new. And three, I wanted a way of keeping score, meaning that in the markets you have people voicing opinions on different stocks and different circumstances all the time, and most of us are wrong more than 50% of the time.
So for me, I wanted to have a way of measuring that and tracking that.
So if I think about the different types of data that are out there: you have SEC filings, news feeds, transcripts, a lot of different things. What I'm trying to do is quite vast for someone who's not an engineer, so for the sake of this talk, I've really focused on things that are likely to give me forward-looking insight, rather than just a backward-looking news feed, because it's easy to get lost in a lot of that stuff.
So in terms of what's coming in, I'll take those three things. One is incoming emails. I get a lot of emails from various brokers and banks, but I do think this is transferable to any sort of incoming content you want to organize. Two, I actually find conversations useful, whether that's transcripts of calls or YouTube; I'm using YouTube in this case. And three is PDFs. So I wanted to demonstrate how ChatGPT can be used to process the incoming information via one of these transcripts.
And if you're not familiar: any YouTube channel can be converted into an RSS feed. If you don't know what an RSS feed is, it's basically a news feed; any time a new piece of content is published to that channel, it automatically shows up in the feed.
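As a rough illustration, polling a channel's feed from Python looks something like this (a sketch using the standard YouTube feed URL pattern and the feedparser library; the channel ID is a placeholder):

```python
import feedparser

# YouTube exposes an Atom/RSS feed per channel at this URL pattern;
# CHANNEL_ID is a placeholder for the real channel ID.
FEED_URL = "https://www.youtube.com/feeds/videos.xml?channel_id=CHANNEL_ID"

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    # Each entry carries the video's publish date, title, and link.
    print(entry.published, entry.title, entry.link)
```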
So I use a tool called Feedly, and I organize various folders across all these different avenues. This helps me manage my email and the content coming in from many different places, and I organize it all by folder. Essentially, that gives me access to the content I want. The problem with it, though, is that if you don't check it all the time, the content just passes by and you miss the use of it.
So here's how this works for me. If something in one of my Feedly folders fits a certain criteria — there's a certain set of investors I want to track and know what they're talking about — it gets picked up. For instance, this guy talks about a potential separation or split of the company 7-Eleven, which has a Japanese parent, and about how there would be more value if the US business were separate from the Japanese business.
So this was something I processed with a script that I built. Essentially, it just pulls the transcript out of the video and puts it into a separate document.
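As a rough sketch of that step, here's one way to do it with the youtube-transcript-api package (the library choice is illustrative, and the video ID is a placeholder):

```python
from youtube_transcript_api import YouTubeTranscriptApi

VIDEO_ID = "VIDEO_ID"  # placeholder for the actual YouTube video ID

# Fetch the transcript as a list of {"text", "start", "duration"} segments.
segments = YouTubeTranscriptApi.get_transcript(VIDEO_ID)

# Join the segments into one raw, messy blob of text.
raw_transcript = " ".join(seg["text"] for seg in segments)

with open("transcript.txt", "w") as f:
    f.write(raw_transcript)
```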
Now, on its own, this isn't that valuable. It doesn't do much for you, and it's kind of messy.
So I basically wrap this in a series of ChatGPT prompts to get more value out of it and to get it into the right format for my database. I tried to do this all in one go, but ChatGPT has a tendency to summarize things, and you lose a lot of the information content. So I do it in a couple of steps. First, I ask it to give the transcript back to me word for word, but in a cleaner version: separating it into sections and adding a relevant topic header for each section.
And essentially it comes back looking a little bit like this.
And then I give it another prompt where I ask it to extract the information according to a schema I provide of what I want returned. And if the schema isn't relevant for the content, I ask it to return nothing.
And so what that enables me to do is easily turn the result into a JSON file, so I can say: okay, for this talk, the thesis is that the 7-Eleven parent company could separate, the catalyst is this, and here are the risks, valuation, direction, time period, and any additional commentary.
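Roughly, that second prompt looks something like this (a sketch using the openai Python client; the model name is illustrative, and the fields mirror the ones I just listed):

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SCHEMA_PROMPT = """Extract the following fields from the transcript as JSON:
thesis, catalyst, risks, valuation, direction, time_period, commentary.
If this schema is not relevant to the content, return an empty JSON object."""

def extract_thesis(clean_transcript: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative: any capable chat model works
        messages=[
            {"role": "system", "content": SCHEMA_PROMPT},
            {"role": "user", "content": clean_transcript},
        ],
        response_format={"type": "json_object"},  # force valid JSON output
    )
    return json.loads(response.choices[0].message.content)
```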
And then with that, I can quickly and automatically put this into a database. Now, with these databases, you could argue: why don't you just take all the information and put it into some kind of vector database? I've seen that done, and I felt the challenge with it is that you can't then go back and look at the records of everything in the way that I'd like to.
And the nice thing about doing it this way, or at least what I'm finding, is that when all the information around you filters into a single database, you can take the different bits and really start organizing and comparing them in the way that you'd want.
So I'm doing the same thing with email. There are research reports in here, there's YouTube, there are calls that I've had — if I have a call, I just take the transcript, or my notes, and stick those in as well.
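Getting each extracted record into the database is then a one-liner. Since Airtable is the database here, a sketch with the pyairtable client would look something like this (the base ID and table name are placeholders):

```python
import os
from pyairtable import Api

# Placeholders: an Airtable access token, base ID, and table name.
api = Api(os.environ["AIRTABLE_API_KEY"])
table = api.table("BASE_ID", "Theses")

def store_record(extracted: dict) -> None:
    if extracted:  # skip snippets where the model returned nothing
        table.create(extracted)  # keys must match the table's column names
```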
So that's content processing.
I think there are a couple of other interesting use cases for ChatGPT here.
And so if we look at the other forms of content coming in for me: this is an example of an email, and I'll probably get a couple dozen of these every day. And this is maybe only half of it, so don't bother reading it; essentially, you just get an enormous amount of content separated across a lot of different tickers.
And so what I'm doing here is using a script to take those in and separate them into chunks: it splits each paragraph out into its own snippet. And then with each of those snippets, I can perform further analysis using other language-model functions. The script then files each snippet into a note file for its respective ticker.
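The splitting itself is simple; roughly something like this (the ticker-detection pattern is just illustrative, since the real format depends on the broker email):

```python
import re
from pathlib import Path

NOTES_DIR = Path("notes")  # one note file per ticker, e.g. notes/AAPL.md

# Illustrative assumption: each paragraph leads with a ticker in
# parentheses, e.g. "(AAPL) Apple raised full-year guidance...".
TICKER_RE = re.compile(r"\(([A-Z]{1,5})\)")

def file_snippets(email_body: str) -> None:
    for paragraph in email_body.split("\n\n"):
        match = TICKER_RE.search(paragraph)
        if not match:
            continue  # no recognizable ticker in this paragraph
        note_file = NOTES_DIR / f"{match.group(1)}.md"
        with note_file.open("a") as f:
            f.write(paragraph.strip() + "\n\n")
```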
So this is all just happening automatically in the background, and I now have something like 450 note files. The nice thing about that is, if I want to get up to speed on a new company, instead of chasing down a bunch of information in a bunch of different places, I can go into a single note file, and each of those note files is organized, which I'll get into in a second.
So in terms of organizing it, I start by asking a few questions. First: what is this piece of content? Is it sales commentary? Is it a hypothesis? Or is it something that has factually happened? There are lots of different kinds of content coming in, and I'm finding you can use the language model to identify which it is.
The second question is: is this relevant for me? And actually, as I explored that question, that's where I found something really interesting that I'm now trying to transfer and broaden out. And the last bit was: how does this compare to what's already in the note file? What is our existing understanding of the situation, and is this genuinely new information, or is it repetitive? Because you often have situations where you get the same piece of content across ten different emails, all duplicative, and it's not helpful.
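One way to catch that kind of repetition is with embeddings rather than another chat prompt; roughly like this (a sketch using OpenAI's embeddings endpoint, with a similarity threshold that's just illustrative):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def is_duplicate(new_snippet: str, existing_snippets: list[str],
                 threshold: float = 0.9) -> bool:
    # Flag the snippet as repetitive if it's near-identical in meaning
    # to anything already in the note file.
    new_vec = embed(new_snippet)
    for old in existing_snippets:
        old_vec = embed(old)
        cosine = float(new_vec @ old_vec /
                       (np.linalg.norm(new_vec) * np.linalg.norm(old_vec)))
        if cosine > threshold:
            return True
    return False
```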
So the second use case here was really this idea of zero-shot classification. With each snippet, you can ask the model — or kind of force it — to go in whichever direction the language model thinks is the right category for it. Of course, if you don't set the categories correctly, you will get hallucinations and things that don't make any sense. So there needs to be a match between the content you're feeding it and what you're asking it to fit to.
But ultimately, again, that produces a JSON, which then goes into the individual company
databases.
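As a minimal sketch, zero-shot classification doesn't even need a chat model; the Hugging Face transformers pipeline does it out of the box (the labels here are just illustrative):

```python
from transformers import pipeline

# An NLI-based zero-shot classifier; no fine-tuning required.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

LABELS = ["sales commentary", "hypothesis", "factual event"]  # illustrative

def classify_snippet(snippet: str) -> str:
    result = classifier(snippet, candidate_labels=LABELS)
    return result["labels"][0]  # labels come back sorted by score
```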
Now, coming to the interesting bit, which is really the relevance scoring. What I started to do in each of the different note files was to write down my interests — the things I care about with this particular situation — and then compare whatever that piece of content is against those interests. And I found that relevancy scoring was really interesting, so I started to abstract it away and do the same thing for news. If I just wrote a very simple version for news, it would all go into one big prompt, and in real circumstances the interests are a lot longer.
But let's just say I wrote: I'm interested in machine learning, what's going on with Russia and Ukraine, Israel and Gaza, developments in the oil market, and China. I'm less interested in partisan politics, and I'm less interested in pop culture.
What I can do with this now is take the incoming news feeds — Reuters, Bloomberg, pretty much everything — chuck them into Google Sheets, and then run each item against this comparison to get a score.
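The scoring prompt itself can be very simple; something like this (again a sketch with the openai client; the 0-to-10 scale and model name are just illustrative):

```python
from openai import OpenAI

client = OpenAI()

INTERESTS = """I'm interested in: machine learning, Russia/Ukraine,
Israel/Gaza, developments in the oil market, China.
I'm less interested in: partisan politics, pop culture."""

def relevance_score(headline: str) -> int:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": INTERESTS + "\n\nScore the following headline from "
                        "0 to 10 for relevance to my interests. "
                        "Reply with the number only."},
            {"role": "user", "content": headline},
        ],
    )
    return int(response.choices[0].message.content.strip())
```

Sort the sheet by that score, and the curated feed falls out.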
And as I do that, I get basically this: instead of just getting the Wall Street Journal or the Financial Times or whatever it may be, I now get a news feed that's curated to the interests I've set out. And it's not one that's curated to what I've clicked on; it's curated to what I'm telling it.
And I think that's particularly interesting as we step back and think about the world we live in, where it hasn't been like that. I'm really excited about the potential for all of us to be able to build our own curated content flows.
So, just to finish on a quote from Yuval Harari: language is the operating system of humanity. Now that we have these models, where you can cut data in much more conceptual ways rather than just picking out certain words or doing keyword search, I'm just really excited about what we can do with it.
And if there are technical people that want to work on interesting things together, yeah,
come say hi.
So that's me.
Thank you.