The future of AI is not AGI but narrow AI designed to solve specific human problems

Okay, great. Thanks very much for having me out tonight.

Introduction

I'll give you a brief introduction of myself. My name is Gary Saarenvirta.

I've been in the machine learning space for 30 years. I have a master's degree in aerospace engineering from the University of Toronto.

When I was an undergraduate, and later in grad school, Geoffrey Hinton would hold workshops on machine learning and neural nets, and I attended those. So I got exposed to machine learning back then.

Background and Experience

I ran IBM Canada's data mining practice. Data mining was the buzzword for AI back in the 90s. So I ran that business.

So the first issue I have with AGI, really, is that I don't know what that means. What is AGI? What is a post-singularity world? I recently listened to a podcast with Ray Kurzweil and Joe Rogan, and it sounded like an insane conversation. Are AGIs going to be godlike? Are we going to create our own universe in our living room and teleport ourselves through wormholes? Is it a singularity where we grow exponentially forever, with no plateau? It seems very irrational to me.

Understanding AGI

So I don't really know what AGI is. All I know is let's solve problems with technology. We don't need to label what it is. And everything we build is superhuman.

Everything we have invented since the dawn of humanity is superhuman by definition. Our caveman ancestors would carry water around in a cup or a bag because it held water better than human hands do. It's superhuman, right?

A hammer pounding nails, that's superhuman. You can't pound nails with your hands, right?

A computational fluid dynamics solver from my grad school days, it can calculate the airflow around an F1 car better than a human with pencil and paper, right? If an invention wasn't superhuman, it would be absolutely useless. So everything we do is superhuman.

Superhuman Inventions

What is this hype about singularity and superintelligence, right?

A general solution that could hold liquids, drive on the ground, go to outer space, fly through the air, and hammer nails into frames would be horribly inefficient. You would never build such a thing. You would build very specific solutions. Something that's good at everything is best at nothing. The Swiss Army knife is okay in a pinch, but you wouldn't use it to build a house. It's not purpose-built for a very specific task. So I think we need to focus back on the scientific method.

Scientific Method and Data

Just because we have data doesn't mean it explains the way the universe works. The scientific method is: ask a question, do some research, look at the status quo, and create a hypothesis, usually in the form of a mathematical equation.

Don't go get all the data. You don't need all the data. You just need the data that's relevant to your theory, right? And then you go test it: does it work, yes or no?

If it doesn't work, then you go back, adjust your hypothesis, and rerun some experiments, right? If it does work, you test it against the real world. If the results are good, then you publish, and everybody tries to crap all over your theory, tell you you're wrong, and make it better. That's peer review, and knowledge advances. Everything we have built since the dawn of time, or at least in the last 500 years, has been built with the scientific method. Nothing has been built with a statistical machine learning model.

Importance of Theory

The chair you are sitting in was invented with the laws of physics. This computer we're looking at was invented by Einstein and his contemporaries, who invented quantum mechanics. They had no data.

If they were waiting for data, we wouldn't have all this stuff today. It was human mind and theory and innovation that created this. Complex system dynamics cannot be learned from data.

Systems that have uncountable inputs, interactions, or state-action pairs are complex. So you could not learn Einstein's theory of relativity from data, even if you knew the location of all 10^80 atoms in the universe.

So here's an example. When you have the wrong theory, it doesn't matter how much data you collect. Historically, we originally thought that the Earth was the center of the universe and that the planets and the sun made circular orbits around it.

You can see the little pink graph below. This was Aristotle. The vertical axis is the error rate, and the horizontal axis is the volume of data. They were trying to predict planetary locations. With the wrong theory, no matter how much data you collect, at some point the error plateaus, and the error rate is terrible. Then we saw the retrograde motion of the planets: depending on what side of the sun you are on, the planets look like they change direction. So Ptolemy invented epicycles, again with an Earth-centered universe, and this was a terrible theory. No matter how much data you collected, the error rate was basically flat from the beginning.

Then along came Copernicus, who said, no, it's a heliocentric system, and the planets do circular orbits around the sun. And we found that with very little data, you quickly got down to a low error rate, and after that, more data made no difference. Then along came Kepler, who said they're elliptical orbits around the sun. With almost no data at all, you got to a very low error rate, and again it flattened out. And with Einstein we got even more accurate. So if you have the right theory, you don't need a lot of data.
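The wrong-theory plateau described above can be sketched in a toy simulation (this is illustrative, not the historical planetary data): fitting a misspecified linear model to quadratic data, the error stays high no matter how much data is added, while the correctly specified model reaches essentially zero error with very little data.

```python
# Illustrative simulation: with a misspecified model, more data never fixes
# the error; with the right model, almost no data is needed.
# True process (stand-in for "elliptical orbits"): y = x**2.

def fit_coefficient(xs, ys, feature):
    """Least-squares fit of y ~ a * feature(x); returns the coefficient a."""
    num = sum(feature(x) * y for x, y in zip(xs, ys))
    den = sum(feature(x) ** 2 for x in xs)
    return num / den

def experiment(n, feature):
    """Mean absolute error of the fitted model on n noiseless observations."""
    xs = [i / n * 10 for i in range(1, n + 1)]
    ys = [x ** 2 for x in xs]
    a = fit_coefficient(xs, ys, feature)
    return sum(abs(y - a * feature(x)) for x, y in zip(xs, ys)) / n

wrong = lambda x: x        # "circular orbits": linear model
right = lambda x: x ** 2   # "elliptical orbits": correct model

for n in (10, 100, 1000):
    print(n, round(experiment(n, wrong), 2), round(experiment(n, right), 8))
# The wrong model's error plateaus near the same value at every n;
# the right model's error is essentially zero even at n = 10.
```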

With this big data hype, I think the only ones who benefit are storage manufacturers and storage sales. You don't need to collect every piece of data in the universe to innovate and create amazing things.

Dynamics and Decision-Making

So the system dynamics, how a system works, is different from deciding what to do, what decisions to make. We seem to confound decision-making and system dynamics. We need to separate those two things.

LLMs are expensive to train because they're learning system dynamics: the dynamics of sequential text, of sequential tokens. Learning those dynamics is very expensive. Whereas in the game of Go, you don't need to learn the dynamics of Go.

There are rules. There are very specific rules of Go and chess that you can code up. What to do, how to make a decision, that's what you're trying to learn. That's the reinforcement learning you do to play Go and chess and other systems.

So we need to separate these two things. Learning dynamics is very expensive. And all of human decision-making basically fits this paradigm.
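This separation can be sketched in a minimal example (the five-state corridor, rewards, and discount factor here are all invented for illustration): the dynamics are hand-coded as rules, and the learning problem that remains is only "what decision should I make?"

```python
# Sketch of the separation argued above: dynamics coded as rules,
# decision-making found by value iteration on top of them.
# Tiny 5-state corridor MDP; a move that ends in the last state earns reward.

N_STATES, GAMMA = 5, 0.9
ACTIONS = (-1, +1)  # step left or right

def step(state, action):
    """Hand-coded dynamics: deterministic move; reward whenever the move
    ends in the goal state (including staying there)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

# Value iteration: pure decision-making, given the known dynamics.
V = [0.0] * N_STATES
for _ in range(100):
    V = [max(r + GAMMA * V[nxt] for nxt, r in (step(s, a) for a in ACTIONS))
         for s in range(N_STATES)]

policy = [max(ACTIONS, key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
          for s in range(N_STATES)]
print(policy)  # the optimal decision in every state is to move right (+1)
```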

Human-in-the-Loop Decision-Making

You're asking: what decision should I make to get a desired outcome? And LLMs are still a human-in-the-loop decision-making tool. We're not doing any autonomous decision-making.

We have a lot of other autonomous systems that have existed for decades. You know, a thermostat is autonomous, right? Nuclear power plants are pretty autonomous.

They have human beings monitoring SCADA systems on the wall, but these are autonomous systems with decision-making policies and safety frameworks in place. I talked about this in my last talk: you need to build control frameworks and statistical process control to make sure you don't make bad decisions.
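A minimal sketch of the statistical-process-control idea, with invented numbers: outcomes of an autonomous decision loop are checked against 3-sigma control limits, and anything outside them is escalated to a human.

```python
# Sketch of statistical process control (SPC) as a guardrail on an
# autonomous decision loop. All numbers are illustrative.

def control_limits(history):
    """Mean +/- 3 population standard deviations of historical outcomes."""
    n = len(history)
    mean = sum(history) / n
    var = sum((x - mean) ** 2 for x in history) / n
    sigma = var ** 0.5
    return mean - 3 * sigma, mean + 3 * sigma

def in_control(history, new_outcome):
    lo, hi = control_limits(history)
    return lo <= new_outcome <= hi

baseline = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]
print(in_control(baseline, 101))   # normal variation -> keep running
print(in_control(baseline, 150))   # out of control -> escalate to a human
```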

Reinforcement Learning and Business

So reinforcement learning is used to find an optimal policy. Many systems, like a self-driving car, are finite, so you can learn how to drive a car from data. Language is finite, so you can learn a language model, like these large language models, because the data in language is finite.

I think ChatGPT has 100,000 or so words or tokens in its dictionary, whereas chess, Go, and business are infinite. You have to specify the dynamics, the laws of physics, because you can't learn them from data; the systems are far too large.

Chess is 10 to the 123. Well, chess and Go are games; they have rules. Whereas business, the laws of business, is infinite. So general approaches are very power hungry, right?

Challenges and Efficiency in AI

So the final training run of ChatGPT took 24,000 NVIDIA H100 GPUs running for 100 days, at a cost of $100 million. To me, that's absolutely insane. DeepSeek used 10,000 NVIDIA A100 GPUs, I think. I don't know how many days it ran, but it cost six and a half million dollars. Much better, right?

OpenAI is talking about building nuclear power plants to power this. That's crazy. And data centers are projected to consume 9% of all the energy created by humanity by 2030. Again, that's insane.

And, you know, what's the business value created from all this, right? Who's winning?

Business Value and AI

Are groceries getting any cheaper? Are car prices going down? Is hydro getting less expensive? Is it impacting our lives? Sure, it's making our research easier, and we can do a lot of cool things with LLMs, but who's winning, right? Whereas with a narrow approach: we at Daisy built autonomous merchandise planning, helping retailers decide what to promote, what prices to charge, and how much inventory to allocate with no human in the loop. So a human completely out of the loop.

And to train that system, we invented a theory. I'll show you how to invent a theory.

And we used four GPU cards, four generations old, so four generations older than what ChatGPT uses. We'd run it for 24 hours a week to help our largest retailer. We did this for 12 years.

Our largest retail customer was a $25 billion retailer, and we grew their sales by $500 million a year, every year with four GPU cards that you could buy for a couple grand, and computing costs were next to nothing, right?

You can train a direct marketing response model using RFM. Basically, take three variables: how recently did your customer shop, how frequently do they shop, and how much money do they spend? Bucket each of those into five 20-percentile groups, scored one through five. Combine the three scores and you get 5 x 5 x 5 = 125 possible combinations, and you build a direct response model on those segments.
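A sketch of that RFM scoring, with made-up customer records: each of recency, frequency, and monetary value is bucketed into quintile scores 1-5, and the three digits together define one of the 125 segments.

```python
# Sketch of RFM segmentation. Customer records are invented for illustration.

def quintile_scores(values, reverse=False):
    """Rank values and assign quintile scores 1-5 (5 = best)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * 5 // len(values) + 1
    return scores

# (recency in days -- lower is better, frequency, monetary value)
customers = [(5, 40, 2000), (30, 12, 600), (200, 2, 50), (90, 6, 300), (10, 25, 1200)]
r = quintile_scores([c[0] for c in customers], reverse=True)  # recent -> high score
f = quintile_scores([c[1] for c in customers])
m = quintile_scores([c[2] for c in customers])

segments = [f"{a}{b}{c}" for a, b, c in zip(r, f, m)]
print(segments)  # "555" = best customers, "111" = lapsed low spenders
```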

Over-Complexification

And that takes 10 minutes to do. You can do it on your laptop or a standard database server, and that'll make millions of dollars for your customer. You don't need to build anything complex.

So I think we are over-complexifying the world in a massive way, right? So who's winning? I think the only people winning are the tech companies, right?

I have never heard so many ads saying "run your AI compute load on Oracle compute infrastructure" on every radio station I listen to. I've never heard this before, and I've been in tech my whole life, right? I think we're driving more and more compute and big data storage for the sake of keeping these tech industries alive.

Known Dynamics vs. Data Learning

Learning dynamics from data is not efficient science. Any sophisticated problem we solve starts with known dynamics.

When we do drug discovery, we don't learn molecular chemistry or quantum chemistry from data. We code that in, and then we do simulation, right?

When we design aircraft, spacecraft, and race cars, we don't learn aerodynamics from data. That's the laws of physics. We code that up.

We don't learn general relativity or Newtonian gravity from data to calculate spacecraft trajectories. For building bridges and buildings, we don't reinvent statics and dynamics.

For particle physics, we didn't learn quantum field theory or quantum mechanics from data, right? So the idea is: if we can code the dynamics in, it's much more efficient to train those models.

We can maybe learn some additional dynamics to improve that. We can make improvements to the status quo and then use reinforcement learning to find the optimal decision or design that meets some desired criteria.

Hybrid Approaches

You know, I'm not saying not to do LLMs, but can we hybridize them in some way? Before we roll out nuclear power plants in every city to train our LLMs, can we identify very narrow, specific business cases and the benefit to humanity?

Can we find more efficient ways? DeepSeek was an amazing step: from $100 million down to six and a half million. That's great. Let's get it down to the same cost we spend to run physics models and do drug discovery today. Those computing costs are a fraction of what LLM training costs.

Let's build smaller models. We don't need to build these general-purpose things. General means it's okay at everything but not great at anything.

So let's narrow them down and combine them when we can with human innovation and known dynamics. This is much more efficient, right?

Theory of Business

How do you invent the theory of business? Very simple.

To improve performance, make better decisions. The whole IT industry has been about decision support. Making better decisions ends up with better results.

When you make different decisions, you will have some dynamics that happen in an industry. This is a simple differential equation.

You can read my patents. We've patented this.

Retail Dynamics and Outliers

In retail, it's that consumers buy full meals, not items; they buy solutions. If you promote ground beef, the customer will buy pasta, tomato sauce, cheese, bread, wine, and salad fixings. There's a halo effect.

There's cannibalization: because they bought Pepsi, they didn't buy Coke. And because they bought it this week, they stocked up, bought four weeks' supply of non-perishable goods, and stole from the future. Those are the halo and cannibalization effects.

These are the interaction dynamics going on in retail. There's another kind of dynamics.

If you're doing like fraud detection or medical diagnosis, it's outlier detection. Are you different than the norm? When you go to the doctor, they take your measurements.

Like the first speaker said: do you have anomalies in your lung measurements? Those anomalies mean you're different from the norm, which means you're sick.

So outlier detection is a big branch of human decision-making. But it's very difficult to do. There may be an obvious orange rock in the middle, but once you group the rocks into peer groups by size, color, and whether they have damage or not, finding outliers is very difficult.
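A minimal sketch of peer-group outlier detection, with invented numbers: the same $900 value is perfectly normal among the large claims but a clear outlier among the small ones, so each observation is scored against its own peer group rather than the global population.

```python
# Sketch of outlier detection within peer groups. Groups and values invented.
from collections import defaultdict

def peer_group_outliers(records, threshold=2.0):
    """records: list of (group, value). Returns records whose |z-score|
    within their own peer group exceeds the threshold."""
    groups = defaultdict(list)
    for g, v in records:
        groups[g].append(v)
    stats = {}
    for g, vals in groups.items():
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        stats[g] = (mean, var ** 0.5 or 1.0)  # guard against zero spread
    return [(g, v) for g, v in records
            if abs(v - stats[g][0]) / stats[g][1] > threshold]

records = [("small", 100), ("small", 110), ("small", 90), ("small", 105),
           ("small", 95), ("small", 108), ("small", 92), ("small", 103),
           ("small", 900),  # anomalous among its peers
           ("large", 900), ("large", 950), ("large", 880), ("large", 920)]
print(peer_group_outliers(records))  # only ("small", 900) is flagged
```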

Mathematical Models in Retail

So every business is a combination of outliers and interactions, and some industries are more about one than the other. So here's a theory of retail. I won't go through the math, but essentially, better decisions equal improved performance. The background is that you need to know where your stores are and where your customers live. That's the first equation, the gravity model, if you're familiar with gravity. The first equation is

Poisson's equation, generally. You can think of good stores as very gravitationally attractive: they will draw people from far away. You'll drive farther to a Walmart superstore than you would to your local convenience store, right? And gravitational attractiveness, how attractive going to Walmart is this week, comes down to how good their promotions are. Do they have really cool promotions? Are they discounting the items you care about, promoting them, and letting you know? Then you'll tend to drive farther for those promotions.
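For reference, the store-choice gravity model alluded to here is often written in its standard Huff form (a textbook formulation, not necessarily the patented equations mentioned later):

```latex
% Huff-form gravity model (standard textbook form; illustrative only):
%   P_{ij}  -- probability that customer i shops at store j
%   A_j     -- attractiveness of store j (promotions, assortment, ...)
%   d_{ij}  -- distance from customer i to store j
%   \lambda -- distance-decay exponent
P_{ij} = \frac{A_j \, / \, d_{ij}^{\lambda}}{\sum_{k} A_k \, / \, d_{ik}^{\lambda}}
```

Stronger attractiveness raises a store's share of customer i's trips; greater distance lowers it, which matches the intuition that good promotions make people drive farther.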

How you lay out the store. Can you find the products when you walk in the store? It's a path integral. Where the high traffic flow is in the store, that's likely where you'll buy a lot of products. So that's how you can lay out your stores.

And in the middle is how you pick the best items to promote and the best prices. All of these are a couple of differential equations that you can solve. This is the algebraic version; you can find my patents online, which go through the math. You can solve these differential equations and find a solution.

The other thing you can do is once you have an optimal solution, let's say you have an N dimensional space, 100,000 items, every item is a dimension. So now you have an optimal combination of items to promote. It's at a peak in your 100,000 dimensional surface.

And you want to ask: what is the value of each item in that last step? So I can calculate a partial derivative, basic calculus: add or remove one item in every direction and measure the impact of that last item in getting to the optimum, to find out what the value of each item is.

That basically assigns a quality metric and a value to each decision: which item is the best, which item has the biggest impact in that last step to the optimum. So we can calculate that and then essentially apply outlier detection to the decisions.
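The leave-one-out "partial derivative" idea can be sketched as follows, with an invented objective that includes halo effects: each item's value is the drop in the objective when that one item is removed from the chosen set.

```python
# Sketch of discrete marginal value: score each item in a chosen promotion
# set by the objective's drop when that item alone is removed.
# The sales function and all numbers are invented for illustration.

def sales(items):
    """Toy objective: base value per item plus pairwise halo bonuses."""
    base = {"beef": 100, "pasta": 60, "sauce": 40, "soda": 30}
    halo = {("beef", "pasta"): 50, ("pasta", "sauce"): 30}
    total = sum(base[i] for i in items)
    for (a, b), bonus in halo.items():
        if a in items and b in items:
            total += bonus
    return total

def marginal_value(items):
    """Leave-one-out contribution of each item (a discrete partial derivative)."""
    full = sales(items)
    return {i: full - sales([j for j in items if j != i]) for i in items}

chosen = ["beef", "pasta", "sauce", "soda"]
print(marginal_value(chosen))
# beef and pasta carry halo effects; soda contributes only its base value
```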

Real-World Application

This is a real US retailer. As an example, we took one flyer with 200 ads. We divided them into 10 percentiles, so 20 items per bucket.

Using that partial-derivative slope to find quality, the best 20 items, the top 10%, generated $1,100 per ad per week per store, whereas the bottom 10% generated only $38.

So outlier decision-making means getting rid of the bad decisions and doing more of the good decisions. If we got rid of the bottom five deciles and added all of those ads to the top five deciles, that would grow sales by $3.9 billion for this $80 billion retailer.

So this is simply applying math and science, doing outlier decision-making, right? Boston Consulting Group agrees with me. They wrote a paper that said doing less bad and more good equals 1% sales growth, while Daisy's AI did better. We did about 5%, right?

And this works. Science worked at 20-plus customers. These graphs all show year-over-year sales growth on the vertical axis. The black line marks when these companies hired Daisy, and the red line was the trend before Daisy.

So you can see all of these retailers' sales was declining before Daisy. They hired Daisy. We helped them get rid of bad decisions, do more good decisions. The green lines are all positive. They all started growing sales.

The bottom one in the middle is Walmart. They fired us for six months, and then sales went to crap. They hired us back, and we reduced the size of their flyers by 50%, maintaining flat sales at half the advertising cost.

So this works. Science works. The scientific method works. Narrow AI works.

Conclusion

And I want you to listen to this Carl Sagan quote. I don't know if you know who Carl Sagan is.

I'm an old guy. I know who he is.

We've arranged a society based on science and technology in which nobody understands anything about science and technology. And this combustible mixture of ignorance and power, sooner or later, is going to blow up in our faces.

Science is more than a body of knowledge. It's a way of thinking, a way of skeptically interrogating the universe with a fine understanding of human fallibility.

If we are not able to ask skeptical questions, to interrogate those who tell us that something is true, to be skeptical of those in authority, then we're up for grabs for the next charlatan, political or religious, who comes ambling along.

It's a thing that Jefferson laid great stress on. It wasn't enough, he said, to enshrine some rights in a constitution or a bill of rights. The people had to be educated.

Final Thoughts

So I think that's a very telling quote. We need to educate ourselves, be skeptical of what everybody's telling us, look at what's worked in the past, apply common sense.

Not to say that we shouldn't be doing the things we're doing, or that LLMs and gen AI aren't great things, but let's be skeptical. Don't buy into the hype completely on blind faith.

The scientific method, and human innovation with it, created the world we live in today, and I think human innovation will continue to be the driver for the rest of time. LLMs and technology are not going to take over. That's my belief, nor should we let them. Regardless, we should build tools that help humanity, that help us live better lives, and keep our focus there. So I think narrow AI will dominate, and we should continue to do those things going forward. Thanks very much.
