An intuitive explanation of how neural networks work - a technical introduction for non-technical folks

Introduction

Good evening, ladies and gentlemen. My name is Akshay Karidal. And for those of you who may struggle with that name, you can also call me AK.

Understanding Neural Networks

So my topic is an intuitive explanation of how neural networks work. Neural networks are the underlying technique behind all the large language models. And it's not just language: image generation, video generation, everything runs on neural networks.

So along the way, I'd like to explain some of the jargon that we throw around. A lot of you may already know it, but for those of us who may not know what these terms mean, we'll try to open a dummy black box of a neural network and see what's inside it.

We'll also do a little bit of mathematics today, but I promise you we won't be doing anything beyond addition and multiplication. Good?

So one of the things that often bothers me is how people interpret a black box algorithm. A lot of people say a neural network is a black box, which is perfectly all right. But then they also go on to say that nobody knows what it is doing.

That's not accurate. We know the what, but we don't know the why. What I mean by that is, we know the exact steps a neural network takes to get to the results it gives you. But explaining why those steps produce those results is very difficult.

So what I was saying was, yes, neural networks are indeed called a black box algorithm. But they're not black boxes in the true sense. When we say something is a black box, what we really mean is that we do not understand what is happening inside the box. With neural networks, we understand perfectly well what's happening inside the box,

but we have struggled to explain it. And by the end of this session, you will be in a position to understand why that is. We will also go over a few other terminologies, like neurons and parameters.

We have a lot of billionaires floating around out there when it comes to parameters: every other model has a billion parameters. So we'll understand what parameters are, what hidden layers are, and so on.

The Misconception of Neural Networks as Sorcery

My motivation comes from a lot of the people I speak with. To give some background, I used to be a data scientist, and now I'm a technology consultant. So I mostly speak to people from the business side who may not have a mathematical background or a background in data science. A lot of them look at neural networks as some sort of sorcery.

It's not, right? A better analogy is, let's say, anesthesia. When a doctor administers nitrous oxide, we all go into an unconscious state.

We are at that point devoid of all subjective experience. No doctor knows how that happens. How do you come back after all your subjective experience has been taken away? We don't know that yet, but we know exactly what dosage needs to be administered in order for the doctor to operate on you.

So that's anesthesia. And with neural networks, we are at a similar stage. They have a lot of uses, but at the same time, we struggle to explain why they produce the answers they produce, right?

A Story to Illustrate Neural Networks

So I'll start with a story. There were four kids in a remote town in Yukon: Kevin, Caroline, Andrew, and Maya. And one fine day, when they went out to play, they saw an alien spaceship stranded in their playing field.

So they approached the aliens, and the aliens said: we are Zorflings, we are intergalactic nomads, and we have a problem with our orb. So what is the problem? The orb takes two inputs, a mechanical input and a chemical input. Both are numerical inputs. And it then generates an output.

The output that is produced is what is used to fly their spaceship. Now, the aliens know perfectly well what inputs they have to provide. But their orb is not working; it's broken. So they're not able to get an output, and they're not able to fly off Earth.

So they're pretty much stuck here. And what the aliens also tell the kids is: from our previous trips, we've been logging our records. So we know all the inputs as well as the outputs for the last 40 trips that we've made.

So then one of the kids, Maya, who's a mathematician, says, hey, I know how to solve this problem. And she starts a game. And this is the game.

Modeling Neural Network Operations Through a Game

And from here, you can start to see the neural network analogy. In the game, Kevin and Caroline are paired together, and Andrew and Maya are paired together. And finally, the alien is also participating in the game. Each one of them now takes an input.

First, Kevin takes the mechanical input and the chemical input. He does some math with them, and then passes the result on to the next set of people, that is, Andrew and Maya.

Caroline does the same thing. She picks up the chemical input and the mechanical input, and she processes them. She does some math. We'll look at the math a little later.

She does her math and gets a number. The next set of people take the numbers from Kevin and Caroline and do their own math, and the alien takes their results. And then finally, he gets the output. So stay with me here.

Breaking Down the Mathematics

So this is just notation; we'll skip it for now. Just as an example, if the first input was two and the second input was three, Kevin is going to arrive at the number 15. He first selects three random numbers. The first random number he multiplies with the first input, the second random number he multiplies with the second input, and then he adds the third random number to the sum. All of this is random. So Kevin comes up with the number 15.

You can do the math yourself, so I won't go through it. And it's the same thing with Caroline: she again does some random multiplication and addition, and finally comes out with the number 10.

Now that these two have 15 and 10 as their numbers, the next pair come into action. They pick up 15 and 10 and start their operations: they again pick random numbers to multiply 15 and 10 by, and add their own random numbers. So Andrew now has 108, and Maya has 7. And the alien does the same thing: he picks up the numbers from Andrew and Maya, multiplies them by his own random numbers, adds another random number, and finally arrives at 102.
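
To make that arithmetic concrete, here is a minimal Python sketch of the same forward pass. The specific "random" numbers are hypothetical choices, picked only so that the intermediate results match the story's (15, 10, 108, 7, and 102):

```python
def neuron(inputs, weights, bias):
    # Each input is multiplied by its own random number; then one more
    # random number is added to the total.
    return sum(x * w for x, w in zip(inputs, weights)) + bias

mechanical, chemical = 2, 3

# First pair: weights and biases are hypothetical picks that reproduce the story.
kevin    = neuron([mechanical, chemical], weights=[3, 3], bias=0)   # 3*2 + 3*3 + 0 = 15
caroline = neuron([mechanical, chemical], weights=[2, 2], bias=0)   # 2*2 + 2*3 + 0 = 10

# Second pair works on Kevin's and Caroline's numbers, not the raw inputs.
andrew = neuron([kevin, caroline], weights=[6, 2], bias=-2)         # 6*15 + 2*10 - 2 = 108
maya   = neuron([kevin, caroline], weights=[1, -1], bias=2)         # 1*15 - 1*10 + 2 = 7

# The alien aggregates Andrew's and Maya's numbers into the final output.
output = neuron([andrew, maya], weights=[1, -1], bias=1)            # 1*108 - 1*7 + 1 = 102
print(kevin, caroline, andrew, maya, output)                        # 15 10 108 7 102
```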

But when the input was 2 and 3, the output was supposed to be 5. So now the alien calculates what is called the cost function. This is one of the terminologies I promised to explain. There are many cost functions; this is probably the simplest one. It is the aliens' way of knowing how far away from the actual value we are. They all generated random numbers, they all did random operations, and they came up with 102, while the actual number was five. So there's a big difference: 97.

The Learning Process in Neural Networks

And now they do the same operation with every other recorded trip. They had 40 recorded trips, and with each one of those, they go through the same operation. They don't change their numbers now; their numbers are fixed. They use the same numbers and get to their output, they find the difference, and then they see what the average difference is. Now...
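
In code, that whole check might look like the sketch below. The logbook entries here are made-up placeholders for the aliens' 40 recorded trips; the point is just that every trip runs through the same fixed numbers, and the absolute differences get averaged:

```python
def neuron(inputs, weights, bias):
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def predict(mechanical, chemical):
    # Same fixed "random" numbers as before -- nobody changes them during this pass.
    kevin    = neuron([mechanical, chemical], [3, 3], 0)
    caroline = neuron([mechanical, chemical], [2, 2], 0)
    andrew   = neuron([kevin, caroline], [6, 2], -2)
    maya     = neuron([kevin, caroline], [1, -1], 2)
    return neuron([andrew, maya], [1, -1], 1)

# Hypothetical logbook entries standing in for the aliens' 40 recorded trips:
# ((mechanical input, chemical input), actual output).
logbook = [((2, 3), 5), ((1, 4), 6), ((5, 2), 9)]

errors = [abs(actual - predict(m, c)) for (m, c), actual in logbook]
mae = sum(errors) / len(errors)   # the mean absolute error
print(errors, mae)                # [97, 96, 135] and their average
```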

Let's, you know, we can do this in an Excel sheet. So I'm gonna switch over to a different sheet.

So basically what the alien does is this. This is the first record, the first line that you see, right? These are the random numbers that Kevin has chosen, these are the random numbers that Caroline has chosen, and so on and so forth.

Now, the alien knows that there's a big difference. The difference in that particular record is 97: the number that was supposed to come out is five, the number that came out as an output is 102, and the difference is 97.

And what is the overall average difference across the 40 records? That is 736. That's also called the mean absolute error.

And now the alien gives feedback to the next pair. He gives feedback to Andrew and Maya and says: hey, this error is too big, you need to adjust the numbers that you had initially randomly assigned. And that's what they do.

Adjusting Parameters with Backpropagation

They don't adjust randomly; there's a process to it, called backpropagation. But let's just say they update their numbers. And just by updating their numbers, by updating their multipliers and the numbers they add, they will be able to get to the right answers.

Now let's pick one of the random numbers. So let's play along here.

If I were to increase this number to four, let's see what happens to the MAE, which is this large number here, 734. It was three, and I'm changing it to four. What happened?

The number increased. So what should we do? Decrease it? Okay, it was three; I'll bring it down to two.

It decreased, okay. You want to decrease it further? Okay. To zero?

Yeah, give me a number. What's the next number that we should try? Minus one. Next.

Maybe we'll take a bigger leap. Minus 15.

Minus 27. OK, it went up. So it's not going down all the time.

It keeps changing. And what's more, right now we are looking at just one number. We have all these other numbers, the other random numbers that the other players have assigned. The job is to make sure all of them are adjusted at the same time.

Well, not exactly at the same time: one layer at a time. We do this with what is called backpropagation. To give you an intuition for it: right now it is just one variable. If there were two variables, you would see it in a 3D space.
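
What we just did by hand in the spreadsheet can be written as a simple loop: nudge one number up and down, recompute the MAE, and keep whichever change lowers it. This is a deliberately naive stand-in for backpropagation, just to capture the trial-and-error flavour of the demo; the error curve is a made-up placeholder:

```python
def tune_one_number(mae_for, start, step=1.0, rounds=50):
    # Greedy one-dimensional search: the hand-tuning we just did in the sheet.
    value, best = start, mae_for(start)
    for _ in range(rounds):
        for candidate in (value - step, value + step):
            if mae_for(candidate) < best:
                value, best = candidate, mae_for(candidate)
    return value, best

# A made-up error curve standing in for the spreadsheet's MAE column
# (hypothetical: pretend the MAE bottoms out when this number is -11).
def toy_mae(w):
    return abs(w + 11) * 3 + 2

print(tune_one_number(toy_mae, start=3.0))   # settles near (-11.0, 2.0)
```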

Visualizing Optimization in a 3D Space

So I have created a 3D space here. The point at five, five on this 3D space is that red dot, and your objective is to get to zero, or as close to zero as possible. Right now, you're here on this hill. And if you look at what these axes are:

Along the vertical axis is your error. Your objective is to reduce that number on the vertical axis. And along the plane, you have the numbers that were randomly assigned by these players. You are now expected to change those numbers so that you get to this point.

This point that has the minimum MAE, right? And that's what we do.
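
If you wanted to recreate that hill, a grid evaluation like the sketch below is one way to do it: pick two of the numbers, vary them over a range, and record the MAE at each combination. The error surface here is a made-up stand-in, since what matters is the shape of the hill, not the specific model:

```python
# Hypothetical 2-parameter error surface: a valley with its bottom at (4, -2).
def mae_surface(w1, w2):
    return abs(w1 - 4) + abs(w2 + 2)

# Evaluate the surface on a grid; plotting these triples draws the 3D hill.
grid = [(w1, w2, mae_surface(w1, w2))
        for w1 in range(-10, 11)
        for w2 in range(-10, 11)]

lowest = min(grid, key=lambda point: point[2])
print(lowest)   # (4, -2, 0): the bottom of the valley
```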

But the problem is, we don't have just two dimensions, right? In this particular example, we had how many? I think 15, right? We had 15 variables to play around with.

So that's essentially a 15-dimensional surface that you have to optimize over. And how do we do this?

We do this with what is called backpropagation. Again, I won't go deep into backpropagation, but to give you a sense of what it does: it moves along that mountain in small increments. The size of that increment is called the learning rate. So imagine a blind man walking on that mountain.

He takes one step and says, yes, this is going down, I'm going this way; that's the right direction to go. And he takes the next step and says, no, this is going up.

The previous spot was better, so I'll go the other way. And that's how the algorithm works, without going into the mathematical details of how the Jacobians work and all of that. That's how the mathematics works.
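
Here is that blind man in a few lines of Python. It estimates the slope numerically, by probing a tiny step to each side, rather than with the calculus that real backpropagation uses, and it always steps downhill, scaled by a fixed learning rate. The error curve is again a made-up placeholder:

```python
def descend(error_of, start, learning_rate=0.1, steps=200):
    w = start
    for _ in range(steps):
        # Probe a tiny step to each side to feel which way is downhill.
        slope = (error_of(w + 1e-6) - error_of(w - 1e-6)) / 2e-6
        # Step downhill; the step size is scaled by the learning rate.
        w -= learning_rate * slope
    return w, error_of(w)

# Made-up error curve with its minimum at w = 5.
def error(w):
    return (w - 5) ** 2 + 1

print(descend(error, start=-20.0))   # approaches (5.0, 1.0)
```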

Iterative Improvement and Feed Forward Process

And finally, and we'll come back to the Excel sheet, they go through this process over and over again. They do one cycle: in the first cycle, everybody moves by a little bit, does a little bit of adjustment, and then their backpropagation is complete.

Then they do the feed forward. What do we mean by feed forward? In the feed forward, they do all the calculations all over again, now with their new set of numbers.

Let's say I copy it from here. So they have a new set of numbers, and their mean absolute error has dropped. With every cycle, their mean absolute error keeps changing, and their focus is to bring that mean absolute error down as low as possible.
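
Put together, training is just this cycle repeated over and over: feed forward, measure the error, adjust, and go again. The sketch below shrinks the whole game to a single neuron so the loop fits in a few lines, and uses random nudging as a crude stand-in for backpropagation:

```python
import random

# Hypothetical logbook for a one-input orb: (input, actual output) pairs.
# The hidden rule is y = 2x + 1, which the loop has to discover.
logbook = [(2, 5), (3, 7), (4, 9), (5, 11)]

def mae(w, b):
    return sum(abs(y - (w * x + b)) for x, y in logbook) / len(logbook)

# Start from random numbers, exactly as the kids did.
w, b = random.uniform(-10, 10), random.uniform(-10, 10)

for cycle in range(5000):
    # Backward step: propose a small nudge to each number...
    nw, nb = w + random.uniform(-0.1, 0.1), b + random.uniform(-0.1, 0.1)
    # ...feed forward again, and keep the nudge only if the error dropped.
    if mae(nw, nb) < mae(w, b):
        w, b = nw, nb

print(round(w, 2), round(b, 2), round(mae(w, b), 4))   # near 2, 1, and 0
```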

And this is the final state, just to give you some satisfaction that this problem is truly solved. Oops, I think I pasted it in the wrong place. OK, so here are the numbers. These are the final numbers, and this is the final error, close to zero. So voila, the aliens can go back to their planet, or go on roaming the galaxy.

So backpropagation, I think we've talked about it. It looks at whether the slope of the mountain is decreasing. If the answer is yes, that's positive reinforcement; if the answer is no, that's negative reinforcement. So that's backpropagation.

And after every backpropagation, there is a forward propagation. Forward propagation is nothing but running the multiplications through those nodes again.

Key Terms in Neural Networks

So what are neurons? Each of the players in this game is a neuron, and that neuron consists of the random numbers the player initially chose and then continues to adjust: the numbers they multiply by, as well as the number they add. That's a neuron.

What are parameters? All the numbers that the players initially chose at random are parameters. So when someone says my large language model has 50 billion parameters, they have 50 billion of these numbers. What that also means is that this many multiplications and additions have to be performed in order to get an inference from these models. So those are parameters.

And then finally, layers. The first layer, from which we took the input, is called the input layer; those are the known numbers. The players in between, who did all the math, are the hidden layers. And the final layer, which aggregates the results from the layer before it, is the output layer. When we say deep neural network, it typically means there are more hidden layers, so there are many more players in this game. That's when it becomes a deep neural network. Yeah, I think those are the terms.
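
To tie the vocabulary back to the game: the sketch below writes the game down as layer sizes, 2 inputs, two hidden layers of two players each, and one output, and counts its parameters. Each neuron keeps one weight per input plus one bias, which is where the 15 variables from earlier come from:

```python
# The game as layer sizes: 2 inputs -> [Kevin, Caroline] -> [Andrew, Maya] -> alien.
layers = [2, 2, 2, 1]

def count_parameters(layers):
    # Each neuron holds one weight per input from the previous layer, plus one bias.
    return sum(prev * cur + cur for prev, cur in zip(layers, layers[1:]))

print(count_parameters(layers))            # 15: the numbers we were juggling
print(count_parameters([2, 2, 2, 2, 1]))   # 21: one more hidden layer, a "deeper" net
```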

Closing Remarks and Q&A

I'm open for questions. And there's one more thing that I should tell you.

This is an analogy. The example is riddled with inaccuracies, if you will. But the intention is for everybody to understand it, not for it to be accurate.

For example, we're not using what is called an activation function in this example. So it has inaccuracies, but it should give you a good understanding of how neural networks work.
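
For the curious, this is all an activation function would add: a small non-linear step after each neuron's multiply-and-add. ReLU, used in this sketch, is one common choice; without something like it, stacking layers of pure multiplication and addition collapses into one big linear formula:

```python
def relu(x):
    # A common activation: pass positives through, clip negatives to zero.
    return max(0.0, x)

def neuron(inputs, weights, bias):
    weighted = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(weighted)   # the only change from the game's version

print(neuron([2, 3], [3, 3], 0))     # 15, unchanged
print(neuron([2, 3], [-3, -3], 0))   # 0.0 instead of -15
```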

And it's not really a true black box, where we don't know what happens when something is put in. We know perfectly well what happens when something is put in.

So that's all.

Thank you.
