From Perceptrons to Transformers: The Journey of Neural Networks

Introduction

So ChatGPT is amazing. All of generative AI is amazing. It is popularizing AI nowadays. It is hyped.

We can call it the new electricity. And we are doing cool things with it.

And if you are a business or a startup looking for funding, integrating AI will get you that funding really fast.

So I'm Navid. I am doing a PhD in neural networks at the University of Coimbra. And I'm also a research engineer at the Center for Responsible AI.

The Power of Generative AI

OK, so generative AI. We have seen in the last two presentations that you can talk to it and you can do amazing things with it. But it's even more than that. It's more than a chatbot.

Center for Responsible AI Initiatives

The Center for Responsible AI is a consortium of businesses and universities. They are trying to solve the different challenges AI is facing today.

And they have created a list of products, and I will just show three of them.

One of the products is Halo. It is enabling an ALS patient to talk again.

PrimeRAM helps you navigate the complex legal challenges and requirements for your business.

Affine is helping doctors extract useful information from medical text.

Expanding the Use Cases for AI

So as you can see, generative AI can be used in a lot of use cases, and it can be more than just chatbots. OK.

A Brief History of AI

I also kind of like history, so I wanted to take a minute and appreciate the moment where we are today in terms of AI.

And let's start with the 1940s and '50s. I like to call it the era of curiosity. In this era, we were trying to find a link between the brain, a computational model, and artificial intelligence.

McCulloch and Pitts wrote their research in 1943, and it was the first ever computational model of the brain. Most of what they said in that model is still fundamental and true today.

Alan Turing proposed his Turing test in 1950, and Rosenblatt created the first neural network. It was the simplest form of neural network, just one layer, and it was a classifier able to classify into true and false.
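To make that concrete, here is a minimal sketch of the perceptron idea in Python. It is illustrative only: the toy data and learning rate are made up, and this is not Rosenblatt's original hardware implementation.

```python
import numpy as np

# Single-layer perceptron sketch: a weighted sum plus a threshold,
# classifying inputs into true (1) or false (0).
def predict(x, w, b):
    return 1 if np.dot(w, x) + b > 0 else 0

def train(X, y, epochs=20, lr=0.1):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            error = yi - predict(xi, w, b)   # 0 if correct, +/-1 if wrong
            w += lr * error * xi             # nudge the weights toward the correct answer
            b += lr * error
    return w, b

# Toy, linearly separable example: logical OR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])
w, b = train(X, y)
print([predict(xi, w, b) for xi in X])  # [0, 1, 1, 1]
```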

So with this research of Frank Rosenblatt, we had hype similar to what ChatGPT created today. We had some sort of hype back then in 1958.

The Rise and Fall of AI Interest

But then comes Minsky. In 1969, something happened that caused an AI winter. An AI winter is when people lose interest in artificial intelligence.

He wrote a book in which he criticized the perceptrons created by Rosenblatt, and he provided a mathematical proof that they might not be able to do all that was promised. After that, funding for AI research declined and people lost interest in AI. Although he specifically mentioned that his critique only applies to single-layer neural networks, somehow people ignored that fact and we had an AI winter.

The Resurgence and Evolution of AI

And then in 1986, we had a takeoff with a research paper which introduced us to multilayer neural networks and backpropagation.

With this research, three groundbreaking things happened. We could now adjust the weights of a neural network to minimize error. It enabled neural networks to learn complex representations and features.

And it gave us the ability to generalize from the training data to new, unseen data. At this point, we had the key ingredient, a neural network that we can train on complex data, needed to create something called Transformers.
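To see what "adjust the weights to minimize error" looks like in practice, here is a minimal, hedged sketch of backpropagation: a tiny two-layer network trained on XOR, the kind of problem a single-layer perceptron cannot solve. The layer sizes, learning rate, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

# Two-layer network trained with backpropagation on XOR (illustrative sketch).
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the error from the output back to earlier layers.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent updates that reduce the squared error.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # typically converges toward [0, 1, 1, 0]
```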

The Advent of Transformers

In 2017, under the banner of Google Brain, a research paper was published. It's called "Attention Is All You Need". And for the first time, we were introduced to Transformers.

This was groundbreaking research in the field of natural language processing.

With Transformers, what we could do is actually weigh the importance of a word in a sentence. We were able to weigh the importance of the position of a word in a sentence. So we were able to see which word is important at what position.

And it also provided us with the ability to do this over longer text. This was the last key we needed before we could create generative AI.
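A hedged sketch of the core mechanism behind that, scaled dot-product self-attention (the vectors and sizes below are made up purely for illustration, not taken from any real model):

```python
import numpy as np

# Scaled dot-product attention: each position weighs the importance of every other position.
def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # similarity between positions
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
    return weights @ V, weights                   # weighted mix of the values

# Toy self-attention over 3 "words", each a 4-dimensional vector (made-up numbers).
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out, w = attention(x, x, x)
print(w.round(2))  # each row sums to 1: how much each word attends to every other word
```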

So now we have neural networks which can learn complex representations, and we have transformer models that are groundbreaking and very performant at processing natural language.

Generative AI and Its Growth

So after that came GPT in 2018. It became famous, and we have been getting so many generative AI models in the last five or six years.

These are some of them, the famous ones. We have hundreds, even thousands, of models right now. It's hard to keep track of them, but these are some of the main generative AI models.

Challenges and Solutions for AI

These models are amazing, but they bring a lot of challenges. Trust me, there are more; I just listed what I could recall by heart at the time.

But my favorite one is bias and fairness, because it provided a lot of meme material to the internet. We all remember Gemini creating diverse images of America's forefathers. So these are all the challenges researchers are trying to solve today. And all of them are important.

Energy Consumption in AI

But what I want to focus on today is how resource-intensive they are. One Google query takes 0.0003 kilowatt-hours of electricity, while ChatGPT takes almost 1,500% more electricity than a simple Google query. In a few years, AI could use as much electricity as a small country. And that's a problem.
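Taking the figures quoted above at face value (they are rough public estimates, and the daily query volume below is an assumed number for illustration only), the back-of-the-envelope math looks like this:

```python
# Rough comparison using the figures quoted in the talk.
google_query_kwh = 0.0003                  # ~0.0003 kWh per Google search
chatgpt_query_kwh = google_query_kwh * 16  # "almost 1,500% more" is roughly 16x
print(f"ChatGPT query: ~{chatgpt_query_kwh:.4f} kWh")    # ~0.0048 kWh

queries_per_day = 100_000_000              # assumed volume, for illustration only
daily_mwh = chatgpt_query_kwh * queries_per_day / 1000
print(f"~{daily_mwh:,.0f} MWh per day at that volume")   # ~480 MWh/day
```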

That's a lot of electricity. So how are we going to solve this?

Sustainable Solutions for AI

There are basically a few types of solutions. We can create energy-efficient data centers. We can adopt green and sustainable practices. We can do hardware optimizations and create new types of chips.

So new chips are coming in, green practices are happening, and energy-efficient data centers are happening.

Innovations in AI Architectures

But my team and I at CISUC are focusing on something different. We are trying to create efficient architectures for large language models, or transformers, through algorithmic improvements. And how are we going to do that?

Understanding Biological Neurons

For that, we have to understand the biological neuron. I'm not going into neuroscience today, but what we are seeing here is a biological neuron from a human brain.

We have three main areas here: the cell body, which is called the soma; the axon, which is this long tube; and the synapses.

One neuron connects to another neuron at the synapses. The soma generates some sort of electricity, and then this electric pulse travels through the axon to another neuron. And that's enough neuroscience to understand before I can jump to the next slide.

Comparing Traditional and Spiking Neural Networks

So when we say neural networks, there are, I would say, two kinds: traditional neural networks and spiking neural networks.

In A, you can see a traditional neural network, and in B, you can see a spiking neural network. A spiking neural network is the kind of neural network that mimics the biological behavior of the brain. So this is what it is going to mimic.

Here you can see that in the traditional neural network in figure A, we get some input, we multiply it by the weights, we sum it, we add some bias, and we pass it through an activation function, which is a continuous activation, to get some output. So what is happening here, multiplication, summation, accumulation, all of it happens in the cell body.
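In code, that traditional (artificial) neuron boils down to a few lines. A minimal sketch with made-up numbers:

```python
import numpy as np

# Traditional artificial neuron: weighted sum plus bias, passed through a continuous activation.
def artificial_neuron(inputs, weights, bias):
    z = np.dot(weights, inputs) + bias     # multiply-accumulate over every input, every time
    return 1.0 / (1.0 + np.exp(-z))        # continuous activation (sigmoid)

x = np.array([0.5, 0.1, 0.9])              # illustrative input values
w = np.array([0.4, -0.6, 0.2])
print(artificial_neuron(x, w, bias=0.1))   # a continuous value between 0 and 1
```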

While in spiking neural networks, we have spikes. It's either 1 or 0. And the soma, instead of doing all this multiplication and accumulation, has something that is called a threshold. It receives spikes from the previous neuron, and if it reaches a certain threshold, it spikes. If it doesn't, it does not spike.
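By contrast, a spiking neuron can be sketched as a simple leaky integrate-and-fire model: it accumulates incoming spikes and only fires when a threshold is crossed. The weight, threshold, and leak constants below are arbitrary, chosen only to show the mechanism:

```python
# Leaky integrate-and-fire neuron sketch (constants are arbitrary, for illustration).
def lif_neuron(input_spikes, weight=0.6, threshold=1.0, leak=0.9):
    membrane, output_spikes = 0.0, []
    for s in input_spikes:                       # s is binary: spike (1) or no spike (0)
        membrane = membrane * leak + weight * s  # integrate spikes; the potential leaks over time
        if membrane >= threshold:
            output_spikes.append(1)              # threshold crossed: the neuron fires...
            membrane = 0.0                       # ...and resets
        else:
            output_spikes.append(0)              # otherwise it stays silent, doing no work
    return output_spikes

print(lif_neuron([1, 0, 1, 1, 0, 0, 1, 1]))      # [0, 0, 1, 0, 0, 0, 1, 0]
```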

So the thing we need to understand here is that in traditional neural networks, all neurons in the network are active all the time. While in spiking neural networks, it's sparse, not dense: only the neurons that receive a spike get activated.

The Efficiency of Spiking Neural Networks

So it is already less dense than a traditional network. And when the whole network is not active all the time, it brings us something that is very efficient. They potentially use less energy than a traditional neural network, because spikes are binary events, ones and zeros, versus continuous activations, which mean continuous accumulation, multiplication, and everything else. So they are more energy efficient.

Also, they mimic biological neurons. And neuromorphic hardware is hardware that is designed to work like a biological brain. So if we have a spiking neural network, it will perform best on neuromorphic hardware.

Challenges with Spiking Neural Networks

Since we already know large language models use artificial neural networks, why not just use spiking neural networks and make them energy efficient? It is not that simple, because these neural networks are complex to train. They use something that is called spike-timing-dependent plasticity.

When we have one neuron, it is usually connected to many other neurons. And the time when the previous neuron, which we call the presynaptic neuron, spikes is crucial. There is a saying: neurons that fire together, wire together. So tracking when a presynaptic spike happened and when a postsynaptic spike happened adds a temporal dimension to this network. So these are complex to implement.
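A hedged sketch of the STDP idea (the amplitudes and time constant below are arbitrary illustrative values, not taken from any specific model): if the presynaptic neuron fires shortly before the postsynaptic one, the connection is strengthened; if it fires shortly after, it is weakened.

```python
import numpy as np

# Pair-based spike-timing-dependent plasticity (STDP) weight change.
def stdp_delta_w(t_pre, t_post, a_plus=0.10, a_minus=0.12, tau=20.0):
    dt = t_post - t_pre                    # time between pre- and postsynaptic spikes (ms)
    if dt > 0:                             # pre fired before post: "fire together, wire together"
        return a_plus * np.exp(-dt / tau)  # strengthen the synapse (potentiation)
    else:                                  # pre fired after post
        return -a_minus * np.exp(dt / tau) # weaken the synapse (depression)

print(stdp_delta_w(t_pre=10.0, t_post=15.0))  # pre leads post by 5 ms -> positive change (~+0.078)
print(stdp_delta_w(t_pre=15.0, t_post=10.0))  # pre lags post by 5 ms  -> negative change (~-0.093)
```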

The second problem is that neuromorphic hardware is very sophisticated, and currently it is mostly in a research state. We can get the chips by submitting research proposals, but it is not mass-produced.

SpikeGPT and Spikformer: Pioneering Projects

So currently, we have a starting point in a project called SpikeGPT. It's basically a generative AI project that uses a spiking neural network, and it is 20 times more efficient than a traditional GPT.

In the Spikformer study, transformers that are 57% more efficient were created. So we saw that if we combine spiking neural networks with transformers, they can be around 57% more efficient, which is a huge gain.

And yeah, that's it for now. I hope it was easy to understand.
