Designing Chips in the AI Era

Introduction

My name is Riaz, and I'm going to talk a little bit about the hardware behind AI. I'm a chip designer, and I've designed chips for data centers. I just want to take you very quickly through the evolution of the hardware for AI and the main challenges we're facing, because eventually we want AI to be everywhere: not just in your computer, but in your wearables and even in space.

A Brief History of AI Hardware

So, a bit of history first. The first attempt, which was quite successful by today's standards, was by Marvin Minsky and Dean Edmonds.

Early Neural Networks Before Transistors

They built a neural network out of 40 neurons, and this was before the time of transistors.

This is one neuron here, and you can imagine it's about this size. There was no way you could actually put a language model onto this.

And even more than 20 years ago, when I was at uni, we were learning about neural networks and how they can be used for image recognition, for example.

But then the interest in AI sort of died, and the main reason was that the hardware hadn't caught up yet. The hardware was lagging behind tremendously.

Transistors, Scaling, and Shrinking Devices

So in the next 15 to 20 years, what happened is basically that the transistor came along first.

Transistors are basically devices which switch on and off, and you've got transistors in every chip. And one of the nice things about transistors is that as you make them smaller, you can put more and more of them on a single chip, and you can do more and more compute.

So putting lots of transistors together not only gives you the ability to process larger and larger amounts of data, it also lets you shrink the size of devices. You can have chips in your smartwatch, for example, or even in your glasses.

Modern AI Accelerators and Today’s Bottlenecks

If we fast forward about 70 years, we've got this piece of hardware. This is just an example: it's the NVIDIA H100 GPU, and it's basically an AI workhorse, used for both training and inferencing. It has about 80 billion transistors.

So you can imagine, for example, GPT-3, which is about 175 billion parameters. That's equivalent to around 100 million neurons, a neuron being basically a brain cell, right?

So the problem here is that even though we've got these really cool chips with billions and billions of transistors, we're still facing two key problems. One is compute: basically, how we do all the calculations that are needed to train a model, and for inferencing as well. The second is the data link.

At the heart of all these very complex chips, we're actually just doing some very, very simple operations, but we're doing them really fast on a lot of data.

Compute: Multiply-and-Add at Massive Scale

So basically what these things are doing is they're multiplying quantities and adding them. So that's the basic operation they're doing. But you're doing it on a lot of data and you're doing it really, really fast.
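As a minimal sketch of that multiply-and-add operation, here is the weighted sum a single artificial neuron computes, written in plain Python. The function name `multiply_accumulate` and the sample values are made up for illustration; a real accelerator does this in hardware, on many pairs at once.

```python
def multiply_accumulate(weights, inputs):
    """Multiply each weight by its input and add the products together."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    total = 0.0
    for w, x in zip(weights, inputs):
        total += w * x  # one multiply and one add per pair
    return total


# Illustrative values only.
weights = [0.5, -1.0, 2.0]
inputs = [4.0, 3.0, 1.0]
result = multiply_accumulate(weights, inputs)
print(result)
```

The loop body is the whole operation: one multiplication and one addition, repeated for every weight in the model.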

So that's the thing about compute. Now we've got all these billion- and trillion-parameter models, and they need lots and lots of compute power.

Data Movement: Memory and Interconnect Limits

But that's not the only problem. Once you've multiplied two quantities and added them, you still have to move the result somewhere; you need to store it, say, in memory. That's where data links come in: data links are basically used for moving data around. So how are companies dealing with these two problems?

Power and the Emerging Energy Barrier

So companies like Google and NVIDIA are building processors which are getting faster, but they're also burning a lot of power, so they're consuming a lot of energy.

Other companies are taking more novel approaches. Lightmatter, for example, transfers data using light, so using optics instead of electrical signals.

But in both cases, we see that we're coming up against an energy barrier. This is happening because the amount of data is just increasing, and it's really going out of control. So there is a huge push to design hardware that's a lot lower power, and hardware that can move data around really, really fast while burning a lot less power.

Using AI to Design Hardware for AI

And now I'd like to give you an introduction to what I'm doing here.

I founded a company, which is basically an applied AI company. We're building a platform that allows us to build better chips that can compute faster with much lower power, and also better data links, which consume less power and are also faster. The platform can be used by companies who are building compute, data links, or both, to design chips faster and with better performance.

I haven't got a demo here to show you how to actually use this platform, because essentially what we're doing is building models which are then able to design chips. So we are using AI to build hardware for AI. That's the whole philosophy.

But I'm going to quickly share one thing. I've just got one more slide, really.

Chipsolver: Platform Overview

My company's called Chipsolver, and this is just a snapshot of one of the interfaces. What we have is essentially three parts to building a model that can actually design chips.

The first part is a simulator, then there is a part called specs, and then there's a trainer.

Simulator: Learning Transistor Behavior by Generating Data

All of these are features of our platform. The simulator needs to understand how transistors work, and the way it learns that is by simulating a model of the transistor and generating data.

So this does not require data that's publicly available. You actually generate your own data using a simulator, and that creates your data set for you. That's the first step.
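To make the idea of generating your own data concrete, here is a minimal sketch. The talk does not say which transistor model the platform simulates, so this assumes the textbook square-law NMOS model; the function names `drain_current` and `generate_dataset`, the voltage range, and the parameter values are all illustrative assumptions.

```python
import random


def drain_current(v_gs, v_ds, v_th=0.5, k=2e-4):
    """Textbook square-law NMOS model (an assumption, not the platform's model).

    v_gs, v_ds: gate-source and drain-source voltages in volts.
    v_th: threshold voltage in volts; k: transconductance parameter in A/V^2.
    Returns the drain current in amperes.
    """
    v_ov = v_gs - v_th  # overdrive voltage
    if v_ov <= 0:
        return 0.0  # cutoff: the transistor is off
    if v_ds < v_ov:
        return k * (v_ov * v_ds - 0.5 * v_ds ** 2)  # triode region
    return 0.5 * k * v_ov ** 2  # saturation region


def generate_dataset(n_samples, seed=0):
    """Sample random bias points and record the simulated current for each."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        v_gs = rng.uniform(0.0, 1.8)
        v_ds = rng.uniform(0.0, 1.8)
        rows.append((v_gs, v_ds, drain_current(v_gs, v_ds)))
    return rows


dataset = generate_dataset(1000)
print(len(dataset), dataset[0])
```

Each row pairs the applied voltages with the simulated current, so the data set comes entirely from the simulator rather than from any public source.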

Specs: Stating What the Chip Must Achieve

The second step is the specs. What are the specs? The specs are really what you want to achieve with your chip. So let's say you want to build a chip that actually is a transformer, like what Paolo described earlier.

The specs for that would be, for example, the number of layers it needs, the power consumption, and the speed at which it works. Those are the specs you would have to give it.
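As a rough sketch of what such a set of specs might look like as data, the three examples above (layers, power, speed) can be grouped into one small record. The class name `ChipSpecs`, its field names, and the numbers are hypothetical and only show the shape of the idea; this is not the platform's actual format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipSpecs:
    """Hypothetical container for design targets; not the platform's format."""
    num_layers: int          # number of transformer layers to support
    max_power_watts: float   # power budget
    min_clock_hz: float      # required operating speed

    def is_met_by(self, power_watts, clock_hz):
        """Check whether a candidate design satisfies the power and speed targets."""
        return power_watts <= self.max_power_watts and clock_hz >= self.min_clock_hz


# Placeholder numbers, chosen only to show the shape of the data.
specs = ChipSpecs(num_layers=12, max_power_watts=5.0, min_clock_hz=1e9)
print(specs.is_met_by(power_watts=4.2, clock_hz=1.2e9))
```

Writing the targets down this way gives a design loop something unambiguous to check each candidate circuit against.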

Trainer: A Human-Guided Design Agent

The last one is the trainer. The trainer is actually like an agent, but one that requires a lot of guidance from a human chip designer, at the beginning at least. So you can think of the trainer as an agent which is a junior design engineer, and what you do is train this engineer to design a part of the chip for you.

Early Results: Small Circuits and Speedups

With these three things, we've built some small models which can design small circuits. By small circuits, I mean circuits that contain up to 200 transistors.

Just to give you an example of the kind of performance we achieved: we trained one of these models within two hours on a 24-core machine, and it was able to design a very simple circuit with about seven transistors.

We were talking about 80 billion transistors earlier in the H100, but this is seven transistors. After the model is trained, you can design a circuit with seven transistors, to the specifications that you want, within an hour. For a human, it would take maybe two days or more.

We've scaled this up to 200 transistors. There it took us three days to train the model, but after it was trained we were able to produce a completed circuit design within two hours. For a human, that would take a couple of months.

And it's not actually just about saving time. Chip design involves a lot of iterations, right? It's a bit like building software, where you have to test and iterate, but hardware design iterations are much longer and take a lot more effort.
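As a back-of-the-envelope check on the design times quoted above, the speedups can be worked out as follows. The talk gives the human effort only as "two days or more" and "a couple of months", so this sketch assumes 8-hour working days and 20 working days a month; both are assumptions, not figures from the talk, and the one-off training time is left out.

```python
HOURS_PER_WORKDAY = 8      # assumption
WORKDAYS_PER_MONTH = 20    # assumption


def speedup(human_hours, model_hours):
    """How many times faster the trained model produces a design."""
    return human_hours / model_hours


# 7-transistor circuit: about two days for a human, about one hour for the model.
small = speedup(2 * HOURS_PER_WORKDAY, 1)

# 200-transistor circuit: a couple of months for a human, about two hours for the model.
large = speedup(2 * WORKDAYS_PER_MONTH * HOURS_PER_WORKDAY, 2)

print(small, large)
```

Under these assumptions the gap widens as the circuit grows, and because a trained model can be reused, every later iteration gets the same per-design saving.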

Conclusion

What we're hoping is that we can eventually scale this to millions of transistors and build more and more complex chips that can solve both problems, compute and data communication.

Q&A

Any questions? That was it, by the way.
