Hello, everyone. This part of the event is purely theoretical, and I'll try to make it as easy as possible to understand.
But we're going to go through the history of AI image generation. I'm going to hop into a time machine, and we'll start from 2014. The contents will mainly be GANs, Generative Adversarial Networks, how they've evolved over time, and how they've influenced AI image generation today.
I'm also going to do a live demo of this little device right here. That's going to be the fun part to balance out all the theory, and I'll get to it when we get to it.
Okay, so let's begin a little bit about me. I'm an entrepreneur. I'm a multidisciplinary person. I am self-taught in a lot of things.
I used to do graphic design, visual design, and video design as well. I worked freelance for a little while. For now, I'm self-employed and I work in an immigration consulting company, helping Russian-speaking people move to Spain and legalize their stay here.
I'm also an avid fan of hackathons. I recently came back from a hackathon in Barcelona; I don't know if you know it, but it's called HackBarna, and they hold it twice a year. It was a big event: 130 attendees, and we had one day to create a functional demo project, and that would be our project right here. At the end, I'll leave a QR code if you want to check it out. Yes, it was a very big hackathon, a lot of events, a lot of prizes.
We actually managed to get into the top 10 out of 40 projects, which was a feat. We made an AI agent with n8n, which you can see on my laptop, built an application with Lovable, and hooked it all up. It was basically an assistant for people taking care of their dogs.
So let's begin with the context of our presentation: the challenge.
So the initial limitation of image generation back in 2014 was that the computers were very rigid, and people had to program rules into them for how to analyze, generate, and process images.
That all changed with an advance in 2017 that I'll get to further on. But the main problem with rigid rules is that an AI can't really capture what images have in them, what our eyes can see: the detail, the lighting, all of the effects that seem very simple but are really hard to explain to a machine.
So the fundamental question is: can we give AI more fluid rules and teach it how to do things? And the answer is yes.
In 2014, Ian Goodfellow introduced the generative adversarial network, which, in simple terms, as you can see here on this diagram, is composed of a generator and a discriminator. The generator is a simple AI algorithm that takes in noise and converts it into an image that looks kind of like something.
It's guided by features. So, for example, a cat would have two ears, whiskers, and a nose, and it would try to predict what a cat would look like based on noise.
And the discriminator, the second network, catches it in the act: it compares the output to a dataset of real images of cats and says whether that's a good imitation or a bad one. If it's a bad imitation, it tells the generator to go back to the drawing board and try again.
And they go into this cat-and-mouse race. One tries to fool the other, and both adapt and learn in the meantime.
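To make that cat-and-mouse race concrete, here is a minimal sketch of a GAN training loop. It assumes PyTorch, tiny made-up MLP networks, and flattened 28 by 28 images purely for illustration; it is a toy, not the original 2014 implementation.

```python
# Toy GAN training loop (illustrative sketch, not the original 2014 code).
import torch
import torch.nn as nn

noise_dim, image_dim = 64, 28 * 28   # made-up sizes: 64-dim noise, flattened 28x28 images

# Generator: noise in, fake image out.
G = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(),
                  nn.Linear(256, image_dim), nn.Tanh())

# Discriminator: image in, probability that it is real out.
D = nn.Sequential(nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_images):
    """One round of the cat-and-mouse game. real_images: (batch, image_dim) in [-1, 1]."""
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Discriminator learns to tell real images from the generator's forgeries.
    fake_images = G(torch.randn(batch, noise_dim)).detach()
    d_loss = bce(D(real_images), real_labels) + bce(D(fake_images), fake_labels)
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # 2) Generator learns to fool the discriminator into saying "real".
    fake_images = G(torch.randn(batch, noise_dim))
    g_loss = bce(D(fake_images), real_labels)
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()

# Toy usage: pretend a batch of 16 "real cat photos".
losses = train_step(torch.rand(16, image_dim) * 2 - 1)
```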
That's the fundamental idea of what learning algorithms are when it comes to image generation. And the technology itself stalled for a couple of years, until 2016.
And in 2017, there was a breakthrough by NVIDIA researchers, which was Progressive GAN. They basically did what's shown in this image.
They added extra convolution layers, something like those feature layers I talked about. So they added more and more feature layers while the image is being passed back and forth between the generator and the discriminator.
And by doing that, they could upscale the images from a starting 64 by 64 pixels, basically invisible and not really useful, up to 1024 by 1024, which is actually a decent-quality image.
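As a rough illustration of that growing trick, here is a sketch of a generator that keeps bolting on extra upsample-and-convolution blocks, doubling the resolution each time. The class, channel counts, and kernel sizes are invented for the example (and it starts from 4 by 4 for simplicity); the real Progressive GAN also fades each new block in gradually rather than just appending it.

```python
# Sketch of the Progressive GAN "growing" idea (illustrative, not the real ProGAN).
import torch
import torch.nn as nn

class GrowingGenerator(nn.Module):
    def __init__(self, noise_dim=128, channels=32):
        super().__init__()
        # 1x1 noise vector -> 4x4 feature map
        self.base = nn.Sequential(
            nn.ConvTranspose2d(noise_dim, channels, kernel_size=4), nn.ReLU())
        self.blocks = nn.ModuleList()            # extra stages get appended here
        self.to_rgb = nn.Conv2d(channels, 3, kernel_size=1)
        self.channels = channels

    def grow(self):
        # Add one stage: double the spatial resolution with an upsample + conv block.
        self.blocks.append(nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(self.channels, self.channels, kernel_size=3, padding=1),
            nn.ReLU()))

    def forward(self, z):
        x = self.base(z.view(z.size(0), -1, 1, 1))
        for block in self.blocks:
            x = block(x)
        return torch.tanh(self.to_rgb(x))

gen = GrowingGenerator()
print(gen(torch.randn(1, 128)).shape)            # starts tiny: (1, 3, 4, 4)
for _ in range(8):                               # eight doublings: 4 -> 8 -> ... -> 1024
    gen.grow()
print(gen(torch.randn(1, 128)).shape)            # ends at (1, 3, 1024, 1024)
```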
And that was later improved with StyleGAN, which gave much finer control over the style and features of the generated image, and StyleGAN was then refined in its second and third editions.
So going to the diffusion process, how does it work? On the right, you can see a simple schema of an image and Gaussian noise being added on top of it. It's done in steps.
So you add noise little by little and show it to the AI. And the goal of the AI is to take pure noise and decode it, little by little, into an image you can actually see.
And that's what was done in June of 2020 with the introduction of the denoising diffusion probabilistic model, or diffusion model, which is easier said than whatever this is. Basically, the sampling process initially required around 1,000 steps.
But as I'll get to later, now it can be done in far fewer steps: either 4 or 25 steps can yield a pretty good result. So the technology has advanced drastically in the last five years.
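To make the "add noise little by little" idea concrete, here is a small sketch of the forward (noising) process under a simple linear schedule. The image shape and schedule values are illustrative assumptions, and the trained network that reverses the process step by step is omitted; only the forward math is shown.

```python
# Sketch of the DDPM forward (noising) process (illustrative toy configuration).
import torch

T = 1000                                        # the original DDPM used roughly 1,000 steps
betas = torch.linspace(1e-4, 0.02, T)           # how much noise each step adds
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def add_noise(x0, t):
    """Jump straight to noise level t: x_t = sqrt(a_bar_t)*x0 + sqrt(1-a_bar_t)*eps."""
    eps = torch.randn_like(x0)
    a_bar = alphas_cumprod[t]
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    return x_t, eps                              # eps is what the model learns to predict

# Toy usage: a fake 3x64x64 "image"; at t=10 it's barely changed, at t=999 it's pure noise.
image = torch.rand(3, 64, 64) * 2 - 1
slightly_noisy, _ = add_noise(image, t=10)
pure_noise, _ = add_noise(image, t=999)
```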
So, text-to-image, which is the missing link. Going back to what my colleague said about OpenAI, their big breakthrough that no one really talks about was CLIP. It was an AI algorithm that would match images with their corresponding descriptions, you could call it that.
For example, to the right, you have a photo of a cat, a photo of a dog, and a photo of a man. Those are turned into matrix calculations, which are these numbers over here, so don't be scared of them.
The green numbers are the photos; they mark the location of each photo within the multidimensional embedding space inside the AI. That's kind of hard to get into, but essentially the algorithm optimizes so that the description ends up right where the image sits in that multidimensional space.
So imagine a 3D cube with a lot of dots in it, where each dot corresponds to an image and each dot corresponds to a name of an image. The goal of the algorithm is to get those numbers as close as possible. So you see values like 0.21 and 0.23; they're close to each other
because one is a photo of a cat and the other is the tag "a photo of a cat". But if it's a photo of a dog and the tag is "a photo of a cat", then there's a difference of about 0.60 between the vectors, which means it's likely not a cat, but probably a dog, or a man in that case. And that's how the algorithm would annotate images and understand which is which.
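Here is a tiny sketch of that matching step. It assumes PyTorch, and the random "encoders", token IDs, and dimensions are placeholders; the point is just to show how image vectors and caption vectors land in the same space and get compared.

```python
# CLIP-style matching sketch: embed images and captions, compare with similarity scores.
import torch
import torch.nn.functional as F

embed_dim = 8
image_encoder = torch.nn.Linear(3 * 32 * 32, embed_dim)   # stand-in for a real vision model
text_encoder = torch.nn.Embedding(1000, embed_dim)        # stand-in for a real text model

def embed_image(img):                      # img: (3, 32, 32)
    return F.normalize(image_encoder(img.flatten()), dim=-1)

def embed_text(token_ids):                 # token_ids: (num_tokens,)
    return F.normalize(text_encoder(token_ids).mean(dim=0), dim=-1)

images = [torch.rand(3, 32, 32) for _ in range(3)]            # "cat", "dog", "man" photos
captions = [torch.tensor([1, 2]), torch.tensor([1, 3]), torch.tensor([1, 4])]

img_vecs = torch.stack([embed_image(i) for i in images])      # (3, embed_dim)
txt_vecs = torch.stack([embed_text(c) for c in captions])     # (3, embed_dim)

similarity = img_vecs @ txt_vecs.T   # (3, 3) grid of scores; diagonal = matching pairs
# Training pushes the diagonal (cat photo <-> "a photo of a cat") toward high similarity
# and everything off the diagonal (cat photo <-> "a photo of a dog") toward low similarity.
print(similarity)
```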
That's how you can create, I don't know, an astronaut on a unicorn in space. It's something that's not supposed to exist, but it can be created through those annotations and that knowledge.
These three are the main text-to-image models that have advanced lately: the DALL-E models and Stable Diffusion, one of which I actually ran locally on my computer when it came out. It was pretty fun.
You can still mess around with it, and it's free. And then Midjourney as well, which is the go-to kind of option, though now there are much better alternatives that I'll talk about.
And now we get to the fun part.
I'm sorry to bore you all, and I'd like to introduce you to my friend, Rabbit. Say hi. Hi, Rabbit.
Say hi to the crowd. Let's see if it works. Let's see what we've got here.
It has a camera built in. It has all these features that we talked about. And it can process images locally. So it has a small device inside of it which processes images.
I can probably say, yeah, like, "Hi, everyone." "It is great to see such a lively gathering for what looks to be an engaging event. Hope you all have a fantastic time."
At least the demo worked, thankfully.
It has several cool features, a bunch of little features, and a dropdown menu.
You can use it to take pictures. You can also use it as a translator device, so if anybody here doesn't speak English, they can use it and it translates pretty fluently.
You can create your own applications. So you could say, create me a timer for 10 minutes, or anything much more difficult, like mini games. It also has small games.
And an intern, which is an AI agent that you can teach and program to do certain things. They have a teach mode: you show it how to, for example, go into a booking site and book a hotel room and a flight to Stockholm or whatnot. Once it has watched you do it, you can say, "Now book a flight to Budapest and a hotel room," and it will do the same thing, but applied to Budapest.
It has around three to five seconds of inference latency. And that's the main idea behind this little thing.
So, on to the finishing touches of the presentation, since I'm running out of time, I assume. Yes.
And the state of AI today. There are a lot of implementations you can actually use, and they're being adopted by many companies nowadays.
AI is now producing a lot of ad campaign material for big companies. We've seen Coca-Cola videos generated by AI. We've seen big studios use AI.
Netflix itself has a special section dedicated to AI-generated videos that people can make.
Or more recently, Sora 2, for example, which came out, and that's a whole other thing. It's like having an Instagram where anybody can generate content of anything and post it there. It's all AI-generated. And it's pretty insane, to be honest.
There are also models like Google's Nano Banana, which sounds really funny, but it's honestly really useful.
I used to do Photoshop back in the day, and it took me like a year to learn it. Now AI does in minutes anything I could do in Photoshop that would take me hours. Like shading: you can tell it to change anything in the image by highlighting it or even just describing it.
You can also change how an image looks while keeping yourself consistent in it, which was a big problem at the start of the image generation era: the person's features would get more and more distorted the more you generated or edited the image. Now they've gotten to the point where you can take one image and instantly transform it into a different image with the same person in it, which is kind of difficult to pull off, but thank Google, I guess.
And there are many applications for AI in marketing, advertisements, and game development. You could apply it to pretty much anything off the top of my head, real estate, for example; I've done consulting on that. There are a lot of professional applications, but you can use it for your own work as well if you're a small company.
So, do any of you have a preferred application for AI images that you've tried before and liked? I don't know. Anyone?
You have a marketing agency, don't you? How do you implement AI in your work?
Well, not for images exactly, but for posters. I have one that's for a burger place. Flyers, offers for work, that kind of thing.
For posters, it's a good use case as well. I've recently found an application called Genspark AI, which you should write down. It has a designer algorithm, and it's free for the first five or ten uses, but you can basically one-shot posters with it: you upload the images and the text that you want, and it creates pretty good, realistic posters of whatever you want it to be. I can actually show you, after I finish the presentation, a couple of
demos that we did for the project in Barcelona, because I used it for posters there as well. And I think we're done here, so thank you very much for listening. AI is the brush, but human intuition is still the artist. Let's hope it stays that way.
And here's a QR code, if you want to scan it. What it has is basically a bunch of free-to-access AI courses from big names like MIT, Coursera, and Google, some basic stuff. I haven't really had time to finish it, but it's a Google Doc.
So I will add extra information to it, for example, tomorrow, and a couple of helpful videos as well.
And finally, just to show you a couple of posters, since I said I would, about Genspark: I have them right here. I used it for our veterinary project, the Kiko project, which was also done there. Yeah, look at this.
This was generated by basically uploading one example image, the logo, which I also made with Genspark in a couple of minutes, and some sample text. It filled out the rest and created pretty nice-looking posters.
However, since it's AI, I do want you to be careful because it does sometimes make mistakes. It distorts text. And it can sometimes get things wrong by basically putting the same thing twice.
Although the color matching, the shadows, and the generated images of the dog, because that is a generated image, are pretty good and convincing for the level it's at. And if we take the time machine back to 2014, it was just basically crunching numbers and figuring out which piece of clothing is which. So we've come a pretty long way, to be honest.
And I thank you all for listening.
Because I don't know how many of you have seen it, but it's quite impressive. Yeah, Sora 2, okay. Also, I don't know how many of you have realized, but that presentation, I assume, was made with AI, wasn't it?
Yes, this presentation was done with Gamma AI. I also used Claude for the content, along with a bunch of my own notes, as well as curated knowledge from the different classes I give. I give private classes, so I had a lot of material in my Notion. I compiled all of that, processed it with Claude, built it up into a structured presentation, then generated a presentation based on that with Gamma, and then fixed it all up.
Using AI doesn't mean you don't do any of the work. It means that if you enrich the AI with context, it can give you a really good presentation. If you just say, "give me a presentation on this topic," it's not going to be great. Yeah, I mean, it's all about context in the end.
And this.
Yeah, the first video, it would be great, I think. Yeah? Just the two minutes? Yeah, sure, sure, sure.
Just to start. This is the official open AI presentation. Hope the sound works. Yeah, it does.
One year ago, Sora 1 redefined what was possible with moving images. Today, we're announcing the Sora app, powered by the all-new Sora 2. As far as I'm aware, all of the images and videos on this are AI-generated. Yes, in the same old mode. I'll pass the bill for more details.
Every video comes with sound. Sora 2 is also the state of the art for motion, physics IQ, and body mechanics, marking a giant leap forward in realism. And we're introducing Cameo, giving you the power to step into any world or scene and letting your friends cast you in theirs.
On the path to AGI, the gains aren't just about productivity. It's about creating new possibilities. It's also about creativity and joy.
One, two, three, four. Rubber ducky race. Wow.
How far we've come, eh? In only a couple of years.
And there's a lot of interesting things to unpack about AI video generation. It's very similar to the diffusion models, but the process is frame by frame, so it takes a lot more energy. I actually have
quite a lot of material on that too, but I didn't have the time to put it all together as well.
And the lip sync is very similar to Veo 3, which is another video model, released by Google, and which was actually the first one to do it. And now there are similar Chinese models in the works, which would be open source and easy to access.
For now, Sora 2 itself is like a social media network, and it's mostly being used for creating Family Guy videos and other fun videos, or Sam Altman stealing things, or people just having fun with it without worrying about copyright infringement, which, let's see how that goes in a couple of months.
OK, we have five minutes for questions.
Sure.