Hello, hello. I've never spoken in a church before. Don't judge me.
My name is Yalan. I was expecting you in a second, but this is kind of fun. I'll get to warm you up a little, too.
So before I introduce myself, I'm going to do a little bit more show of hands. Don't worry, I'm not going to get you to do a whole warm-up. I'm guessing a lot of you have already used ChatGPT, but show of hands, how many of you have used it?
Okay, as expected. And how many of you have used Midjourney before?
Ooh, I like this. And how many of you have used Runway or Pika or Luma? Okay, that's higher than I thought.
Very cool.
All right, well, today I'm going to talk about AI as your filmmaking crew. And it's very much a use case process, and then we'll do a little bit of a demo.
If people have their laptops, you can follow along. If not, I'll kind of demo for you to kind of show you a little bit about how the process works.
But first, why am I talking to you about AI filmmaking? Because I studied economics as a major, as every international student who ever studied in the US does to get a job. But as a minor, I studied cinema studies, because I was obsessed with cinema.
And as part of my studies, we focused on La Nouvelle Vague. So those of you who've seen Truffaut or Godard films, you're in good company today.
So my background is very much filmmaking and cinema. But then I quickly moved on to YouTube and social.
I joined YouTube in 2013 and worked there for about seven years, always in the content partnerships team. So I was essentially advising creators of all sizes, right?
From people with 100,000 subscribers to Joe Wicks. Who knows Joe Wicks? Yeah, so I got him to start his YouTube channel after finding him on Instagram.
And this is his 100,000 button. Now he has a lot more subscribers than that and a burgeoning business.
And so I supported celebrities as well on the platform. The biggest thing I took away was really how different social is, and how YouTube back in the day enabled independent creators and filmmakers to get their voices heard, right?
Then I quit YouTube and, leveraging my influencer knowledge, I thought, hey, why don't we use influencers for consumer insights and market research?
So I set up my own company after leaving YouTube. I've been working on Multitude for the past two and a half years, focusing on consumer insights.
So we did insights through live streams on YouTube, where you were essentially in a massive focus group with hundreds of thousands of other people, with real-time data. But now I'm obsessed with gen AI and creativity.
And the main reason is that I have suffered from not being able to create because of how hard it is. How many of you are filmmakers? Are there any filmmakers in the audience?
Great, one person. I admire you, sir. It is extremely hard.
You are managing, in some productions, hundreds of people, and it's sometimes really, really hard to get everything together. So I have huge respect for filmmakers. But as an independent creator, I've always wanted to do things, and I haven't been able to.
I have a couple of scripts at home that I haven't been able to do anything with. So when gen AI came about in the last year or year and a half, I really started experimenting with it, trying to figure out, okay, how does this allow me to do what I want to do?
So I made a couple of short films. This one in particular was selected by a short film festival. It didn't win anything, but you never know, maybe for the next one. And I will show you this film, it's really short, it's about 30 seconds long.
But before I get there, what are the use cases of AI in filmmaking? This is the practical talk after all.
So one is visual film treatments and concepts, where I see a huge benefit. In the traditional film commissioning process, you have to write up a treatment, you have a storyboard, everything is in written format. It's pretty long. And you basically give that to somebody to say, will you please fund my film?
I believe that AI can allow that to become a lot more visual. So you can create, at very low cost, a demo, right? Like musicians record a demo.
I see AI really being helpful to filmmakers in creating essentially a demo reel, so they can take it and say, this is the trailer for the kind of vibe I'm going for with this film.
Second, B-roll shots. This is actually what I believe Adobe is aiming for in their implementation of Gen AI.
So if you want to add an aerial shot of Chicago, instead of filming something, you can quickly get AI to generate something for you. Extending shots is also coming soon in Adobe, I believe, in collaboration with Runway, because what Runway does really well, and you'll see that today, is take an input image or a scene and extend it pretty consistently, especially with Gen 3.
Coming up, what I see is gen-AI visual effects. There are a couple of companies setting up specifically to do this, to make it a lot easier.
And finally, because I come from social media and YouTube, virtual influencers. They're not new, they have always been around.
Have you heard of Hatsune Miku? Anyone? Yeah, ooh, ooh, some people.
Yeah, so Hatsune Miku is, I guess, potentially the very first virtual influencer. She's a singer, people go to her concerts, and she's not real; she's just a projection.
More and more of this will happen and as AI gets much better at simulating and showing human emotions and replicating human likeness, people will build parasocial relationships, but that's another talk.
So first, let me show you this 30-second film so you can see what's possible. Everything you see is generated by AI, with me controlling everything.
The more I search in the world around myself,
Thank you. No, we're not listening to that again. There we go.
So as you saw in the credits slowly, this was a poem that I wrote a million years ago. And I thought I needed something small to kind of create in AI.
And so I took that, and I used ChatGPT first for the story section. This is kind of like my process: I take that idea into a story, and I build that into a script.
And there is a custom GPT I've created that you can use, which takes you through that whole thing. I'll share the link at the end.
Then the next step is really to create the visuals. And here you really go for more than a storyboard.
In traditional filmmaking you will do a storyboard; it can even be hand drawn, and that's perfectly fine, because it tells your DOP how to frame the shots and block everybody correctly. In AI filmmaking you want the final shots of the scenes, the ones that you will actually show to people.
So this is the part I spend the most time on, in all honesty: Midjourney, trying to create that perfect shot. Think of it like this: you have a one-minute film with, let's say, 10 scenes, so you need at least 10 shots that you're perfectly happy with, with your characters, blocking, environment, lighting, etc., so that you can animate them, because then that's it. You can't really go back and change things afterwards. So the visuals are super, super important.
Then you animate each shot.
There are multiple companies right now, and pretty much everybody has beaten Sora to the punch with world building.
So there's Luma AI, which is actually quite good if you're trying to... Some of them are different for different purposes.
Runway is very good in Gen 3 for any kind of scene that involves humans. It's really much better at human characters, you'll see in a minute.
Luma does a really good job of beginning and ending, so you can put an image as your starting shot and your ending shot and it will generate the scenes in between or the animation in between. So it does some specific things differently.
Then once you have all the shots animated, you basically have your video clips, your shots for each scene. Then you add the sound on top: creating voiceovers, sound effects, music.
The voiceover that you heard in the previous film was done with ElevenLabs. I love ElevenLabs, but I also hate it, because sometimes the voices are quite emotionless. I think they're working on that, but there's no way to control the AI voice to sound excited in one part and depressed in another.
It's much better to use an actor if you're doing a proper film; I wouldn't use ElevenLabs at the moment for that.
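If you want to script that voiceover step instead of using the ElevenLabs website, a minimal sketch against their text-to-speech REST endpoint looks roughly like this. The voice ID and API key are placeholders, the line of text is just the opening of my poem, and the stability setting is about the only expressiveness knob you get, so check the exact fields against the current ElevenLabs docs.

    # Minimal sketch: generate a voiceover clip with the ElevenLabs text-to-speech REST API.
    # VOICE_ID and API_KEY are placeholders; field names follow the public docs at the time
    # of writing, so verify them against the current documentation.
    import requests

    API_KEY = "YOUR_ELEVENLABS_KEY"
    VOICE_ID = "YOUR_VOICE_ID"

    response = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
        json={
            "text": "The more I search in the world around myself...",
            # Lower stability tends to sound more varied and expressive, higher more
            # monotone - roughly the only 'emotion' control available today.
            "voice_settings": {"stability": 0.4, "similarity_boost": 0.75},
        },
    )
    response.raise_for_status()

    # The response body is the audio itself (MP3 by default).
    with open("voiceover.mp3", "wb") as f:
        f.write(response.content)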
And then you finally edit it, right? Here I didn't use any AI, so I edited it the old-school way. I use CapCut because I find it really intuitive.
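And if you'd rather script that final assembly than use an editor, a rough Python sketch with the moviepy library might look like this. The filenames are placeholders for the clips you exported from Runway and the voiceover file, and it assumes the moviepy 1.x API.

    # Rough sketch: cut the animated shots together and lay the voiceover on top.
    # Filenames are placeholders; assumes the moviepy 1.x API.
    from moviepy.editor import VideoFileClip, AudioFileClip, concatenate_videoclips

    # Load the animated shots exported from Runway, in the order they should play.
    shots = [VideoFileClip(f"shot_{i}.mp4") for i in range(1, 4)]

    # Cut them together into one film.
    film = concatenate_videoclips(shots)

    # Add the voiceover, trimmed so it never runs past the end of the film.
    voiceover = AudioFileClip("voiceover.mp3")
    film = film.set_audio(voiceover.subclip(0, min(voiceover.duration, film.duration)))

    # Export the finished edit.
    film.write_videofile("final_cut.mp4", fps=24)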
So today I'm going to focus more in depth on the visuals and the motion, so on the Midjourney piece and then the Runway piece specifically. We're going to create a video together: three shots created in Midjourney, and then we'll animate those shots in Runway.
You can follow along if you want, but you don't have to, because I'm going to have it open. Now, typical Midjourney prompting: with AI, it's always about prompting, right? You need to know your prompt.
Midjourney changed quite a bit from version 5 to 6.1, which is where we are right now. In 6.1, you can use more natural language with Midjourney. Previously, for those of you who use Midjourney, it was like meta-tagging a photo, right? With a lot of commas.
Now you can be a little bit more descriptive. But I would still recommend you follow this structure. So you start with your art direction first.
And there will be prompts coming. The art direction decides the overall look and feel of that shot. Then you have your subject and action, and I would recommend keeping it as short and sweet as possible.
The longer you make it, or the more run-on sentences you create, the more it tends to get confused. So it's better to keep the subject and action very focused. Then come the details.
So here you can actually add details if you want, completely optional, on costume, location, time of day, lighting, mood, the camera that you want to mimic, the lens that you want to mimic, etc. And then finally your parameters. So what's the aspect ratio?
How much do you want to apply the default style? Midjourney has a very specific default style, which I don't use, but some people like it. We'll create a couple of these. I'll see if I can get this to attach. There we go.
Can you guys still hear me? Yeah? So I thought that the three shots we could do, since we're in mid-September now, is to go back to summer a little bit.
So these are just some suggested prompts that we can use together. I've color-coded them, so you can see the art direction is in green, the subject and action follows in red, then you've got the details in blue, and then you've got the parameters at the end.
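Just to make that structure concrete, the beach shot we're about to make breaks down roughly like this; the --ar aspect ratio and --style raw flags at the end are the parameters, and the exact values here are just an example.

    Art direction:       cinematic landscape photography
    Subject and action:  gentle waves crash onto a golden sandy beach in Northern California
    Details:             warm cinematic lighting, sunset, scattered clouds
    Parameters:          --ar 16:9 --style raw

    Full prompt: cinematic landscape photography, gentle waves crash onto a golden sandy beach in Northern California, warm cinematic lighting, sunset, scattered clouds --ar 16:9 --style raw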
If you can't create these shots, or if you just want to use the Runway bit, you can go to bit.ly slash AI dash shots to download those images. And you can actually do Runway on your mobile, and Runway does have a free tier. So if you just want to sign up to Runway on your mobile and try to do this there, you can.
So let me get out of here. And Midjourney is no longer in Discord. It actually operates on its own webpage, finally.
So I've been playing around with this. You can see the shots that I was generating. But it's become really easy.
So you essentially put whatever you want at the top, beginning with the art direction. The parameters are now at the bottom here, so you can adjust the aspect ratio to landscape. I find it easier than typing.
And I use the raw style instead of the standard. For those of you who use Midjourney, how many of you use raw? So many of you use Midjourney's default style. That default aesthetic is quite specific, and it will add things to your image.
So especially if you have something in your mind's eye that's very specific and you don't want Midjourney to meddle with it or add anything to it, just use the raw style. It will be quite different. And because it creates so quickly anyway, I recommend you try both raw and the standard style.
And then we hit go. Oh, you can't see the imagine bit. Let me shrink the window. Thanks for telling me. All right.
Okay, so these are the four options. For those of you who haven't used Midjourney, Midjourney will always give you four options based on your input.
And this pretty much resembles what I asked for, right? I had asked for cinematic landscape photography, gentle waves crash onto a golden sandy beach in Northern California, warm cinematic lighting, sunset, scattered clouds. So it did it pretty well.
And what it will do is give you low-resolution image options for the four of them. My internet's being slow. And then you get to decide which one you want to change or upscale.
You can actually inpaint in Midjourney, so if you wanted to add a boat in any of these shots, you could do that. But let's kind of choose one of these. I kind of like this one.
And then I should definitely upscale it, so that you have the higher-resolution image to actually feed into Runway. So let's upscale Subtle. There are two options: you can upscale Creative or Subtle.
Subtle will basically generate as similar a composition as possible in a higher resolution, because whenever you use Midjourney, it's generating a new image, so it will always be slightly different. It's not really taking the same image. If you say Creative, it's going to add some new elements to it when it's upscaling.
If you're an OCD person like me and you like to control things, I would never use the creative one. I would always use the upscale subtle. The way I use the creative version is when I vary.
So if I like a shot, but I want to vary it slightly or have some other options, right? Like I quite like the, well, let's do it. Why not?
Let's do a strong vary. What it's going to do is take that scene and composition and create similar versions with stronger changes instead of subtle ones. You will see the results in a minute.
So this is why, when I'm using AI to create films, I spend most of my time in Midjourney and not really anywhere else: it takes a while to get that perfect shot that you're after. So here, sometimes it flips them. It created different waves.
There's a stronger wave here, a bit more subtle wave over here. I kind of like the initial one, actually.
Let's use this one. And we will upscale this one so that we have the higher resolution image that we'll download. And that's the part that we'll use in Runway.
And I think we'll do that next because I'm conscious of time.
So, on to Runway. How many of you have used it? Quite a few, right? I saw a couple of hands, cool.
So, I'll show you one thing before we get to Runway, because there's actually two versions of it. There's the Runway Gen 2, and there's the Runway Gen 3.
The difference between them is that Gen 2 works in a very similar way to Midjourney. So when you input an image into Runway Gen 2 and say, you know, the clouds are moving across the sky, or the fire is burning, it will essentially look at it as a 2D image and then interpolate what the next pixel in a series of images should be, statistically speaking. They're statistical machines.
So that's why Gen 2 does really well with water, clouds, fire, things that are very well known in terms of how they move. The thing that Gen 2 sucks at is human movement. If you've ever tried to get Gen 2 to make a human being move across something, they look like garbage, because they're kind of morphing at all times. And the main reason for that is that it doesn't have a full understanding of the scene. It doesn't have, you know, the depth information, et cetera.
So if you remember when the Sora demo came out, my mind was blown, because it could do these very impressive shots, you know, like a drone shot over a lighthouse, for instance. To be able to create something like that, you need to create a world. That's why the new models are called world builders. Sora is a world builder.
None of us can use it yet, but Gen 3 is a world builder as well. What that means is that if you input an image into Gen 3, it can assess the depth of things in the scene. So when you say, you know, tracking shot in, it can actually move through that scene accurately and eliminate the right elements that are closest to the camera. It has a much better understanding of 3D space from that one image.
If you do text to video, it's almost even better, because it's creating the whole scene by itself. So I'll show you very quickly what that looks like.
This is a comparison of the source image on the left. Then you have Runway Gen 2, so that's the two-dimensional, I call them 2D. And then you have Luma, which is a competitor to Runway, which came out before Gen 3. And then I'll show you an example of Sora and Runway Gen 3 as well.
You'll see very different animation here. In the Runway Gen 2 ones, you'll see mainly the water moving. His hair is moving because, you know, hair sways in the wind.
It's quite well known to that algorithm, but the rest of it is quite limited, quite constrained. Luma, on the other hand, does this swivel camera motion, creating the remainder of that background even though it wasn't in the image that I gave it. So it's quite impressive what a world-building model can do.
And the next one compares Gen 3 to Sora. The Sora footage, obviously, is taken from what they released, since I couldn't generate it myself. But what I did is take every single prompt they had in the Sora preview and put it into Runway Gen 3 to see how they would compare. And it's quite impressive. It's impressively close.
So the top part is Gen 3, the bottom part is OpenAI Sora. This is text to video, so it's just a prompt like a woman walks in Tokyo with flashing lights. You can see the tracking shots are very accurate. There is no morphing of human anatomy, which is quite impressive.
You will see one little morphing happen in a minute with the dogs. So look at this dog, it will disappear in a minute. Just disappeared.
But you know, it's pretty good. You could never do that in Runway Gen 2. There is no way you could do that.
All right, so let's actually go to Runway and show you how that works. In Runway, you basically go to Generative Video. The thing I like about these tools is their UI is click and play, so you don't need to be a coder or a developer to use any of these things. And then what you do is you essentially input your image.
Oh, I have to download it first, of course. All right, let's download this. All right. That's the one we want.
So I'm going to show you Gen 2 and Gen 3 so you can see the comparison between the two. This is Runway Gen 2. In Gen 2, I never ever put a text input. It's just the way I work with Gen 2.
Because in Gen 2, you have very specific camera control and a motion brush, which is actually quite cool. So you can go to camera control. You can say, I want this to be a zoom in, right? So then what you do is you increase this zoom.
Let's do two. Let's not go too crazy. You can do things like roll, you can do things like panning, left, right, up, down, you can do tilts. Tilts is where it gets a little funky.
And then the coolest thing is the Motion Brush. With the Motion Brush, you can select parts of the image and tell it to move right, to move left, or to have motion within itself, which is called ambient noise. So for instance, let's use auto-detect, because why not? Let's select the clouds and say, let's move the clouds to the right.
Gently, and let's add some ambient noise. Ambient noise means motion within the clouds themselves, because I want the clouds to kind of have this cloudy movement going on. Then let's get the waves, and let's say we want the waves to go in the opposite direction, at two. And let's do the trees as well; with the trees, I'm just going to add some ambient noise. Let's generate.
You can also use text. The reason I don't like using text is because you never know how the AI will interpret it. Sometimes it's happened to me where I've set a static camera and the camera moves. So I'm never entirely certain how it's interpreting it. So in Gen 2, I like to use these specific tools and brushes so I can kind of tell the AI what to do.
And then you'll see Gen 3 is very different. My recommendation in Gen 3 is to give as little text input as possible, because it can infer quite a lot from the image and can kind of decide what motion should be within that image. So we'll try both so you can see.
So this is the stuff that Adobe is incorporating into Premiere Pro. For those of you who use Premiere Pro, Adobe will incorporate Runway models. You'll be able to pick specific scenes and extend them, using essentially the same models that you're seeing within Runway, but integrated within Adobe as well.
And Sora is supposed to be in Adobe Premiere as well later this year. Who's an Adobe Creative Suite user here? Okay.
I never got into Adobe myself. I found Photoshop very hard to use, so I kind of never picked it up, and now I'm a CapCut guy. And if I'm running late on time, just tell me, just give me a finger up.
There we go. So this is the Gen 2 result based on our directions. You can see the waves are crashing to the left.
The clouds are moving right. There's a little bit of movement in the trees. It did kind of exactly what I wanted it to do.
But there's always this like slow-mo effect in Gen 2. So if you've used Gen 2 in Runway, everything seems to be in slow motion for some reason. So I tend to speed it up in editing.
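Purely as an illustration of that speed-up, if you were doing it in Python rather than CapCut, a tiny moviepy sketch might look like this; the filename and the 1.5x factor are just placeholders, and it assumes the moviepy 1.x API.

    # Sketch: compensate for Gen 2's slow-motion feel by speeding a clip up 1.5x.
    from moviepy.editor import VideoFileClip
    from moviepy.video.fx.all import speedx

    clip = VideoFileClip("gen2_shot.mp4")
    faster = clip.fx(speedx, 1.5)  # 1.5x playback speed
    faster.write_videofile("gen2_shot_fast.mp4", fps=24)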
You don't have the same thing in Gen 3. So let me show you Gen 3, and actually Gen 3 is much faster. So we'll use the same shot, and then we'll say it's the first shot of the movement, and then we'll say here, handheld tracking shot on a beach. Just let's do five seconds for now. Yep.
So we can see the difference between the two models. It's a lot faster. All right. So you see it's a lot more fluid.
It's a lot more realistic. The tracking shot is a lot more what I would expect. It's a lot more what you would see in a film. So I think there's been a lot of YouTube data training on this.
But essentially, this is the process, right? You create your shots, then you animate them in Runway or another tool like Luma; try them and find the one that works for you. Runway seems to be the one I use the most. And then you essentially add them all together.
So, takeaways. Pre-planning is everything.
So when it comes to filming with AI, you really have to have the shots, the visuals. Your storyboard needs to be the final shot by shot of what you want people to see because then that's what the AI models will animate.
So you kind of like reverse the process a little bit. Image to video is your best friend.
As a filmmaker, I'd never use just text to video. It can generate amazing things; it's just that when you stitch them together, they may not make sense.
You may want to have the same style of shots, for instance, or the same character within them, in which case you have to use something like Midjourney to create the shots, start with an image, and then animate that image afterwards. And experiment to get the best results.
As you just saw, sometimes Gen 2 might do something better than Gen 3. It's really up to you and the model that you choose for that specific shot.
I haven't talked about lip syncing, for instance, because we don't have enough time. But if you're lip syncing, Runway does have a lip syncing module you can use. Once you have a video, you input that video and the audio file, and it will make the character's mouth move in accordance with the speech.
There are other tools like Hedra, which seem to do a lot better, because it will actually make head movements as well, so it looks a lot more natural. Runway lip-syncing is a little stunted; it looks a little robotic. Hedra, on the other hand, will move the head and kind of create these emotions based on the script you're giving it.
So just experiment to get the best results.
So thank you. And thank you for your patience. Hope it was interesting.