From Idea to Video: Creative Workflows with Generative AI

Introduction: From Idea to Video with Generative AI

So hello everyone. I'm going to present a simple idea here: from idea to video, creative workflows with generative AI. A picture speaks a thousand words, so I'm just going to show you two videos of less than one minute each, and then I will explain how I made them.

Two Approaches to AI Video: Text-to-Video vs Image-to-Video

First one, and then the second one. OK, so, on to the presentation: how do I make this kind of video? This is my image-to-video workflow. Why is it named like that? Because with generative AI, there are two ways of making videos. The first is text-to-video, the regular way of using an AI to generate content: you write what you want, for example "I want a dog to do something in the video." The second way, and the one I prefer because you have more creative control over what you're doing, is image-to-video: you use an image to act as the prompt. Sometimes you add some text, but the image in itself can be a prompt. Overall, you have more creative control over what you're doing, and for a creative person, a creative practitioner, it's better.

Step 1 — Turning an Idea into Image Prompts with an LLM

So what I'm doing is that, first of all, as you can see, this is the logo of Claude.

But you can use any LLM.

You can use Gemini.

You can use GPT.

And what I'm doing is really simple.

It may seem a bit difficult for people who are not used to LLMs, but I just say to it: OK, I have an idea. I want to make a video with a precise piece of music in mind. I have images that come to my head, and I want to make a small music video, so I need your help writing the prompts for a generative AI image tool. So I use the help of Claude or Gemini to write the prompts for Midjourney. For people who don't know, Midjourney is a generative AI tool for images. Nowadays it can do videos too, but Midjourney's videos are not that good; its images are what really stand apart.

Why Midjourney: Aesthetics, Community, and Variations

Why am I using Midjourney? Because it has a special aesthetic, I would say, that you cannot find in other text-to-image models. For example, Nano Banana is really good, a really good model for creating images, but it has less of a distinctive aesthetic; Midjourney is quite original. Midjourney is also a community, so I can pick any image I find there from other creators and take inspiration from what other people are making on the Midjourney community.

Also, when I pick an image, I can rerun it: I can rerun the same prompt, I can even edit the image right in Midjourney, and, as you can see with "Vary" here, I can make a small variation of the image or just rerun the same prompt. I can zoom out of the image, which means there will be more elements in the next generation.

And the most important thing for what I'm showing you is here. As I said, there is a really distinctive Midjourney aesthetic, and you can actually personalize the stylization, the Midjourney "filter", let's say, applied to all your images. You can decrease it, but the more you increase this setting, the more original the image will seem; it will have a special flavor that you cannot find in other image models.

Keeping Visual Consistency with Style References

And the second most important thing is style references. As I said, I create my prompts with the LLM of my choice, and then I go back and forth with it. I tell it: OK, your prompt was good, but I need to change this, this, and this, because this generation was not really what I intended. Then, from the moment I have an image that I really like, where I think "OK, this is a scene, this is a shot that will go into the final version of the video", I can take it and use it as a style reference. That means there will be consistency across all the other images and scenes that I choose for my final video.

So that was the Midjourney part. The next step, and I'm not going to dive into detail about it, but you can see the Photoshop logo, is editing. In Photoshop I remove things I don't like about the images; for example, there can be some hallucinations I'm not a big fan of, so I get rid of them. You can use any editing software for this kind of task.

Then I'm using another tool, the big final one I would say, which is Flora. What is Flora? I'm going to show you a recording of my screen on this platform. Flora is a platform where you can use any model that exists, whether text, image, or video, any type of generative AI model, and you have everything in the same place. You can mix images together, you can do whatever you want, and you don't have to switch tools, because everything is on the same platform.

And what I'm showing you here, for example, is that on this platform you can actually do the editing, the inpainting. I choose a zone that I want to get rid of on the image; here, for example, it's a snowy mountaintop. I say: OK, I want to remove the snowy mountaintop. I select the zone, and I select Nano Banana Pro for the inpainting, because it's the best model for this kind of task.

Then it got rid of it.

Step 2 — Cleanup and Image Quality Improvements

Inpainting Fixes and Upscaling for Sharper Frames

After that I have my image, and I can do the upscaling step. So what is upscaling? I'm using a generative AI tool named Magnific, which is the best tool for that. By upscaling I mean that it adds more definition to the image. The problem with GenAI images is that sometimes there's a blur to them; they lack definition and details, and this kind of tool adds those details back. I recorded my screen, so the quality here is not that good, but when you're making a video this step is quite important, because it really adds definition and detail to your shots and scenes.

Step 3 — Generating Video Shots (Image-to-Video) with VEO3

And then I'm using what is, from my point of view, the best model for video, which is Veo 3, Google's model, to generate one of my scenes. This part is fast-forwarded because generation is quite long: it takes about four minutes to generate eight seconds of video, and that's one of the flaws of generative video. But then I have a shot that can be used in my final production, and I didn't write any prompt.

For some cases and some images, though, it's actually necessary to write a prompt. For example, the image you see here is a statue. If I just upload the image and generate a video from it, what's going to happen is that Veo 3 will think it's a character, so it will move, maybe it will speak, and I don't want that; the model cannot understand on its own that it's a statue. So for this kind of image,

When You Need a Prompt: Preventing Unwanted Motion or Behavior

I'm writing the prompt.

What I'm doing is opening a window where I use Claude Opus, and I write the prompt for the video model. Then I open another window so that I have a prompt ready to be used right away. When it's done (this is in fast motion), I copy and paste it, open the window for the video step, choose Veo 3, and paste in the prompt. And then, for this kind of image too, I have a video that is ready to be used, a scene that is on point.

Step 4 — Assembling the Final Clip (Editing, Text, and Music)

So now that that's done, the next step is editing. I'm not going to dive into video editing, but what happens is that I just put all the scenes together. I'm not that fond of video editing, so I'm just using CapCut, because it's a really easy video editing software. And for the text you've seen, it's just regular design: I choose a font on the internet that suits the aesthetic of my videos. So that was the image-to-video workflow in its entirety.
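As a recap, the whole pipeline can be written down as a simple checklist. The sketch below is only an illustrative Python summary of the steps described in this talk, not an automation script: each tool is driven by hand through its own interface, and the step labels are my own.

```python
# Illustrative summary of the talk's image-to-video workflow.
# Nothing here calls a real API; it is only a reference checklist.

WORKFLOW = [
    ("ideate",   "LLM (Claude / Gemini / GPT)", "turn the idea into image prompts"),
    ("generate", "Midjourney",                  "create stills; lock the look with a style reference"),
    ("clean",    "Photoshop or inpainting",     "remove hallucinations and unwanted elements"),
    ("upscale",  "Magnific",                    "add definition and detail to each frame"),
    ("animate",  "Veo 3 (via Flora)",           "image-to-video, roughly 8 seconds per shot"),
    ("edit",     "CapCut",                      "assemble the shots, add text and music"),
]

# Print the checklist in order.
for step, tool, goal in WORKFLOW:
    print(f"{step:>8}: {tool:<28} {goal}")
```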

Costs, Credits, and the Reality of Trial-and-Error

So, the cost of all that. I have a subscription for the LLM. There's a subscription for Midjourney; another important reason I use Midjourney directly, apart from what you've seen with Flora, is that Midjourney is really a sovereign model, meaning Midjourney doesn't want this kind of platform to use their model, so you have to use it separately. Then there's the editing software. I put it in parentheses because I'm using Photoshop, which is quite expensive, and you can use any editing software; there are tons of different options, including Canva-like web apps. And then there's Flora: it's roughly fifty dollars a month, for which you get fifty thousand credits. Why am I mentioning that? Because every generation costs credits. For example, generating text on Flora costs 30 credits, and generating a video costs 1,500 credits.

Why Iteration Drives Cost (Especially for Video)

So what happens, and this is really the important part of my presentation, is that with GenAI it's really trial and error, trial and error, trial and error. Sometimes you don't get what you want, even with a prompt you think is on point, so you may have to rerun the same generation with the same prompt. What happens then is that a regular eight-second scene ends up costing you something like 4,000 credits. It adds up pretty fast, and that's why I say the cost is way higher if your videos are over a minute and a half, or if you're making many videos every week or every month. In those cases the cost is not going to be the same, but at the moment that is the price I'm paying per month.
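To make the arithmetic concrete, here is a small back-of-the-envelope estimator using the rough figures quoted above (about $50 per month for 50,000 Flora credits, roughly 30 credits per text generation and 1,500 per eight-second video). The figures and the retry counts are illustrative assumptions from the talk, not official pricing:

```python
# Back-of-the-envelope credit estimator; all figures are the rough
# numbers quoted in the talk and may not match current Flora pricing.

DOLLARS_PER_MONTH = 50.0
CREDITS_PER_MONTH = 50_000
CREDITS_PER_VIDEO = 1_500   # one 8-second video generation
CREDITS_PER_TEXT = 30       # one text generation

def scene_cost(attempts: int) -> int:
    """Credits for one scene, counting rerolls (trial and error)."""
    return attempts * CREDITS_PER_VIDEO

def to_dollars(credits: int) -> float:
    """Approximate dollar value of a number of credits."""
    return credits * DOLLARS_PER_MONTH / CREDITS_PER_MONTH

# A one-minute clip needs about eight 8-second scenes; assume each
# scene takes 2 to 3 attempts before it is on point.
attempts_per_scene = [2, 3, 2, 3, 2, 3, 2, 3]
total = sum(scene_cost(a) for a in attempts_per_scene)
print(total, round(to_dollars(total), 2))   # 30000 30.0
```

Under these assumptions a single one-minute clip already consumes more than half of a monthly credit allowance, which is why longer or more frequent videos change the budget quickly.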

Alternative Platforms and Expanding Capabilities

To go beyond: Flora is really an amazing tool, but there are tons of alternatives to it, such as Xfield, OpenArt, Artlist, LTX, Krea, and ComfyUI. I did not test all of them, because what I want to do is create things, create my videos, so I'm not testing each and every one. I tested some of them, and some have really cool features compared to Flora, but the point is that there are really a lot of competitors in this field.

Example Workflow: Multi-Angle Shots Without Prompting (Xfield)

And for example, with Xfield, there's no prompt involved in the process. You just go to the website and upload one of your images. This is a different project, but here is an image I uploaded to Xfield, and you get nine shots of the same image from different angles, just with GenAI, with no prompt involved. That's just to show you the range of possibilities, the amount of things you can do with this kind of web app and these kinds of tools.

Conclusion: Give Creators Time to Learn, Then Ship Ideas

So my advice on this matter, and my conclusion, is that there's really a lot of trial and error. For example, if you have people in your company, or you work with people who are doing ads, creative work, or videos with GenAI, you should give them some time to actually master the tools. It's not even about mastery; it's just getting your hands on this kind of thing, and that takes some time, so you have to have time at your disposal. And my real conclusion is that at this stage of generative AI, you cannot do absolutely anything you want, but when it comes to creative projects, videos, websites, anything, a lot is within reach. I'm not used to making videos, but now I can really realize my own ideas, and if you have these kinds of creative ideas in your mind, you can actually make them now. So thank you for your attention, and now I'm open for questions.

Q&A and Discussion

I have one. First of all, amazing stuff, great videos, I really love them. Just for my understanding: why do you use Photoshop? Theoretically you could do this editing in Flora as well. Are you using it because you want to spare some credits, or is there another reason?

Why Use Photoshop When Flora/Midjourney Can Edit?

I didn't say it, but basically it's just that Photoshop is more precise. There's more precision to it, and that's why I'm using Photoshop. For small modifications, the Nano Banana Pro inpainting on Flora is quite good; it's just that I want precision on some of my work. But as I said, there are tons of different editing software options for this kind of task.

Could you do the editing in Midjourney as well?

Like I said, Midjourney is really good for generating things. When it comes to editing, it's not bad, but it's not as precise as Photoshop, and I'm used to Photoshop: I've been using it for four or five years now, including the generative fill, where you can modify anything you want. So I'm used to it, but you can do things in Midjourney. For videos, though, Midjourney is not that good; it's really not a good model.

Production Pace and Budget: How Many Videos Can You Make?

So how many videos are you making with this budget of 70 to 90 dollars?

Well, this project is more recreational, so I don't have a set pace for my content production. At the moment I've made three or four of them, so it's kind of new for me, but I have plenty of ideas and many things I want to do, even longer projects. I even have fun projects planned around video games, which I really enjoy, and those will take more time. So at the moment I've made three to four videos. I've been using GenAI for two to three years now, but more for static things: images, carousels on Instagram, things like that. Video is quite new for me, in some sense.

Professional Background and Real-World Client Use Cases

So your professional background is in this field?

Yes.

Originally I'm a communications officer, specialized in design and content, writing content and things like that. And for approximately two to three years now I've been acting as a GenAI consultant in some companies. For example, there's a company called Groupe Seb in France. I've worked with them on very precise things. When they release products (pans, cooking robots, things like that), recipes come with them, blog articles with photo recipes. The photos were all old, made about ten years ago, and they didn't want to pay photographers or spend the time on that; they just wanted to keep the images and the photo recipes up to date. So my role was basically to cut the cost by introducing generative AI, and I used Midjourney for that, with what you've seen: style references and so on. That's one example of the things I do professionally. There are many more, but it's a good example; I like to give it because it's one of the best to give.

So you're really using it for your professional work?

How GenAI Is Changing Creative Roles (and Content Quality)

Yes. It's not that I come from a very different field; it's just that when you're a creative person, especially in creative jobs, until five to ten years ago and even before that, you were specialized in something. You were an images person, or a video editing person; roles were really specialized. Now it's blurrier: you can actually go back and forth between the different roles. If you're an editing person, for example if you work in cinema, in movies, doing video editing, now you can even make movies by yourself, basically. There are really unbelievable examples of people making movies now, 10-to-15-minute episodes, entirely with generative AI. Is it good or is it bad? That's another subject, because there's a lot more content now and a lot of it is kind of slop; there's no soul to it. But, like I said in my conclusion, more and more things are becoming possible.

I'm really curious: following on from what Reggie was saying, is it possible to share what your take was when you first dived into it?

What, my background?

No, no, the thing you were asked about last time. Your opinion.

Ah, that's really a different matter. I couldn't do an entire presentation on that; it would be too long. I don't want to spoil it, but after I'm done with my presentation, Reginald is going to ask you questions about a topic in AI, and it's going to be a kind of debate; I just gave an introduction. I went down a kind of tunnel and was really deep into it, because I'm interested in AI as a whole and its implications. So that's how it happened, basically, without diving into too much detail.

So, can I bring us back? Come back into it.

I guess I could ask you one question. What surprised me the most, based on the conversation we had, was discovering what you were doing, because you're doing this in your spare time as well. You were talking about the evolution of the job for creatives.

You know, it's really interesting, because I'm asking myself that question almost every day. Almost every day there's a new release, this Claude co-worker thing or whatever comes out, and I'm like: OK, well, maybe I'm going to become a cook, because I like cooking, things like that. But at the same time, and we had a conversation about it, you think things are moving fast, and then you talk to regular people and realize they don't even know what GPT is. OK, fine. So there's the human world, there's social media, there are trends, there are professionals as well. But one thing's for sure, Reggie, one thing is for sure: we can see that teams are being reduced, and there are fewer juniors than before. I'm still kind of a junior myself, so I was concerned; I was deep in that problem. So it's really a complicated matter, and I have different ways of seeing things, but it's an almost everyday questioning.

One more question.

Time-to-Produce and Pricing vs Traditional Production

The first video that you showed: how much time did it take you, from the first idea and the first prompt in Claude to having the finished video? Because here it looks so seamless and so easy, but how much time did you actually spend?

I didn't dive into that because it's quite difficult to answer, but here's an example. Say you're a marketing campaign specialist and you want to make a video for Facebook or Instagram ads. You have a very precise brief, you know what you want to generate, what the picture will look like, the colors, everything, and you're already familiar with these tools. In that case, I would say a short video of one minute can be made in a day of work, maybe a day and a half; it depends. If your brief is really clear and the video is not that long (three to five minutes is not the same thing), one minute, maybe a day. But your brief has to be super clear, and sometimes you go "oh well, I don't like this idea, I'm changing direction"; when you're doing creative work, that happens a lot.

Yeah, so imagine you made this video with AI and you would sell it for a price: if you had done it the old way, how much would you have sold it for?

You would need something like four different jobs to do that kind of thing: an animator, someone to draw, someone to do the sketches, the storyboard, all that. So it would be way more expensive than it is right now. I don't even have the pricing, but it would be way, way higher. Definitely.

In the video world you sometimes price in seconds, you know, 90 seconds and so on, and depending on whether you add 3D elements, there was a time they used to tell me it could be 300 francs a second, because of the number of people involved. And what's funny is that when you said it takes eight hours to do this: for example, it would take me eight hours just to figure out how to open Photoshop, put something in it, figure out the tools, and do that one thing you did in one second.

That's what I said: that's for people who are used to the tools.

Yeah, exactly.

It's for people who are used to the tools, because there's a lot of trying. And like I said previously, Flora works on credits. Eight seconds is 1,500 credits, but the first generation, sometimes it works, but most of the time it's not on point and I need a second one. So that's actually 3,000 credits. It's way cheaper than 10 to 15 years ago, but it's not free, and generation takes time. As you've seen, it's almost 5 to 10 minutes to generate something, and if you want to generate an entire movie this way, that's another matter.

But that's what I said previously: there's more content now than ever, but good content is another matter, because there's a lot of AI slop. You're scrolling, you dive into a video, and it's just a dog fighting a cat. What's the purpose of it? It's funny, but what's the intent, what's the goal? What I'm doing is music videos, with music I enjoy and that touches me. I even have things planned: I really, really love metal, and I'm planning video clips for metal music, because I enjoy it and it touches me. But some content, you dive into it and you're like: OK, so it's just a cat fighting a dog, and that's all. It's funny, but that's all. So there's never been this much content, but good content is another matter.

There's also another challenge, related to what he said about being specific about something he wanted to edit in one of his images. Depending on who you work for, if you work in an agency environment, the bigger the brand, the more demanding the client. Even if they have an understanding that "oh, we think it should be cheaper because you're using AI", the person in the communications department, or in brand management, or brand strategy, or wherever (because there are levels, right?) is going to look at a video and have expectations. Ask anyone who has ever tried to generate an image in ChatGPT and then tried to get it to edit one thing, one thing on that image. The last time I was so frustrated by something, it was Spotify Free. That was the one time I looked at something and thought: why did you change the whole concept? I said take out the S. And I tried for ages.

For example, I was making these posts for LinkedIn to find speakers, and it was great. I was just like: just put the logo in. And I put the logo in the attachment, you know.

That level of frustration is what people don't understand they're paying for, first of all, with professionals. And even when they understand it, the person who's paying is rarely the one actually working with the professional. So I think the hardest thing, and that's the reason I wanted your talk, is that for me the hardest topic about AI is how we value things. Having been in services for 20 years, selling services as a marketing manager for years, most non-designers and non-developers who don't deal with service providers don't value the service. In the sense that what they value is not doing it themselves; they value paying for an output. But I'll be honest: for me there's very little difference from going to McDonald's, where you're at the screen, you choose something, you pay for it, and you wait at the counter. It's sometimes the same thing, because the expectation is that it should be easy now. I think that's a real debate, and culturally, even in Geneva, it's an even bigger debate: it's very hard to justify your expertise and get someone to pay for it if you can't say "oh, but I worked for Nike, I worked for UBS". That's also why I wanted not just your opinion, but for you to show what you do, because artistically he's not monetizing this. Unless someone says "I want to buy it, I want the full version", he's doing it out of passion, in his own time, unless someone commissions him to do different content.

It's the best of both worlds, because at the same time it allows me to keep up to date with what's happening in the GenAI field when I make things like that. So it's not innocent, in a way; I'm doing it because it serves me. But at the end of the day, and at the beginning, I'm doing it because I have an idea. An idea comes to me and I want to release it, I want to launch it. That's the main reason.

Just one last question, I think, back there.

You wanted to ask something.

Prompting Best Practices: Anchorage, Dynamism, Cinematics

I was curious about why you are using Claude Opus. It seems to me that the context window is small, because there are...

OK, so I didn't dive into that, but basically there's a structure that works really well for video generation models. It's three words; I'm not sure how to say them in English, but it's anchorage, dynamism, cinematics. Anchorage is what's in the picture; dynamism is what's happening, the movement; and cinematics is the camera side: is there some blur, is it photorealistic, things like that. That's the best structure at the moment for GenAI video models. I didn't dive into it here, but I learned it on my own, on YouTube: OK, what's the best way to do image-to-video? I learned a lot of things by myself. There are no courses, just YouTube, and trying, trying, trying, trial and error, until I get something I want.
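The three-part structure can be sketched as a tiny helper that assembles a prompt string. The function name and the example text are hypothetical, just to show the anchorage / dynamism / cinematics order; the output is the kind of prompt you would paste into a video model such as Veo 3:

```python
# Minimal sketch of the anchorage / dynamism / cinematics prompt
# structure described above. Illustrative only; not any tool's real API.

def build_video_prompt(anchorage: str, dynamism: str, cinematics: str) -> str:
    """Join the three parts into one prompt, skipping any empty part."""
    parts = [anchorage.strip(), dynamism.strip(), cinematics.strip()]
    return " ".join(p for p in parts if p)

# Example for the statue scene: the dynamism line explicitly pins the
# statue down so the model does not animate it as a character.
prompt = build_video_prompt(
    anchorage="A weathered stone statue of a knight in a misty courtyard.",
    dynamism="The statue stays perfectly still; only fog drifts past.",
    cinematics="Slow dolly-in, shallow depth of field, photorealistic.",
)
print(prompt)
```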

Wrap-Up

Everyone, a big round of applause for Luca.
