Generated Video Redux

Earlier in this series, I didn’t have a great experience creating videos from a prompt alone. The model, shall we say, struggled to put the right number of people in the scene, along with a variety of other issues. I felt that either the model or I wasn’t quite ready for prime time. However, that might have changed.

I am currently writing a book about how to live with AI. The great thing about writing it is that I have had to immerse myself in AI, learning how to write prompts and getting a feel for the “Bigs”: Gemini, ChatGPT and Claude. Which brings me to the subject of this latest installment of “The Bumbling Prompt Engineer”.

Naturally, there’s a chapter about writing prompts, and I have had to dip my toes into video. This isn’t something I was eager to do. Bad experiences tend to linger, and I try to avoid repeating them whenever I can. No such luck this time: the chapter outline called for generating a video from a prompt, and from a prompt with a reference image.

Though there are many applications out there that will create a video, I decided to stick with Gemini, OpenAI’s current version of Sora and Adobe’s Firefly. Anthropic’s Claude didn’t make the hit parade because it doesn’t generate video.

I have to admit, I approached Sora with a great deal of hesitation, given the deluge of Sora-made AI slop out there. Apart from the deepfakes of Karl Marx out shopping and the current Talking Babies craze clogging my Facebook feed, a quick look at the Sora home page tells you all you need to know.

The Sora homepage, with its assortment of videos, is not for the faint of heart.

I am also not exactly a huge fan of Adobe’s Firefly. Adobe has been wrapping itself in the AI banner (All hail, Sensei) for years, but looking at Firefly and the considerable number of models on offer, I can’t help but wonder whether innovation is taking a back seat to licensing when it comes to AI at Adobe. Adobe has also been heavily promoting the fact that Firefly uses Google’s Nano Banana for images and Veo 3.1 for video. Hence the inclusion of Gemini, and the opportunity to see how Firefly and Gemini compare.

The first prompt was the usual brief version of what I wanted. The problem with brief versions is you rarely, if ever, get what you are expecting. To be honest, I was pretty surprised by the results. The prompt was:

Create a 10-second cinematic video of a meadow in the Swiss Alps.        

Both Firefly and Gemini delivered 8-second videos, simply because 8 seconds is the hard stop for a Veo 3.1 clip. What really surprised me was Gemini adding an unrequested audio track of wind blowing and birds chirping. Other issues bothered me, too: near the end of the video, the sun magically sets, and the scene shifts from afternoon to dusk.

Firefly was actually better than Gemini. The scene started with the camera flying through meadow flowers, then rising to reveal the meadow and the valley it leads down to. Best of all, no soundtrack.

Sora caught me by surprise. It doesn’t accept a simple prompt; you need to write a storyboard. The first storyboard requested a 5-second video of a Swiss meadow and mountains in morning light. The second requested a 5-second video of the camera gliding through wildflowers and rising to reveal the valley between the mountains. The result was the best of the lot. The cow Sora tossed in was a pleasant surprise; the audio track, which I hadn’t requested, was not.

Create a video from a photo.

One of the really neat things you can do with video generation is provide an image and then ask the model to turn it into a video. What I’ve discovered is that you can’t just toss an image into a model and expect something worthy of an Academy Award. Instead, you need to tap into your inner Hollywood screenwriter. In my case, I don’t have one, so I turned to Anthropic’s Claude for prompts I could use.

I clearly needed a photograph. For several years, I lectured at various universities across China, and during one of my visits to the Central Academy of Fine Arts in Beijing, I was asked if I would be interested in hiking to the Great Wall. The image below is one of my favourites from that experience. It shows a young man hiking through farmland in a valley, following a path that leads to the Great Wall.

The photo to be turned into a video.

If you are as new to this game as I am, you should know that generative models will help you write a prompt, improve your prompt or suggest one outright. In this case, I uploaded the image to Claude and asked:

Could you please supply me with three prompts that would create a video of the young man in the supplied image hiking toward the Great Wall.        

I decided to go with the second prompt, which Claude categorized as a Dynamic Tracking Shot. It was:

Smooth tracking shot following a hiker with a messenger bag as he walks up a rural mountain trail toward the Great Wall. Vibrant green foliage borders the path, mountains frame the scene on both sides, and the historic stone fortification stretches across the distant ridge. Natural summer lighting, steady camera movement, travel documentary aesthetic.        

For this project, I decided to skip Gemini and use Sora, Firefly and the openart.ai video generator.

Openart is a commercial product that provides access to a variety of video generation tools. The one I wanted to try was the Kling 2.5 generator, which my colleague, Tomasz Dylik, uses. This also raises an important point about generating video: it isn’t cheap. The process is “compute-heavy,” which is why you have to purchase credits.

I started with Sora, uploaded the image, and was told, “For safety, we don’t create videos from images that include people.” OK. So I fed in the prompt alone, just to see what I would get. The result wasn’t exactly what I was expecting, but it was close enough. What I really didn’t expect was Sora throwing in the following narration: “The trail runs higher. Every step revealing a little more of the Wall’s sweep across the mountains. Centuries of stone, laid to the rhythm of this terrain, now framed by summer green and the hush of wind in the leaves.”

With Sora out of the picture, the task was handed over to Adobe’s Firefly. To say I was impressed with the result would be an understatement. My only issue was that the video ends with the path cut off by a boulder, leaving me wondering where the young man goes from here.

Openart’s Kling generation was the most impressive of the three. The UI, shown below, was dead simple to use once I had purchased 4,000 credits for $14 (Canadian). Not only can you skip the audio track, but the UI also tells you how many credits remain in your account, which is handy to know.

The openart.ai UI, showing the tools, the prompt area, the final video and the remaining credits. It is quite intuitive and offers a number of models to choose from.

Why was Kling better than Firefly/Veo? It was the last frame of the video that caught my attention. The path continued up towards the Great Wall, and the camera followed the young man along it.

Conclusion:

The bottom line is that video generation keeps getting better every day. The models can create a video from a short prompt and deliver acceptable results, as long as you know what you are expecting. If the result isn’t what you expected, the models will help you get there by suggesting a better prompt.

The ability to turn still images into video is impressive, and as I discovered, the more detailed the prompt, the better the result. Along the way, I found you can actually ask a Generative AI model to suggest a prompt when your "Inner Hollywood" eludes you.

This also reinforces something I would tell my students about producing creative work. I used a variety of generative applications here, but when it comes to the finished piece, it doesn’t matter which one is better, which one I used or which prompts I fed it, because:

Nobody cares how you did it. They just care that you did it.



More articles by Tom Green

  • Training the AI Dragon

    As I progress through this series, I am concluding that learning to prompt is no different from how Hiccup in the movie…

  • Has LinkedIn Learning Abandoned Learning?

    In May 2024, LinkedIn Learning released my course, The User Experience of Motion (for Non Designers). To explain the…

  • The (quiet) demise of LinkedIn Learning

    I started my relationship with LinkedIn Learning back in 2007 through Lynda.com.

  • One AI Does Not Rule Them All

    I have been involved in this silly business since the emergence of desktop publishing in the 80s. I have seen the rise…

  • The Soul Of The New Machine Is Not Mine

    One of my more cherished possessions is a writing sample from when I was in Grade 5, where I used a fountain pen for a…

  • Figma Make and CROFTC For The Win

    In the previous installment of the Bumbling Prompter, I experimented with Figma’s AI tool—with mixed results—to craft…

  • Wrestling Figma's AI To The Ground

    As I discovered in the previous installment, moving the generated result from UX Pilot into Figma would cause a…

  • UX Pilot's Crash Landing

    As a LinkedIn Learning author, I have been doing a lot of work in the UX field. One thing I have discovered is that the…

  • Firefly's Facial Hair Obsession

    Over the years, I’ve done a lot of work with Adobe and made several friends who still work there. Recently, they’ve all…

  • Avatar Me!

    Many of my friends have these cartoons, which they call avatars, gracing their bios on LinkedIn, Facebook, and…
