The Many Modes of Multimodal AI
You’ve probably heard the expression, 'there are no bad ideas'.
It’s a way to encourage people to stretch beyond their comfort zones and share ridiculous, off-the-top-of-their-head thoughts in the hope that one of them could spark something original.
At least, that's how it's supposed to work in theory.
In practice? I’m not so sure.
I've heard lots and lots and lots of stinkers. Ideas that just miss the mark.
Case in point: Meta announced a new social feed called ‘Vibes’. It's an all-AI-generated video stream where you can watch, like, create, remix, comment on and share AI videos. And while the feed has an almost hypnotic quality when you first tune in, it soon becomes an endless barrage of AI slop. Do you really want to spend that much time on the uncanny valley channel? Not my vibe ...
Then there’s OpenAI’s newest product, Pulse, a better thought-out concept than Vibes. Pulse is a combination of customized news, to-do list, ideas notebook, product recommendations and daily organizer, all rolled up into a series of colorful cards. Your Pulse turns up every morning, personalized to your day. Right now, Pulse is ad-free. But it's not a big leap to see how it could turn into a paid channel, where brands ante up for contextual, sponsored posts. Maybe I'm old-fashioned, but I still want agency over my day, not an AI agent managing it, thank you very much.
These are just two of the newest AI tools built using multimodal AI.
What Is Multimodal AI?
Multimodal AI tools are able to process more than one type of data. In the early days of chatbots (i.e., circa 2023), generative AI could 'understand' either text or visuals, but not both at once. So you had ChatGPT (text) and DALL-E (images) as two separate apps, each working in its own way.
Today, many AI systems, like ChatGPT, Gemini, Claude and Grok are able to grasp the relationship between images and text. They can also make sense of video, spreadsheets, computer code, audio, design, music, languages ...
When I first learned about multimodal AI, I wondered if this might be a step on the path to artificial general intelligence (AGI).
Because combining words and visuals in novel ways is how we learn, understand, create and communicate. And AI would now be able to take in the world in much the same way that we do.
Would that lead to subjective experiences for machines, where they recognize objects or make inferences instinctively and not from their training data?
If machines can do that as well as or better than most people, where does that leave creatives like you and me?
Still at the Starting Gate
Right now, we're in the early stages of multimodal AI. And when you see what it can produce, it almost feels like magic.
Want a social media video? Try Sora, Invideo or RunwayML. Enter a prompt or paste your script, sprinkle in some visuals and you're on your way.
Got an urge to write a song? Explain your concept and style to Suno or Udio and in minutes, you have a fully orchestrated tune.
Need your voice translated into a language you don't speak? ElevenLabs can help you out.
What about a landing page or website? Just share your goal and brief with WordPress or Wix and watch it appear before your eyes.
Other tools help you assemble, personalize and target ads automatically.
Or read, summarize and spot patterns across the many tabs you have open in your browser. Dia and Comet, Perplexity's new browser, both let you do that right now.
A Creative Department of One?
With multimodal AI, you can take any idea in your head and easily bring it to life. The process is so quick, you might not allot any time to reflect on the substance of your idea or its viability.
Which means you're placing a higher value on speed than quality.
On the positive side, if you're a digital marketing or comms professional, your capabilities just grew exponentially. You have access to all the tools you need to perform tasks that once required teams of people to complete.
And because most marketing and comms outputs tend to be 'good enough', you may feel multimodal AI tools are all you need to get the job done.
Yet, the danger with this approach is falling deeper into the trough of marketing mediocrity.
And you have to ask yourself whether you have the will and self-awareness to avoid that.
Remember the days of desktop publishing? Sure, anyone could design a page. But most of those pages were barely competent, since the people using the tools had little or no graphic sense.
We had tools we could use to make an approximation of a great design. A somewhat reasonable facsimile. But the finished product lacked the nuance and sparkle of the real thing.
Standout content needs more than competence.
Multimodal AI Tools Can't Replace Vision and Talent
You need original ideas, artistry, craft and the ability to put it all together in a surprising and memorable way.
Subject matter expertise doesn't happen overnight. Aesthetics, experience and judgment play a big role, too.
Instead of thinking of multimodal AI tools as replacements, put them into the hands of video makers, writers, translators, and other artists and use them to inspire your already talented team.
And Watch for What's New
Multimodal tools keep improving at a rapid pace, so you'll want to pay close attention to what's new.
In this week's Digital Marketing Trends video, I look at some of the announcements from Google's recent developer conference.
I talk about enhancements to Google's text-to-video generator, Veo 3, that let you add sound, music and dialogue with your prompt. Then there's Flow, a suite of filmmaking tools that uses AI to enhance video editing and post-production. And Google's Asset Studio lets you upload brand visuals and prompt it to create new images, turn pictures into videos and optimize the creation of your ads.
Check it out and let me know what you think.
AI Masterclass
Before I go, I wanted to mention a new Masterclass on 'Harnessing Generative AI' that I'll be leading for the Institute for Public Relations.
It's a three-part series with other sessions from Samantha Stark and Paul Gennaro.
Interested? Here's more info and registration details.
Follow Me on LinkedIn
That's about all the modalities I have to talk about in issue #130.
Thank you to all of you who follow me and subscribe, read, comment and share this newsletter!
This newsletter comes out twice a month. But between issues, I share shorter daily posts with my take on digital marketing and the latest on generative AI. It's another way to stay on top of the trends.
Let me know if you have questions about any of the videos in Digital Marketing Trends or any of my other LinkedIn Learning courses.
You can also visit my website and send a message or a question.
And while you're at it, follow the Future of Marketing Institute, too.
How do you use multimodal AI tools? Which is more important to you: speed or the quality of the content you and AI produce? Please share your thoughts in the comments below.
See you next time!
Note: All the content in this post was written by a human: me, not Martin-bot.