Using AI in Learning Materials

Rus Slater FLPI

Published Feb 8, 2024

Observations from creating voice overs and closed captions for videos / podcasts

A. When you are using AI to generate voice overs from a text script.

AI seldom ‘knows’ how to pronounce brand names. For example, ‘Vonage’ (pronounced ‘vonidge’) often comes out as ‘von-aidge’ or even ‘voan-aidge’. Check these carefully!
AI won’t know how to pronounce acronyms - is it C-P-A-A-S or see-pass? Sometimes you may need to write see-pass, sometimes you may need to write ‘communication platform as a service’.
Many AI products cannot give the voiceover any intonation or emotion. You may need to try several different voices / accents to get one that uses a tone that matches your text.
Most AI products cannot recognize the value of pauses, either between phrases or at the end of sentences or paragraphs. You may need to cut up your voice over so that you can edit in pauses later.
Long sentences with lots of conditional clauses are hard for humans to read, listen to and understand. AI speaking them does NOT overcome this challenge. Before you try to create the voice over, use a readability checker to improve your script.
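The checks above can be partly automated before the script goes anywhere near a voice generator. The following is a minimal Python sketch, not any particular product's workflow. The glossary entries, the function names and the 25-word limit are all assumptions chosen for illustration, and the sentence-length check is only a crude stand-in for a proper readability checker.

```python
import re

# Hypothetical glossary: the written term -> how to spell it for the AI voice.
PRONUNCIATIONS = {
    "Vonage": "vonidge",
    "CPaaS": "see-pass",
}

def respell(script: str, glossary: dict[str, str] = PRONUNCIATIONS) -> str:
    """Replace whole-word glossary terms with their phonetic respelling."""
    for term, spoken in glossary.items():
        script = re.sub(rf"\b{re.escape(term)}\b", spoken, script)
    return script

def long_sentences(script: str, max_words: int = 25) -> list[str]:
    """Return sentences with more than max_words words (assumed threshold)."""
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    return [s for s in sentences if len(s.split()) > max_words]

def split_for_pauses(script: str) -> list[str]:
    """Split the script on blank lines so each paragraph can be voiced
    separately and pauses edited in between the resulting audio clips."""
    return [p.strip() for p in re.split(r"\n\s*\n", script) if p.strip()]
```

For example, respell("Vonage offers CPaaS.") returns "vonidge offers see-pass.", which is the text you would paste into the voice generator, while the original script stays as the master copy. You still need to listen to the result: a respelling that works for one voice may not work for another.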

B. When you are using AI to add closed captions / a transcript from a voiceover.

FACTOID: I just discovered that ‘subtitles’ and ‘closed captions’ are not the same! Closed captions are a transcript of the voiceover, in the same language as the audio. Subtitles are usually a translation of the voiceover into a different language from the one on the audio! Every day is a school day!

I’ve not worked with subtitles yet, but here are some observations about closed captions:

Many AI products won’t recognize lots of acronyms, especially where the sounds are similar, e.g. VBC becomes BBC, CPaaS becomes ‘see pass’.
If an acronym starts with an A or ends with an A, the AI often changes it to a lowercase ‘a’ and uses it as an indefinite article; so ‘VSA’ becomes ‘VS a’, and ‘ADT’ becomes ‘a DT’.
Some AI products seem. to put in random. full stops.
If the voice over isn’t like a 1950s BBC show, the AI will put in ‘wanna’ for ‘want to’, ‘a too low’ for ‘H2O’, and similar.
Phonetically similar sounds get mixed, so, for example, ‘fill’ becomes ‘pill’ or ‘bill’. ‘Live’ becomes ‘five’, ‘mine’ becomes ‘nine’ or 9.
Homophones and near-homophones can also get mixed. For example, ‘queue’ becomes ‘cue’, ‘due’ becomes ‘jew’ or ‘dew’.
AI cannot tell that ‘Small’, ‘Medium’ and ‘Large’ should be capitalized in certain circumstances, or that Voice, Verify and Video may be product names and should therefore have a capital V.
If the voiceover is full of ums, errs, hesitations, rep-repetitions, long, rambling sentences that don’t seem to…….well, you know, like…… go anywhere, then the AI will faithfully write them out. Many people are horrified when they see a genuine transcript of what they actually said in a live meeting or a webinar. Speaking off-the-cuff is very seldom fluent or fluid.
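Some of the slips listed above are predictable enough to fix with a find-and-replace list, while others can only be flagged for a human to check. Here is a minimal Python sketch of that split. The correction list and the suspect-word list are assumptions built from the examples in this article, so you would need to replace them with your own terms.

```python
import re

# Hypothetical corrections: mis-transcription pattern -> intended term.
# Patterns are case-sensitive on purpose, so 'VS a' does not match 'vs a'.
CORRECTIONS = [
    (r"\b[Ss]ee pass\b", "CPaaS"),
    (r"\bVS a\b", "VSA"),
    (r"\ba DT\b", "ADT"),
    (r"\ba too low\b", "H2O"),
]

# Words that are valid English but often stand in for something else.
# These cannot be fixed automatically; a person has to read the context.
SUSPECT_WORDS = {"bbc", "cue", "dew", "pill", "bill", "five", "nine"}

def correct_caption(text: str) -> str:
    """Apply each known correction to the caption text."""
    for pattern, fixed in CORRECTIONS:
        text = re.sub(pattern, fixed, text)
    return text

def flag_for_review(text: str) -> list[str]:
    """Return the suspect words present in the text, sorted alphabetically."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(words & SUSPECT_WORDS)
```

For example, correct_caption("Our see pass works with VS a and a DT.") returns "Our CPaaS works with VSA and ADT.", and flag_for_review("Join the cue at five") returns ['cue', 'five']. The flagged words are only prompts to look again; ‘five’ may well be correct.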

Wherever possible a good, readability-checked, scripted voiceover is the best way to go, whether read by a real person or an AI generator.

Lucia Loggia 2y

SO neat!

Terry Bird 2y

Using the AI engine to generate the first pass, then getting the VTT file to correct it before loading that back, is an effective timesaving approach. This makes the AI part 'augmented' rather than artificial. As with all AI considerations, the AI/human combo is the optimal application.
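The VTT round trip described in this comment can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a full WebVTT parser: it treats any line containing "-->" as a timing line, treats the lines that follow it (up to the next blank line) as caption text, and leaves everything else alone. The TERM_FIXES entries and both function names are hypothetical.

```python
# Hypothetical fixes to apply to caption text only.
TERM_FIXES = {
    "see pass": "CPaaS",
    "wanna": "want to",
}

def fix_terms(line: str) -> str:
    """Apply each plain-text replacement to one line of caption text."""
    for wrong, right in TERM_FIXES.items():
        line = line.replace(wrong, right)
    return line

def correct_vtt(vtt_text: str, fix=fix_terms) -> str:
    """Run `fix` over caption text lines, leaving the WEBVTT header,
    cue identifiers, timing lines and blank lines untouched."""
    out = []
    in_cue_text = False
    for line in vtt_text.splitlines():
        if "-->" in line:
            in_cue_text = True       # caption text starts on the next line
        elif not line.strip():
            in_cue_text = False      # a blank line ends the cue
        elif in_cue_text:
            line = fix(line)
        out.append(line)
    trailing = "\n" if vtt_text.endswith("\n") else ""
    return "\n".join(out) + trailing
```

You would read the exported VTT file into a string, pass it through correct_vtt, check the result by eye, and then load the corrected file back. Keeping the timing lines untouched matters, because the captions must stay in sync with the audio.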
