Using AI in Learning Materials
Photo credit: ORION_production on Envato Elements

Observations from creating voice overs and closed captions for videos / podcasts

A. When you are using AI to generate voice overs from a text script.

  1. AI seldom ‘knows’ how to pronounce brand names, e.g. ‘Vonage’, pronounced ‘vonidge’, often comes out as ‘von-aidge’ or even ‘voan-aidge’. Check these carefully!
  2. AI won’t know how to pronounce acronyms: is it C-P-A-A-S or see-pass? Sometimes you may need to write ‘see-pass’, sometimes you may need to write ‘communications platform as a service’.
  3. Many AI products cannot give the voiceover any intonation or emotion. You may need to try several different voices / accents to get one that uses a tone that matches your text.
  4. Most AI products cannot recognize the value of pauses, either between phrases or at the end of sentences or paragraphs. You may need to cut up your voice over so that you can edit in pauses later.
  5. Long sentences with lots of conditional clauses are hard for humans to read, listen to and understand. Having an AI speak them does NOT overcome this challenge. Before you try to create the voice over, use a readability checker to improve your script.
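
Points 1 and 2 can be partly automated by respelling awkward terms before the script goes to the voice generator. The following is only a minimal sketch: the `RESPELLINGS` table and the `respell_for_tts` function are hypothetical names made up for illustration, and the respellings themselves are just the ones mentioned above. Your own AI product may need different spellings, so listen to the result every time.

```python
import re

# Hypothetical table: written form -> spelling the voice engine
# is more likely to pronounce correctly. Extend it for your own script.
RESPELLINGS = {
    "Vonage": "vonidge",
    "CPaaS": "see-pass",
}

def respell_for_tts(script: str, respellings: dict[str, str] = RESPELLINGS) -> str:
    """Replace whole-word occurrences of each term with its respelling."""
    for term, spoken in respellings.items():
        # \b stops a term from matching inside a longer word.
        script = re.sub(rf"\b{re.escape(term)}\b", spoken, script)
    return script

print(respell_for_tts("Vonage offers a CPaaS product."))
```

Keep the original script as the master copy and feed only the respelled version to the voice generator, so the odd spellings never end up in your captions or handouts.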

B. When you are using AI to add closed captions / a transcript from a voiceover. 

FACTOID: I just discovered that ‘subtitles’ and ‘closed captions’ are not the same! Closed captions are a transcript of the voiceover. Subtitles are a translation of the voiceover into a different language from the one in the audio! Every day is a school day!

I’ve not worked with subtitles yet, but here are some observations about closed captions:

  1. Many AI products won’t recognize lots of acronyms, especially where the sounds are similar, e.g. VBC becomes BBC, and CPaaS becomes ‘see pass’.
  2. If an acronym starts or ends with an A, the AI often changes it to a lowercase ‘a’ and treats it as an indefinite article, so ‘VSA’ becomes ‘VS a’ and ‘ADT’ becomes ‘a DT’.
  3. Some AI products seem.     to put in random.      full stops.
  4. If the voice over isn’t spoken like a 1950s BBC show, the AI will put in ‘wanna’ for ‘want to’, ‘a too low’ for ‘H2O’, and similar.
  5. Similar-sounding consonants get mixed up, so, for example, ‘fill’ becomes ‘pill’ or ‘bill’, ‘live’ becomes ‘five’, and ‘mine’ becomes ‘nine’ or ‘9’.
  6. Homophones can also get mixed up. For example, ‘queue’ becomes ‘cue’, and ‘due’ becomes ‘jew’ or ‘dew’.
  7. AI cannot tell that ‘Small’, ‘Medium’ and ‘Large’ should be capitalized in certain circumstances, or that Voice, Verify and Video may be product names that should therefore have a capital V.
  8. If the voiceover is full of ums, errs, hesitations, rep-repetitions, long, rambling sentences that don’t seem to…….well, you know, like…… go anywhere, then the AI will faithfully write them out. Many people are horrified when they see a genuine transcript of what they actually said in a live meeting or a webinar. Speaking off-the-cuff is very seldom fluent or fluid.
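
Several of the slips above are predictable enough to catch with a find-and-replace list. Here is a rough sketch, assuming you keep your own list of known mis-transcriptions. `CORRECTIONS` and `fix_captions` are hypothetical names, and the entries are simply the examples from this article. The matching is deliberately case-sensitive, because a pattern like ‘VS a’ would otherwise also hit an innocent ‘vs a’.

```python
import re

# Hypothetical list of (mis-transcription pattern, intended text).
# Only add entries that are safe for YOUR script: 'BBC' -> 'VBC', for
# example, would be wrong if the speaker really does mention the BBC.
CORRECTIONS = [
    (r"\bVS a\b", "VSA"),
    (r"\ba DT\b", "ADT"),
    (r"\bsee pass\b", "CPaaS"),
    (r"\bwanna\b", "want to"),
]

def fix_captions(text: str) -> str:
    """Apply each known correction in turn to the caption text."""
    for pattern, replacement in CORRECTIONS:
        text = re.sub(pattern, replacement, text)
    return text

print(fix_captions("Ask your VS a about a DT and see pass, if you wanna."))
```

This will not catch one-off errors such as ‘five’ for ‘live’, so it is a first sweep before a human read-through, not a replacement for one.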

Wherever possible, a good, readability-checked, scripted voiceover is the best way to go, whether read by a real person or an AI generator.
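
If you don’t have a readability checker to hand, even a crude sentence-length check will flag the worst offenders. The sketch below is an illustration, not a proper readability score: the `long_sentences` function is a hypothetical helper, the 25-word limit is an arbitrary assumption, and the sentence splitting is naive (it will trip over abbreviations such as ‘e.g.’).

```python
import re

def long_sentences(script: str, max_words: int = 25) -> list[tuple[int, str]]:
    """Return (word_count, sentence) for every sentence over max_words."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    flagged = []
    for sentence in sentences:
        count = len(sentence.split())
        if count > max_words:
            flagged.append((count, sentence))
    return flagged

script = "Short one. " + " ".join(["word"] * 30) + "."
for count, sentence in long_sentences(script):
    print(f"{count} words: {sentence[:40]}...")
```

Anything it flags is a candidate for splitting into two or three shorter sentences before you record or generate the voice over.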

Using the AI engine to generate the first pass, then downloading the VTT file, correcting it and loading it back, is an effective time-saving approach. This makes the AI part ‘augmented’ rather than artificial. As with all AI considerations, the AI/human combination is the optimal application.
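
If you correct the VTT file with a script rather than by hand, the main risk is accidentally changing a timestamp. The sketch below shows one way to avoid that, assuming a simple VTT file in which each cue has a timing line containing ‘-->’ followed by its text, with blank lines between cues. `correct_vtt` is a hypothetical helper, and `fix` is whatever correction function you choose to pass in.

```python
from typing import Callable

def correct_vtt(vtt_text: str, fix: Callable[[str], str]) -> str:
    """Apply `fix` to caption text only; header, cue ids and timings are untouched."""
    out = []
    in_payload = False
    for line in vtt_text.splitlines():
        if not line.strip():
            in_payload = False   # a blank line ends the current cue
            out.append(line)
        elif "-->" in line:
            in_payload = True    # timing line: caption text follows
            out.append(line)
        elif in_payload:
            out.append(fix(line))
        else:
            out.append(line)     # header, NOTE block or cue identifier
    return "\n".join(out) + "\n"

sample = "WEBVTT\n\n1\n00:00:01.000 --> 00:00:04.000\nWelcome to the BBC demo\n"
print(correct_vtt(sample, lambda text: text.replace("BBC", "VBC")))
```

Whatever the script changes, a human should still read the corrected file against the audio before it is loaded back.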

More articles by Rus Slater FLPI
