Solo Podcasting Workflow: From Voice Dictation to Final Script

In this episode, I share the system I use at StereoForest to bridge the gap between writing and speaking for your solo podcast. I’ll also explain the concept of “modality mismatch” and why it matters when you script your episodes.

What you’ll learn in this episode:

  1. Why traditional writing styles create a disconnect with podcast listeners
  2. The science behind “modality mismatch” and how the brain processes spoken text
  3. How vocal dynamics and variety directly influence perceived authority
  4. The three-step system to write scripts that sound natural and human

I cover a workflow you can use immediately to build your solo scripts: dictation, signposting (discourse markers), and performer formatting, plus why each one helps your recordings.

Resources mentioned:

Vocal dynamics:

https://www.frontiersin.org/journals/communication/articles/10.3389/fcomm.2020.611555/full

https://pmc.ncbi.nlm.nih.gov/articles/PMC4765198/

https://www.gsb.stanford.edu/insights/big-data-approach-public-speaking

https://pmc.ncbi.nlm.nih.gov/articles/PMC6662577/

Voice-to-text (no affiliation with any of them – I use MacWhisper, Things, and Notion):

MacWhisper: https://goodsnooze.gumroad.com/l/macwhisper

Whisper Notes: https://whispernotes.app/

Wispr Flow: https://wisprflow.ai/

Google Keep: https://keep.google.com/

Notion: https://www.notion.com/

Things (my fave to-do app): https://culturedcode.com/things/

Otter AI: https://otter.ai/

Granola AI: https://www.granola.ai/

StereoForest newsletter: https://stereoforest.com/subscribe

Chapters:

00:00 The problem with sounding like a bot

01:53 The science of monotonous delivery

03:55 Step 1: Dictate your notes

05:12 Understanding modality mismatch

08:38 Step 2: Add signposts for the listener

11:24 Step 3: Format the script for performance

13:03 Visual example of script formatting

15:02 Summary and next steps

==========================

About and Support

==========================

Written, edited, and hosted by Jen deHaan.

Find this show on YouTube at https://youtube.com/@jdehaan

Website at https://stereoforest.com/lab

Get StereoForest’s newsletter for podcasting resources at https://stereoforest.com/newsletter

Produced by StereoForest https://stereoforest.com

Contact Jen at https://jendehaan.com

==========================

Support

Your support will help this show continue. Funds will go towards hosting and music licensing for this show and others on StereoForest. This show is produced by an independent HUMAN artist directly affected by the state of the industry. StereoForest does not have any funding or additional support.

Support the Show

  1. Like this episode or show and want more? Support us with a one-time tip: https://StereoForest.com/tips
  2. We love our podcast host Captivate.fm! Contact me to ask me anything, anytime. You can support the shows by signing up with Captivate here: https://www.captivate.fm/signup?ref=yzjiytz
  3. We have our newsletters on Kit.com. We also have our tip form with them, and sell products on their platform. Easy, and they don't take a cut! Check Kit out and support the show using this: https://partners.kit.com/ijdkivtf8ndd
  4. Transcriptions by MacWhisper. I use and love the Pro version (subscription-free!). You can get it too using this link: https://gumroad.com/a/20303251/ivpqk
  5. Schedule posts? We use Metricool (reasonable for multiple accounts/brands/shows). Support us using our link: https://f.mtr.cool/VZBOZR
  6. Support the show and get creative templates and assets: https://share.uppbeat.io/p4od8inwhc2j


==========================

About Jen

Host: Jen deHaan is the founder of StereoForest. With over 20 years of experience in tech, education, and instructional design, and 10 years in improv and performance, Jen brings a systems-based, scientific approach to media production.

Jen's website: https://jendehaan.com

This podcast is a StereoForest production. Made and produced in British Columbia, Canada.

Transcript


So you've written a script for your podcast, and you wrote it the way you always write. You record it, listen back, and you just don't recognize what you hear. You're hearing some kind of robotic-sounding playback, like someone reading a textbook.

When that happens, an episode can fall flat, because it sounds like an essay being read aloud, which is probably exactly what you're doing. That can break the connection, or even the trust, that you've established or are beginning to establish with your audience. Or it might cause your audience to move on to the next recommended video on YouTube or the next podcast episode in their queue. The way you use your voice is really important when you record your podcast episodes, but this has a lot to do with the podcast scripting process as well.

Welcome to the Podcast Performance Lab. I'm your host, Jen deHaan, and in this show, we take the most effective tools from unscripted, improvised performance and behavioral psychology and apply them directly to your video and audio content.

Now, when you're trained to write, you're probably trained to sound like an expert, or at least that's what your industry generally does. But you probably haven't been trained to write in the specific way that human brains best listen to and comprehend what you're saying.

And when people listen to a podcast episode, they're really looking for the natural way people talk in real life. So first, we're going to look at the science of why this happens, because understanding why this cognitive mismatch occurs helps you figure out the best fix for you, and why any of this will help at all. A listener's brain is actively assessing your state and your intentions through the vocal cues you make.

Research has shown that humans perceive a monotonous, flat voice, one that lacks variety, musicality, tone, and expression (the ups and downs, the volume changes, the cadence changes), as a deficiency in communication. Most people don't hear a monotonous voice as a stylistic choice, unless maybe you're acting or playing a character. That's because those dynamics, the ups and downs, help people learn and help them understand you, among many other things.

I'll link to some of that research about monotonous voices and how they're perceived in the show notes and the description. The robotic, artificial quality of reading a script, especially if you're new to reading scripts, can cause your listener to disengage, because it goes against the brain's expectation of the natural vocal modulation we have when we're just talking to each other, right? So it ends up being an uncanny valley for them, just in audio form this time.

Vocal cues are also really important for the perception of authority or credibility on a subject, because all those vocal modulations, the changes in pitch, tone, pace, and volume, convey social confidence and authority on the topic. There's also research on speech rate, meaning how fast you speak, and slightly faster speech rates tend to convey greater knowledge and expertise.

So it's much easier to support your vocal dynamics in the scripting process, before you even start talking. The first step is to dictate your notes.

A lot of people don't write the way they talk. This is completely normal. Not everyone, of course: there are exceptions, because all of our brains are wired differently, or you might have worked on this specifically in the past and have all those reps, all that practice. But if this concept is new to you, and you're used to writing essays and that's kind of what you're doing for your show, you can switch from typing your scripts to dictating them instead, speaking them out loud.

Now, why would you even want to do this? Educators and business leaders, the experts out there, are highly trained in writing. It's what you've probably done for decades. Literacy is important, and we practice it as a skill that we build up. You've probably spent a long time learning to write complex sentence structures, long sentences, long words. But those are things we write, and they aren't necessarily how we talk day to day. For example, I'm not gonna write “gonna” in a written article, but I am going to use it in my podcast, like I did just there. And I use it quite a bit, according to my transcriptions.

So when formal, written language is spoken aloud, whether it's an essay you wrote, your newsletter, or an article on the internet, and you read it for your podcast, it creates what researchers call a modality mismatch. Modality mismatch means that the listener's brain, your audience's brain, is wired for the rules of an oral system, so it perceives this spoken, written-style language as unnatural.

It's overly complex, so it comes across as robotic or artificial. That reaction is what triggers a negative perception of your podcast episode; that's what these researchers are talking about.

So the fix is to approach that first step of your episodes with a voice-to-text tool, any of them. You can just dictate your core ideas, whatever you want to put into that episode.

And there are a ton of options out there, if you don't use one already. One that I've used is called Whisper Notes. There's Wispr Flow. There's MacWhisper, which is my favorite and the one I use the most. You can get all of those tools for iOS and macOS. Some have offline-only modes and a lot of privacy features as well, if you're worried about that, and you probably should be, with how all these things are going.

If you're new to these apps, a few more you might want to check out are Google Keep and Notion, which are both free options for plain old voice note-taking. There are also some built specifically for voice transcription, like Willow Voice, Otter AI, and Granola AI, so accuracy and efficiency are their priority. But the free plans on all of those are fairly robust, so they'll probably be enough for your podcasting.

But most often, I'll just open my to-do app, which is called Things, and dictate my ideas; I just start talking. Those are the notes I use to start structuring my episodes, the very first round of a new script. So check out what you might already have on your phone or your computer; it very well might be sufficient for what you need.

If you start your episodes this way, you're much more likely to sound like yourself from the start. You sound like a human with this kind of system, as long as you don't over-edit yourself later on. Now you have a file of spoken notes in your natural speaking voice, and that's what you're starting with. From there, you can start forming your script and lightly editing it. Those are the next steps.

The second step of the system is to make what I call a signpost draft. You have this draft of notes in your natural voice, and now you're going to refine those notes for your listener, so it's easier for them to understand where they are in your episode. You add that information to your draft as signposts.

In academia, signposts are sometimes called discourse markers. A study on discourse markers found that students comprehended lectures better when discourse markers were included in those lectures. You can apply that finding to your podcast episodes to help your listener out.

That's because signposts signal the structure of the information. They tell the listener what's relevant to them, and that helps the listener remember what you said. So, for example, they can go tell a friend what you said, or even tell a friend to go listen to it. That was a signpost right there, if you heard it. But seriously, signposts helped the students in those lectures remember the material better.

This is partly explained by what's called cognitive load theory, which also comes up in the research. It just means that human working memory is pretty limited and has bottlenecks. For instance, human brains can only hold about five to seven pieces of information for a short period of time. So if listeners are tasked with an hour-long episode that's really dense with information, their comprehension and retention can fail, because you're overloading their working memory. If you give them a wall of “ear text,” you're going to overload your listener, and that episode is going to fail.

Signposts help with that. This part of the system, adding signposts, is how you deal with it: you tell your audience what to do, which helps their active working memory.

So in this part of the process, in your draft, you're adding new ideas, emphasizing the science, and adding scientific support to your notes. You're also going to tell your listener what's coming next in the transitions of your podcast structure. Between each section, you might add summaries and tell your audience what's coming up and what they need to know next. You could also tell them why this information is relevant to them, or address what they might be thinking right now. You add these sorts of things to your transitions. One example is, “I'll get to this other thing in a second, but first you need to know this thing for it to make sense.” You're basically getting inside your listener's head, but hopefully in a good way.

The third step of this process is to format your script for performance. You have a script in your human voice, and you've added signposts so your listener can understand the information better. But before you start recording, you're going to format it for you, the performer: for reading and performing into a microphone or a camera.

Remember that wall of text, which I called ear text? If you're reading a script like that, you're already using all of your cognitive resources in the moment, because you're reading and decoding the text while you're performing it.

You might be thinking about your voice, where your mouth is in relation to the microphone, and your vocal dynamics. Or if you're recording on camera, you're trying to figure out whether you're still in the frame or whether you're hot on the mic. On top of all of that, you're trying to read your script or your notes, and the way you format them can offload a lot of that cognitive burden. So you add notes in square brackets about your performance or what you need to remember, like “add a little story about this thing right here” if you're going off script. Or you add bold or italics to words you want to emphasize or say slowly. All of this becomes delivery cues for you in the moment, and those cues reduce your cognitive load while you perform. That frees up your brain to focus on things like, I don't know, sounding human.

Use spacing, too, so you can scan and process the text clearly. Break long sentences onto new lines for pauses, or add a note for the pause in square brackets. Use bold and italics for your anchor words: the important words that remind you to add emphasis or to tell those stories. You can get reps to help with your voice, the musicality, the cadence, and all that other good stuff, but the way you format your scripts for a solo show will make a huge difference in your recording. It gives you a lot of those vocal dynamics, all that good stuff, for free.

So with this process, you use dictation into a voice-to-text tool like MacWhisper to match the way you actually speak as a human, and that's going to connect with your listener a lot better. Then you use signposts to guide your audience and reduce their cognitive load while they listen to your episode. And finally, you format your script to reduce your own cognitive load, so you can perform it a lot more naturally, a lot more human.

I hope this helps you with your podcast, and I'll be back soon with the next episode. Bye for now.

You have been listening to the Podcast Performance Lab. This show is created, written, hosted, edited, and produced by Jen deHaan. You can also find the video version of this episode on YouTube and contact information on StereoForest. Find the links for both of those things in the show notes. Thanks for listening.
