+++ UPDATE +++

James Spencer VoiceCoach, who knows more than a thing or two about public speaking, writes to me about the piece below:

This is one of the best critiques of quantifying communication that I’ve come across in the vast flotsam of internet public speaking commentary. You also address, specifically, a piece of physical advice parading as deep insight, which is just obvious. Often physical advice is obtuse, or vague, I find.

The pretense of Zandan is that QA is inventing a better wheel, or a newfangled conveyance that’s better than the wheel, and it is in fact, just a Segway. I look at the QA emails, touting their product. One of their insights is: the pillars of persuasion are logos, pathos and intuition. Intuition? Really, how does one speak to an involuntary faculty? Why is it an improvement over ethos? No explanation. And none needed really.

I approach coaching from the perspective of an actor, so it’s nice to see commentary from a non-performer who is as interested in both the physical (social) part of rhetoric as well as the discourse side.

* * * * *

One of the best things about working with Orai is that you have to think HARD about what’s going on in public speaking and in normal speaking. What’s there? Can it be measured? How to assess the measurements? What conclusions to draw therefrom?

One easy thing to measure is the speed of the words emerging from your mouth.

One of the disconcerting things about Apple’s dictation mode on an iPhone is that the faster you speak, the more accurate it gets.

If I talkasfastasIpossiblycanforsentenceaftersentencestraightintotheapp it produces it neatly laid out in words anyone can read. That’s because (I think) it gets to guess words in good part from context, and as they flow fast the context is more obvious. Or maybe it’s an algorithm. Whatever THAT is.

Anyway, the technology is astonishing. Voice recognition has come on leaps and bounds since those early days when you had to read parts of Alice in Wonderland into your PC to train it to measure your voice.

Computers are now really good at identifying what ‘words’ are from the sounds we emit. Once ‘words’ are identified, they can be counted and ‘weighed’. But this has to be done in a disciplined way.
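To make 'counted and weighed' a little more concrete, here is a minimal sketch in Python of what a disciplined count might look like once speech has already been turned into text. It is an illustration only, not how Orai or anyone else actually does it: the function names and the little list of filler words are my own assumptions.

```python
import re
from collections import Counter

# Hypothetical list of filler words; a real tool would need a far richer one.
FILLERS = {"um", "uh", "er", "erm"}


def tokenize(transcript):
    """Split a transcript into lowercase words, keeping apostrophes inside words."""
    return re.findall(r"[a-z'’]+", transcript.lower())


def summarize(transcript):
    """Count the total words and the filler words in a transcript."""
    words = tokenize(transcript)
    counts = Counter(words)
    fillers = sum(counts[f] for f in FILLERS)
    return {
        "total_words": len(words),
        "fillers": fillers,
        # Share of all words that are fillers (0.0 for an empty transcript).
        "filler_rate": fillers / len(words) if words else 0.0,
    }


if __name__ == "__main__":
    print(summarize("Um, so I think, uh, that we can do this"))
```

Counting is the easy part. Deciding what the numbers mean is where the discipline comes in.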


Zandan’s team of data scientists analyzed more than 100,000 presentations from corporate executives, politicians, and keynote speakers. They examined behaviors ranging from word choices and vocal cues to facial expressions and gesture frequency. They then used this data to rate and rank important communication variables such as persuasiveness, confidence, warmth, and clarity.

Hmm. How did they in fact analyse so many presentations and speeches?

On they go:

Disfluencies — all those “ums” and “uhs” — might be the most difficult vocal element to address … But as you move from one point to another, disfluencies stand out because your audience is no longer focused on what you are saying. In essence, you are violating your audience’s expectation of a silent pause by filling it.

To address these between-thought disfluencies, be sure to end your sentences, and especially your major points, on an exhalation.

Wait. All speaking is an ‘exhalation’! Try talking while breathing in. See how that works.

Anyway, the Internet assures us that the typical TED talk runs at some 160 words per minute. Maybe even faster in some cases.

No. No. This pace is simply too fast.

TED talks may or may not be ‘cool’ but they are NOT a standard for any normal public speaking.

TED talks have devolved into people putting on an over-rehearsed show. It’s often impressive and even engaging. But in the nature of the format there’s literally no conversation with the audience. Race through it whooosh next one please.

The best passage from Barack Obama’s superb victory speech in 2008 included these words:

This election had many firsts and many stories that will be told for generations. But one that’s on my mind tonight’s about a woman who cast her ballot in Atlanta. She’s a lot like the millions of others who stood in line to make their voice heard in this election except for one thing: Ann Nixon Cooper is 106 years old.

She was born just a generation past slavery; a time when there were no cars on the road or planes in the sky; when someone like her couldn’t vote for two reasons — because she was a woman and because of the color of her skin.

And tonight, I think about all that she’s seen throughout her century in America — the heartache and the hope; the struggle and the progress; the times we were told that we can’t, and the people who pressed on with that American creed: Yes we can.

… She was there for the buses in Montgomery, the hoses in Birmingham, a bridge in Selma, and a preacher from Atlanta who told a people that “We Shall Overcome.” Yes we can.

A man touched down on the moon, a wall came down in Berlin, a world was connected by our own science and imagination.

And this year, in this election, she touched her finger to a screen, and cast her vote, because after 106 years in America, through the best of times and the darkest of hours, she knows how America can change.

Yes we can.

Obama plays with the cadences of African American folksy church rhetoric, accelerating and slowing down, softer and louder. Spot contrast after contrast. This passage has everything you ever need to know about speaking well in the style of Barack Obama. Watch it:

The passage here is 505 words long, delivered at 149 wpm.

At the far end of non-Obama oratory is Jim Carrey at the 2016 Golden Globes:

Magnificent. Almost more silence than speaking. 137 words in 102 seconds – roughly 80 wpm.
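The arithmetic behind these figures is simple enough to check for yourself. A small Python sketch, with function names of my own choosing, works out words per minute from a word count and a duration, and runs the sum the other way to estimate how long a passage lasts at a given pace:

```python
def words_per_minute(word_count, seconds):
    """Speaking pace in words per minute."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return word_count * 60.0 / seconds


def duration_seconds(word_count, wpm):
    """How long a passage of word_count words takes at a given pace."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    return word_count * 60.0 / wpm


if __name__ == "__main__":
    # Jim Carrey: 137 words in 102 seconds.
    print(round(words_per_minute(137, 102), 1))
    # Obama passage: 505 words at 149 wpm.
    print(round(duration_seconds(505, 149)))
```

On these numbers the Carrey clip comes out a shade over 80 wpm, and the Obama passage at 149 wpm would take a little under three and a half minutes.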

There is a HUGE difference in the options a speaker has for conveying tone, authority, impact and humour as between 150 wpm and 80 wpm.

Note that neither is ‘right’ or ‘best’. They’re just doing different things.

If it’s done well, speaking faster with lots of ups and downs of variation, as in that Obama speech, conveys a mood/tone of excitement, energy and inspiration. Just the right tone for a new President after a stirring victory. If it’s done badly, speaking fast is nothing but a gush of noise you can safely ignore.

Speaking slower with long, almost awkward pauses, conveys a mood/tone of wisdom. Thoughtfulness. Maybe even melancholy? But in all those things there’s a lot of humour if it’s done well. Done badly, it’s painful. The audience wonder if they should feel sorry for a speaker who doesn’t know what to say.

But in either case pace has to be context-specific. Size of room and audience on the day? Interpreters? Inside or outside? Audience sitting or standing? How far in this speech do you want the audience to think and learn as opposed to feel?


Orai helps you measure/know your pace and practise different ways of giving a speech or explaining yourself in an interview, working in pauses as makes sense. The real challenge for the app comes in analysing and advising. What may be too fast (or too slow) for one audience/room/occasion may be just right in another context. How to get AI’s head round THAT?

In progress.