By Xiaoming Jiang

As social animals, human beings possess a unique ability to communicate social intentions and mental states, such as a feeling of knowing, through their tone of voice. Vocal confidence can signal a speaker’s persuasiveness, expertise, and trustworthiness. However, scientific investigations of vocal confidence are relatively scarce. As a postdoctoral researcher in the Neuropragmatics and Emotion Lab led by Dr. Marc Pell at McGill University, I am working on an ambitious project investigating how the human voice encodes a speaker’s feeling of knowing, and how the human brain has evolved the ability to decode this complex mental state.

Perceptual and acoustic attributes of the voice of confidence

In a recent communication on social prosody (Jiang & Pell, 2014), we reported on a newly validated vocal database of over 3,600 expressions produced in North American English. We invited six professional actors or public speakers to record statements intended to convey that they were very confident, close to confident, or non-confident, as well as neutral statements for comparison. Each expression was introduced by a verbal probability phrase, such as “I’m confident,” “Most likely,” or “Perhaps,” that was congruent with the intended tone of voice. These phrases helped speakers produce the appropriate level of confidence, but the recordings were edited so that listeners could hear each statement both with and without the explicit linguistic cue. Ratings from a group of 60 native speakers tracked the speakers’ intended levels of confidence with high accuracy: the average confidence rating was lowest for non-confident voices (2.3), followed by close-to-confident (3.5) and neutral (4.0) voices, and was highest for confident voices (4.5). Compared with the non-confident voice, the confident voice was characterized by specific acoustic signals: lower pitch, reduced loudness, flatter intonation, faster speech rate, and less variation in loudness over the course of the expression.

How fast can we decode the voice of confidence?

Our lab followed up on these findings by investigating how differences in perceived confidence relate to neural signals. We recorded electroencephalograms (EEG) while listeners judged vocal expressions conveying varying levels of confidence. When only vocal cues were available (that is, when linguistic cues like “perhaps” were removed), neural responses to confident and non-confident voices diverged as early as 200 ms. These neural signals appear to reflect increased attentional processing of the confident voice. The close-to-confident voice received more sustained attention over the course of the statement, as reflected by a larger positive wave at around 370 ms compared with the more definitive confident and non-confident voices. Neutral-intending voices tended to be rated as reasonably confident, and neurally they were best distinguished from the other voices by a late positive wave at around 900 ms after the onset of the statement. This may reflect a mismatch between the task goal of rating confidence and the speaker’s intention, which was not to convey any particular level of confidence (Jiang & Pell, 2015).

We were also interested in how vocal and verbal speech signals interact. We found that, when vocal cues were combined with verbal cues, confident and non-confident voices were differentiated earlier, at 100 ms. Compared with vocal-cue-only expressions, neural responses to expressions combining verbal and vocal cues were greatly reduced. Verbal information thus appears to facilitate the processing of vocal information for inferring speaker confidence, reducing the overall neural response while enabling faster neural differentiation between levels of confidence (Jiang & Pell, 2016a).

How do we resolve conflicting messages about a speaker’s feeling of knowing?

Beyond how lexical and vocal cues facilitate each other, we were interested in how people decode mixed messages, in which verbal and vocal cues conflict in conveying feelings of (un)knowing. We reported two cases in which the listener’s brain appears to use distinct decoding strategies (Jiang & Pell, 2016b). Hearing “I’m confident” followed by an unconfident voice places heavy demands on integrating the vocal information with the preceding verbal context and updating that context, engaging brain structures involved in conflict resolution. Hearing “Maybe” followed by a confident voice triggers delayed inferential mechanisms, engaging brain structures involved in perceiving others’ hidden intentions. Thus, resolving conflicting messages about confidence depends on the type of conflict, with distinct brain regions playing different roles depending on whether the confident signal precedes or follows the unconfident one.

Are female listeners better at decoding confidence?

We also found a sex difference in decoding feelings of knowing from the voice. Female listeners in our study were more sensitive to vocal cues of confidence than male listeners, rating confident-intending expressions as more confident and non-confident-intending expressions as less confident. We believe that these sex differences are mediated by individual differences in neural responses to vocal expressions or by differences in personality traits such as trait anxiety and trait empathy. Female listeners appeared to engage in very early acoustic analysis and to make delayed social inferences for complex messages. Male listeners, in contrast, detected the relevant social information in the voice and immediately adjusted their level of attention. A statistical mediation analysis we performed also showed that female listeners tended to have higher trait anxiety, which in turn influenced early-stage neural processing.

Overall, our research project has found that people appear to be good at evaluating the confidence behind another person’s statements, and that this evaluation involves the interaction of vocal cues, such as the speed and intonation of speech, and linguistic cues, such as prefacing a statement with the probability phrase “I’m confident” or “perhaps.” These cues also appear to be processed rapidly at the neural level, in some cases being differentiated as early as one tenth of a second (100 ms) into a statement. At the same time, the brain activity elicited by mixed signals of confidence points to different underlying integrative processes: some related to conflict resolution and others related to detecting concealed intentions. These findings have also raised intriguing new questions, currently under investigation, about how personality traits can be inferred from the way “feelings of knowing” affect the voice. We hope this research project will continue to shed light on how we infer the mental states of speakers, a common process used to evaluate teachers, business negotiators, politicians, and many other influential figures.

Xiaoming Jiang is a postdoctoral researcher in the Neuropragmatics and Emotion Lab led by Dr. Marc Pell at McGill University. Jiang investigates how the brain draws social inferences from vocal cues, and more specifically how these inferential processes take place in cross-cultural communication settings.


Jiang, X., & Pell, M. D. (2014). Encoding and decoding confidence information in speech. Proceedings of the 7th International Conference on Speech Prosody (Social and Linguistic Speech Prosody), 576–579.

Jiang, X., & Pell, M. D. (2015). On how the brain decodes speaker’s confidence. Cortex, 66, 9–34.

Jiang, X., & Pell, M. D. (2016a). Neural responses towards a speaker’s feeling of (un)knowing. Neuropsychologia, 81, 79–93.

Jiang, X., & Pell, M. D. (2016b). Feeling of another knowing: How “mixed messages” in speech are reconciled. Journal of Experimental Psychology: Human Perception and Performance. In press.