Seeing What We Hear

Listening Between the Lines with Acoustics and Speech Emotion Recognition

By Janine Karo

Mon Mar 14 2022

An Introduction to Acoustics

As humans, we have natural instincts about how to interpret speech. For instance, when we suggest to our friend we should pick up sushi for dinner and they say “…Sure?”, that long pause and questioning intonation might make us feel like they aren’t that excited about the proposition. We’d probably feel better about ordering sushi for dinner if they said “Sure!” with complete enthusiasm and without a moment of hesitation. In both cases, we’re tuned in to their contextualization cues, or features like pitch, tone, and pauses that help convey the intent behind someone’s words.

The inVibe Listening team is composed of sociolinguists who are specifically trained to listen “between the lines” for these acoustic nuances. While we’ve all got good ears for this kind of deep listening, technology thankfully exists to reinforce our intuitions in the form of an established, rigorously tested speech emotion recognition (SER) algorithm. In the analysis phase of our work, we turn to this algorithm, which tracks the physiological changes in voice (those contextualization cues) that are prompted by stimuli and reactions. These vocal fluctuations are then “translated” into the approximate emotions behind a response by scoring valence (level of positivity or negativity), activation (level of interest and/or excitement), and dominance (level of confidence and control). Finally, for each prompt, we can plot all respondents’ scores on a graph, visualizing the response distribution on a per-question basis. This feature is a valuable asset in the inVibe tool belt, as it allows us to “show” what we as linguists are picking up on.

This graph represents multiple healthcare provider (HCP) respondents in terms of their dominance (x-axis) and activation (y-axis) for one prompt. Most responses fall on the right side of the x-axis, indicating respondents’ confidence in their answers.
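
For readers who think in code, here is a minimal Python sketch of how per-respondent scores for a single prompt might be summarized. The `Score` class, the -1 to 1 scale, the zero midpoint, and the sample values are all illustrative assumptions; this is not inVibe's SER algorithm, and the numbers are not real study data.

```python
from dataclasses import dataclass


@dataclass
class Score:
    """One respondent's scores for one prompt (hypothetical -1..1 scale)."""
    valence: float     # positivity vs. negativity
    activation: float  # interest and/or excitement
    dominance: float   # confidence and control


def share_confident(scores):
    """Fraction of responses on the right side of the dominance axis (> 0)."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s.dominance > 0) / len(scores)


# Made-up values, for illustration only.
responses = [
    Score(valence=0.4, activation=0.2, dominance=0.6),
    Score(valence=0.1, activation=-0.3, dominance=0.5),
    Score(valence=-0.2, activation=0.1, dominance=-0.1),
    Score(valence=0.3, activation=0.5, dominance=0.7),
]

print(share_confident(responses))
```

A single share like this is only a rough companion to the scatter plot, which shows the full distribution of responses rather than one number.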

Acoustics in Practice: The Analysis Phase

While our projects always consider acoustic metrics, we find them to be especially illuminating in concept and message testing studies, where they help us better understand respondents’ attitudes and reactions. One recent concept testing project that benefitted from acoustic analysis was for an eye care treatment already on the market. The client was looking to reposition this existing treatment and wanted to evaluate how three potential concepts fared with eye care providers (ECPs). To answer that question, we designed a mixed-methods study combining quantitative and voice-response questions, aimed at uncovering reactions to and perceptions of each concept so the client could best position the treatment moving forward.

In the analysis phase, the quantitative findings came in first and laid out a rough framework for us to build on. Concept A scored the highest for resonating with ECPs and motivating them to prescribe the treatment, with Concept B and Concept C receiving the middle and lowest scores, respectively. These results gave us a sense of ‘what’ ECPs’ general attitudes were but not ‘why’ that was the case, so we then turned to the voice responses, where we had probed participants to explain why they found each concept (un)relatable or (un)motivating. Through these responses, we heard ECPs praise and criticize various aspects of the visuals and messaging, which helped us make sense of their quant scores.

Perhaps sharing ‘what’ ECP attitudes were and ‘why’ they felt that way would have been sufficient, but we wanted to go one step further and show ‘how’ the ECPs sounded, both to fully substantiate our analysis and to ensure client confidence in it. We used our SER technology to plot ECPs’ answers about each concept’s relatability and motivation on the axes of activation (excitement/boredom) and valence (positivity/negativity). These two parameters are typical in our concept testing measurements, as they best depict participant reactions and feelings. From the following graphs, we see that what they said neatly corresponds with how they said it, supporting the quant rankings that we started with.

Concept A had the highest levels of acoustic activation and valence, indicating that it was positively received and attention-grabbing.

While still positively received, Concept B responses had lower levels of acoustic activation, suggesting less enthusiasm.

Concept C responses were most acoustically negative, which could be due to the more negative reception of language and imagery.
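
The comparison across the three graphs can also be expressed numerically. The sketch below averages valence and activation per concept and ranks the concepts by mean valence. The function name `concept_centroids`, the input layout, the -1 to 1 scale, and every value in `rows` are hypothetical placeholders chosen for illustration; they are not the study's data or inVibe's method.

```python
from statistics import mean


def concept_centroids(rows):
    """Average (valence, activation) per concept.

    `rows` is an iterable of (concept, valence, activation) tuples.
    """
    grouped = {}
    for concept, valence, activation in rows:
        grouped.setdefault(concept, []).append((valence, activation))
    return {
        concept: (mean(v for v, _ in points), mean(a for _, a in points))
        for concept, points in grouped.items()
    }


# Made-up scores on an assumed -1..1 scale, for illustration only.
rows = [
    ("A", 0.5, 0.75), ("A", 0.25, 0.25),
    ("B", 0.5, 0.0), ("B", 0.0, -0.5),
    ("C", -0.5, 0.25), ("C", -0.25, -0.25),
]

centroids = concept_centroids(rows)

# Rank concepts from most to least positive mean valence.
ranking = sorted(centroids, key=lambda c: centroids[c][0], reverse=True)
print(centroids)
print(ranking)
```

Averages like these compress each graph into a single point, so they work best alongside the plots rather than in place of them.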

Here we see how valuable a visual representation of participants’ emotions was for adding color to our qualitative findings. By reinforcing our linguistic intuitions with AI-powered acoustic metrics, we painted an even clearer, more definitive picture of ECP reactions for our client.

inVibe Your Market Research Approach

Listening is at the heart of the inVibe approach, no matter the project. Starting with our awesome team of linguists and layering on technology like our SER software, we derive deep, nuanced insights grounded in human behavior. Whether you’re looking to test visual aids for an upcoming launch or want to better understand the patient journey in your disease state, inVibe’s suite of analytic capabilities can help you make smarter decisions that you feel confident about. Interested in hearing – and seeing – for yourself? Get in touch! We’re all ears.
