A core principle of inVibe has been the relentless pursuit of improving the market research process. We started with the goal of conducting cost-effective, high-speed qualitative research that didn’t sacrifice depth and quality. To this end, we continue to build software for all market research stakeholders — from research participants to our primary research customers to our internal teams.
A key area of opportunity for inVibe has been developing tools that aid the analysis of the data we collect. One of the most challenging aspects of what we do, and qualitative research in general, is organizing and structuring qualitative data. Thanks to our unique voice-response methodology, we start with much more structured data than traditional qualitative research. For each question in an inVibe study, we have an array of transcribed audio files — one for each participant in the study. Clients and analysts alike have long been able to use the inVibe platform to filter, sort, or group answers in our reporting dashboard by a variety of factors such as demographics, speech-derived emotions, and other linguist-coded signals.
But what if we could organize those answers by the content of the answers themselves? Wouldn’t it be easier to listen through all of the respondents who took a negative stance on a creative concept at once?
Even seeing the distribution of positive/negative/neutral helps frame the data in a meaningful way. Any pre-work the technology can do to help organize content ahead of time translates to faster and simpler analysis. In other words, our team of insight explorers can spend less time creating a map of the content landscape, and more time consuming its lush and revealing landmarks.
Recent technological advancements in machine learning and natural language processing (NLP) have pushed us to expand our thinking about what processing tasks can be offloaded to a machine. inVibe is excited to share our progress in this area, built on GPT-3, an advanced language model created by OpenAI.
What is GPT-3?
GPT-3 stands for "Generative Pre-trained Transformer 3." With 175 billion parameters, it is one of the largest language models ever created, and it represents a drastic leap in the capabilities of NLP models. Because of the massive size of the model and its training corpus, GPT-3 excels at what are called "zero-shot," "one-shot," or "few-shot" machine learning tasks: tasks that involve zero, one, or only a few examples. Examples in this context are the sample data used to teach, or train, the machine learning model. Much of qualitative market research involves a small number of respondents, and therefore few examples to leverage, so models that require large training sets quickly hit diminishing returns. GPT-3's few-shot capability sidesteps this constraint.
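To make "few-shot" concrete, here is a minimal sketch of how such a prompt is typically assembled for a large language model. The task wording and helper function are illustrative assumptions, not inVibe's actual code:

```python
# Sketch: assembling a few-shot classification prompt for a language
# model. With zero examples this becomes a zero-shot prompt; with one
# example, a one-shot prompt.

def build_few_shot_prompt(task, examples, query):
    """Combine a task description, k labeled examples, and the query
    to classify into a single prompt string."""
    parts = [task]
    for text, label in examples:
        parts.append(f"Text: {text}\nLabel: {label}")
    # Leave the final label blank for the model to complete.
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)

examples = [
    ("I loved the new concept.", "positive"),
    ("The messaging felt confusing.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each text as positive, negative, or neutral.",
    examples,
    "This ad really spoke to me.",
)
print(prompt)
```

The model's completion of the trailing "Label:" is then read back as the predicted class.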
In collaboration with Edge Analytics, inVibe has built several algorithms on top of GPT-3. These algorithms are used to perform novel NLP tasks, some of which we will explore in more detail below. If you are curious about some of the technical challenges around working with GPT-3, see Edge Analytics' recent post on getting the most out of GPT-3 based text classifiers.
It is worth noting that GPT-3 is better at some tasks than others, and it does have limitations. However, combining GPT-3 with expert human analysts (e.g., our extensive team of language experts) and other proprietary automated tools gives us the best of both worlds. It allows us to produce reports that are rich with information and insights efficiently and with a high degree of accuracy.
Use case: sentiment analysis
Sentiment analysis is far from a new idea in the world of NLP; it is perhaps the most widely applied NLP task. Because of this, many are quite familiar with its traditional limitations. While simple examples perform reasonably well, things start to break down when responses include rambling, double negatives (e.g., "I wouldn't say it was my least favorite."), or references to other topics (e.g., "I like this a lot more than the other one, which was confusing and uninformative."). Recently, state-of-the-art sentiment analysis engines have been built using deep learning, but these models require hundreds if not thousands of domain-specific examples in order to deliver meaningful results.
However, with the GPT-3 algorithms we developed, inVibe was able to achieve better performance on "subject-aware" sentiment analysis than any of the existing solutions we tested against. "Subject-aware" means that our technique considers the subject of the phrase, not just whether its words are positive or negative in general. This is a marked improvement over traditional techniques and even exceeds the performance of advanced models like Google Cloud's sentiment analysis API.
Let's look at some examples. The following responses were collected from patients participating in an inVibe voice-response survey where they were asked to provide their opinion about three different proposed advertising concepts for a heart condition called atrial fibrillation: Concept J, Concept F, and Concept K.
“I did like Concept F much better. I felt like Frank could actually be a friend of mine or somebody I know that has atrial fibrillation. He showed that it was a little bit funny, or I shouldn't say funny but lighter, but it wasn't just making a joke of the whole thing, so I did like this one much better.”
Google Cloud Algorithm -> Sentiment: neutral
inVibe GPT-3 Algorithm -> Sentiment: positive
The algorithm output in this example illustrates the power of "subject-awareness." The quote is in response to a question about "Concept F." If we break it down sentence-by-sentence, Google Cloud's API assigned a negative score (-0.8) to the first sentence: "I did like Concept F much better." It assigned a positive score to the remaining sentences, which results in the response being labeled as "neutral" as a whole.
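One plausible way per-sentence scores can wash out to a "neutral" document label is a thresholded average. The thresholds below are illustrative assumptions, not Google Cloud's actual aggregation logic:

```python
# Sketch: averaging per-sentence sentiment scores into one document
# label. Threshold of 0.25 is an illustrative assumption.

def label_from_scores(scores, threshold=0.25):
    """Map a list of per-sentence scores in [-1, 1] to a label."""
    mean = sum(scores) / len(scores)
    if mean > threshold:
        return "positive"
    if mean < -threshold:
        return "negative"
    return "neutral"

# A strongly negative first sentence plus mildly positive remaining
# sentences averages out near zero:
print(label_from_scores([-0.8, 0.3, 0.4]))  # -> "neutral"
```

This is how a single misread sentence can drag an otherwise positive response to "neutral" overall.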
In contrast, inVibe's GPT-3 algorithm 'understood' that the response was about "Concept F" and correctly labeled the response as "positive" overall. We saw this behavior consistently: across the responses captured, the algorithm achieved an accuracy of over 90%. This is a case where incorporating the subject into the design of the algorithm significantly improves performance in an area where most machine learning models struggle to derive sentiment accurately.
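A subject-aware prompt can be sketched roughly as follows. The exact prompt design behind inVibe's algorithm is not shown in this post, so the wording and the parsing helper below are illustrative assumptions:

```python
# Sketch: naming the subject in the prompt so the model scores
# sentiment *toward* that subject, not the response in general.
# Prompt wording is illustrative, not inVibe's production prompt.

def subject_aware_prompt(subject, response):
    return (
        f"What is the speaker's sentiment toward {subject} in the "
        f"following response? Answer positive, negative, or neutral.\n\n"
        f'Response: "{response}"\n\n'
        f"Sentiment toward {subject}:"
    )

def parse_label(completion, labels=("positive", "negative", "neutral")):
    """Map a raw model completion onto one of the allowed labels."""
    text = completion.strip().lower()
    for label in labels:
        if text.startswith(label):
            return label
    return "neutral"  # fall back when the completion is off-format
```

Constraining the model to a fixed label set and parsing defensively keeps the open-ended completion usable as a classifier output.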
Use case: identifying concerns
One advantage of operating our proprietary voice-response platform at inVibe is that we can invest in optimizing seemingly rare or niche tasks. For example, let's consider a common type of study we conduct — conference intelligence research. These studies are typically conducted with physicians shortly following the release or presentation of new data about a therapeutic intervention. Oftentimes a client leverages our platform to measure a physician's enthusiasm or hesitation around adopting a potential new treatment option — or to determine whether some aspect of the study design may be causing concern.
To aid in this analysis, we developed an algorithm to identify and extract concerns (e.g., trial design, side effects, etc.) from any response our system captures. Unlike sentiment analysis, this task results in an open-ended output that isn't restricted to choosing from a few predefined labels. Instead, inVibe's algorithm is able to accurately extract any concerns, including ones it has never seen before, from the collective body of transcribed content. Furthermore, the algorithm is also capable of paraphrasing these concerns in remarkably succinct phrases.
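An open-ended extraction task like this can be sketched as a prompt that asks for a list rather than a label. The prompt wording and parsing below are illustrative assumptions, not inVibe's production code:

```python
# Sketch: open-ended concern extraction. Unlike classification, the
# model returns free text, here requested as a bulleted list of short,
# paraphrased concerns and parsed back into a Python list.

def concern_extraction_prompt(response):
    return (
        "List, as short phrases, any concerns the speaker raises in the "
        "following response. Write one concern per line, prefixed with "
        '"- ". If there are none, write "- none".\n\n'
        f'Response: "{response}"\n\nConcerns:'
    )

def parse_concerns(completion):
    """Turn the model's bulleted completion into a list of phrases."""
    concerns = [
        line.lstrip("- ").strip()
        for line in completion.strip().splitlines()
        if line.strip().startswith("-")
    ]
    return [c for c in concerns if c and c.lower() != "none"]
```

Because the output is free text rather than a fixed label set, the same prompt can surface concerns the system has never seen before.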
“Very exciting to have, uh, possible curative treatment option. Concerns about, um, longevity of response and concerns about, um, long-term effects as well as fertility.”
inVibe GPT-3 Algorithm -> Concerns: longevity of response; long-term effects; fertility
This example demonstrates how our algorithm can accurately identify and extract any number of concerns and is undeterred by filler words like "uh" and "um."
“The data is very encouraging in terms of the, uh, significant improvement in hemoglobin F and total hemoglobin for the patients, uh, offering a potential functional cure for these patients. What is, um, concerning at this point is that there is a very small sample of patients, only seven patients as part of the trial.”
inVibe GPT-3 Algorithm -> Concerns: small sample size
In this example, our algorithm correctly identifies one key concern and paraphrases by shortening "a very small sample of patients, only seven patients as part of the trial" to just "small sample size." Compared to extracting entire phrases or sentences, paraphrasing makes it easier for analysts or clients to quickly read through the list of concerns and identify any commonalities.
“Um, um, you know, there were, weren't really any significant serious, uh, AEs, um, although, obviously, there was, uh, a some, uh, uh, patients who'd have stopped RBC transfusions, uh, within a few months. Uh, I guess there was some concern, obviously, about the, the febrile neutropenia that's a, that is a problem, since there were 11 patients that, uh, that, uh, developed it.”
inVibe GPT-3 Algorithm -> Concerns: febrile neutropenia
Here, the response is full of medical jargon and run-ons. Even the part of the response that includes some concerns is expressed through a collection of semi-coherent and wandering phrases, like "Uh, I guess there was some concern, obviously, about the." This kind of language is part of how people naturally speak but can easily cause problems for even some of the more sophisticated algorithms available to researchers and analysts. Despite this, inVibe's algorithm was able to correctly identify "febrile neutropenia" as a specific concern in this physician's response.
Why this matters
A common trope when discussing AI advancements is to project into a future where the AI system is more knowledgeable than humans. Results like these would not have been possible even a few years ago, but it is essential to reframe where AI fits in the future of knowledge work and insight gathering. We use NLP not to replace human analysis but to free language experts and customers alike to focus on higher-level thinking.
These tasks are tools to add context and provide meaningful organization to our data. Much like a scientist behind the microscope — the instruments used to measure and distill may change, but the need for human intellect and contextually relevant interpretation remains constant.
When an analyst can view the distribution of sentiment in reaction to a concept, they can immediately ask better questions of the data and see where to dive in. Seeing the range of concerns raised about a study has a similar effect.
For insights to be valuable, organizations must have time to act on them. The longer data collection and analysis take, the less relevant the findings become. The same holds more generally: the longer it takes to answer a question, the less valuable the answer. This is why our industry has so often focused on speed at the expense of quality. With the steps we've taken, inVibe hopes to increase the clarity, organization, and timeliness of the insights we deliver to clients.
In part 2 of this series, we’ll expand on how modern, AI-assisted workflows can bring about new and exciting ways of gathering insights and listening to customers.