In today’s digital age, where customers communicate through various platforms, there is a vast amount of unstructured text data waiting to be examined. While structured data like sales figures and campaign results are essential, it is unstructured text data (the actual words used by customers) that holds the most significant insights into their needs, perceptions, and experiences.
But how can we effectively sort through this immense amount of text data to identify patterns, trends, and actionable insights? This is where Topic Models come in handy. These models allow us to tap into the Voice of the Customer (VoC), offering a systematic, visual way to analyze and interpret the language customers use (and hear).
Today, we will explore the significance of examining VoC data and familiarize you with the idea and purpose of Topic Models. Let’s dive in!
Why Voice of the Customer Data Matters
VoC data is essential for understanding consumers’ genuine opinions, needs, and expectations in their own language. This data is sourced from a variety of channels, such as social media posts, product reviews, and survey responses, providing unedited insight into the thoughts of your target audience.
As analysts, our primary focus is on understanding three crucial elements:
- What brands say: Brands connect to customers in a variety of ways. Advertising may be the most obvious, but any social post, PR piece, or news story is a message that reflects on the brand. By examining a brand’s communications, we identify the underlying themes and concepts it expresses (wittingly or unwittingly).
- What customers hear: The way customers interpret and understand a brand’s communications may not always align with the intended message. Analyzing this discrepancy can help us gauge the effectiveness of the brand’s communication tactics.
- What customers say: The most authentic and valuable insights often come from what customers say to each other. This is where we can truly learn about customer opinions and experiences.
The data we collect to understand these elements is unstructured and text-based. With the help of Topic Models, analysts can systematically identify and examine key themes and trends within the data. This allows us to gain insight into how well brand strategies align with customer perceptions and reveals potential areas for improvement or action.
What Are Topic Models?
Topic models are powerful statistical tools employed in natural language processing (NLP) to discover underlying themes within extensive sets of text data. While similar to simple word clouds (or tag clouds) in that both deal with text data, topic models provide a deeper, more structured analysis. Unlike word clouds, which only show the frequency of individual words, topic models help analysts understand the relationships between words and uncover key themes or topics in the text.
The predominant form of Topic Model is Latent Dirichlet Allocation (LDA), which groups frequently co-occurring words in the same clusters based on their usage.
Here’s how it operates:
- The text data is broken down into its set of individual words (collectively referred to as a “bag of words”). The word order is not significant; however, their frequency and connection to one another are key attributes for the analysis.
- LDA groups words that frequently occur together, with each group representing a distinct topic. As it analyzes these groups, it also uncovers relationships between topics based on the strength or weakness of their connections to other discovered topics.
- When text data is associated with a value, such as a star rating in a review or a sentiment score on a social media post, the model can further examine the correlation between each topic and that measure. For instance, the model can investigate how often ‘customer service’ or ‘price’ is referenced in reviews with higher ratings versus those with lower ratings.
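The steps above can be sketched in a few dozen lines. The following is a toy illustration only, assuming collapsed Gibbs sampling as the fitting method (one common way to estimate LDA) and using only the Python standard library. The function name `fit_lda`, the hyperparameter values, and the sample reviews are all hypothetical; real projects would normally rely on an established library instead.

```python
import random
from collections import Counter

def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                        # total words per topic
    assignments = []

    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # A topic is more likely if it is common in this document
                # and if this word is common in that topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed share of each topic within each document.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # Most frequent words assigned to each topic.
    top_words = [
        [w for w, c in counts.most_common(3) if c > 0] for counts in topic_word
    ]
    return theta, top_words

# Hypothetical, already-tokenized hotel reviews (each list is a bag of words).
reviews = [
    ["room", "view", "balcony", "view"],
    ["price", "deal", "worth", "price"],
    ["balcony", "room", "floor", "view"],
    ["paid", "price", "extra", "deal"],
]
theta, top_words = fit_lda(reviews, num_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
```

Each row of `theta` gives one review's mix of topics, and `top_words` lists the most frequent terms per topic. Results on a corpus this small are unstable and depend on the seed, so the sketch is meant to show the mechanics rather than produce meaningful topics.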
Here’s an example of a topic model at work. While using a dataset of hotel reviews, LDA may recognize categories such as decor, room price, and room features. By studying the terms that comprise each topic, you can determine which features are most important to customers and how they relate (or do not relate) to brand strategies.
Words can appear in multiple topics. This happens because a word might be relevant to more than one theme. For example, in hotel reviews, the word “room” might appear in topics about both “cleanliness” and “comfort” since the concept of a room is important to both themes. The presence of a word in multiple topics can indicate a connection or overlap between those topics.
Also, when topics are displayed visually, the relative size of a word within a topic indicates how frequently it turns up in the analyzed text. Larger words appear more frequently in the dataset, while smaller words are less common. This helps analysts quickly identify the most prominent terms in each topic.
No single specialized tool is required to build a topic model. There are several ways to create one:
- R or Python: Analysts who are comfortable with coding can use packages like tm, textstem, quanteda, or topicmodels in R, or libraries like NLTK, spaCy, or gensim in Python. Some of these mainly handle text preprocessing, while others (for example, topicmodels in R and gensim in Python) fit LDA topic models; together with plotting libraries they also support visualization of results. Both languages offer customizable solutions for topic modeling and give control over data pipelines to fine-tune models.
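To make these two reading aids concrete, here is a small sketch that finds words shared between topics and converts word counts into relative display sizes. The topic names, words, and counts are hypothetical stand-ins for the output of a fitted model, and `shared_words` and `relative_sizes` are illustrative helper names.

```python
from collections import defaultdict

# Hypothetical model output: topic name -> {word: count within that topic}.
topics = {
    "cleanliness": {"room": 40, "clean": 35, "bathroom": 20},
    "comfort": {"room": 30, "bed": 28, "quiet": 12},
}

def shared_words(topics):
    """Return the words that appear in more than one topic."""
    membership = defaultdict(list)
    for name, words in topics.items():
        for word in words:
            membership[word].append(name)
    return {w: names for w, names in membership.items() if len(names) > 1}

def relative_sizes(words):
    """Scale counts so the most frequent word in a topic has size 1.0."""
    largest = max(words.values())
    return {w: count / largest for w, count in words.items()}

print(shared_words(topics))
print(relative_sizes(topics["comfort"]))
```

Here "room" is reported as shared by both topics, and the scaled values could drive font sizes in a word-cloud style display.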
- Online Tools: For those who are not familiar with coding, online platforms like MonkeyLearn or Lexalytics provide no-code interfaces for topic modeling and other NLP tasks. These platforms allow users to upload text data and apply pre-built algorithms, making it easy to generate insights without programming knowledge. They are great for beginners in text analysis.
- Specialized Software: Platforms such as Brandwatch, BuzzSumo, or NVivo have built-in NLP features including topic modeling for analyzing social media posts, survey responses, and other forms of customer feedback. While these tools are primarily used by marketers and researchers, they can also be valuable for analysts working on VoC initiatives.
No matter which approach is chosen, topic models offer a structured method for understanding large sets of text data. They help uncover hidden themes, track trends, and generate actionable insights.
Building Topic Models: What Data Do We Use?
Selecting the right data for a topic model is crucial. As an analyst, your objective is to identify sources of unstructured text data that offer valuable insights into customer behavior and perceptions. Common sources include:
- Brand communications: All written content produced and shared by brands, including advertisements, social media posts, and blog articles, serves as the voice of the brand to consumers.
- Product or service reviews: Online platforms like Yelp, Amazon, and Google provide a wealth of information on customer satisfaction.
- Social media posts: Platforms such as Facebook, Twitter, Instagram, and Reddit contain real-time opinions from customers about your brand or product.
- Customer feedback: Open-ended responses from customer surveys or focus group sessions offer direct feedback to specific questions.
Once the data is collected, it must be cleaned and preprocessed to ensure accuracy. This typically involves removing irrelevant characters (such as punctuation and numbers), removing stop words (e.g., “and,” “the,” “is”), and converting all words to lowercase.
Another crucial aspect of preparing data is known as “stemming,” where suffixes are removed from words to reduce them to their root form. This technique transforms words such as “running” and “runs” into “run.” Irregular forms such as “ran” are not caught by suffix removal; handling them requires a related technique, lemmatization, which maps each word to its dictionary form. By doing this, variations of the same word are recognized as a single concept, allowing for more precise analysis focused on the underlying meaning rather than surface-level differences in word forms.
After preprocessing, the text data is ready to be input into a topic model, which will generate dominant topics within the data.
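The cleaning steps described in this section can be sketched as follows. This is a minimal illustration that assumes English text in plain ASCII; the stop-word list is a tiny illustrative subset, and `naive_stem` is a crude suffix stripper standing in for a real stemmer, not an implementation of one.

```python
import re

# Illustrative subset only; real stop-word lists are much longer.
STOP_WORDS = {"and", "the", "is", "a", "was", "were", "to", "of", "in", "it"}
# Naive suffix rules, checked in order; an assumption for this sketch.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, drop punctuation and digits, remove stop words, then stem."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only letters and whitespace
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]

print(preprocess("The rooms were cleaned daily, and the view is amazing!"))
# ['room', 'clean', 'daily', 'view', 'amaz']
```

Stems such as "amaz" are not dictionary words, which is normal for stemming. The naive rules also turn "running" into "runn"; established stemmers include extra rules for cases like doubled consonants.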
Interpreting Topic Models
The process of interpreting topic models involves identifying the main themes and subjects represented by each group of words, and understanding what they reveal about the text as a whole. This helps analysts gain actionable insights from large amounts of unstructured data.
Let’s break down our earlier example of using topic models to analyze hotel reviews. The model identifies distinct topics that are commonly discussed in customer reviews, such as decor and design, pricing, and room views and features.
- Decor and Design: Words like “modern,” “decor,” “design,” “bedroom,” and “comfort” indicate that customers focus on the interior design of hotel rooms. This suggests that aesthetics play a significant role in customer satisfaction and should be a key consideration for marketing and improvement strategies.
- Pricing: The presence of words like “price,” “worth,” “paid,” “deal,” and “extra” highlights customer concerns about value for money. This could suggest that competitive pricing or additional perks could improve guest satisfaction.
- Room Features: Words like “room,” “view,” “floor,” “balcony,” and “build” suggest that customers pay attention to the physical attributes of their room, particularly the view. This indicates that room views are highly valued by guests, and this aspect should be taken into account when considering improvements.
Topic models allow analysts to quickly identify which features are most important to customers. In this case, topics like decor, price, and view indicate that these are key factors in the hotel experience.
While topic models may not have the ability to directly measure sentiment (although some advanced tools do), they can provide valuable insights by analyzing the words connected to a given topic. For example, if the “price” topic includes words like “affordable” or “worth it,” it could suggest positive sentiment from customers. Conversely, if the words used are “overpriced” or “expensive,” there may be frustration among customers.
In our hotel review example, if we first score each review for sentiment and then use those scores to rate each topic, we gain a better understanding of the intensity of feeling attached to each topic.
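One simple way to score topics by sentiment is a weighted average: each review's sentiment counts toward a topic in proportion to how much of the review belongs to that topic. The sketch below assumes you already have per-review topic shares from a topic model and per-review sentiment scores from a separate tool; the numbers and the function name `topic_sentiment` are hypothetical.

```python
def topic_sentiment(doc_topics, sentiments):
    """Weighted mean sentiment per topic, weighted by each review's topic share."""
    num_topics = len(doc_topics[0])
    scores = []
    for k in range(num_topics):
        total_weight = sum(row[k] for row in doc_topics)
        weighted_sum = sum(row[k] * s for row, s in zip(doc_topics, sentiments))
        scores.append(weighted_sum / total_weight if total_weight else 0.0)
    return scores

# Hypothetical inputs: three reviews, two topics (index 0 = price, 1 = decor).
doc_topics = [[0.8, 0.2], [0.1, 0.9], [0.5, 0.5]]   # topic shares per review
sentiments = [0.6, -0.4, 0.2]                        # -1 (negative) to +1 (positive)

scores = topic_sentiment(doc_topics, sentiments)
for name, score in zip(["price", "decor"], scores):
    print(f"{name}: {score:+.2f}")
```

With these made-up inputs, the first topic comes out mildly positive and the second slightly negative, because the one negative review is dominated by the second topic.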
Turning Insights into Action
Topic models are valuable tools for transforming unstructured text data into actionable insights. Analysts can use these insights in the following ways:
- Refining Brand Strategy: Compare the topics revealed in your VoC analysis to the brand’s intended messaging. Are you placing enough emphasis on the most important aspects of your product or service? Make adjustments to better align with what truly matters to customers.
- Optimizing Communication: If the topic model highlights a disconnect between what customers are perceiving and what your brand is saying, it may be time to refine your messaging. By addressing this gap, you can improve communication and strengthen brand-customer relationships.
- Identifying Emerging Trends: Keep an eye out for any emerging customer concerns or desires that surface through the topic model. For example, if a particular topic related to product quality starts gaining prominence, it’s a sign that action needs to be taken promptly before it becomes a widespread issue.
- Improving Products or Services: Use topic models to analyze the common themes in customer conversations and identify areas where your product or service falls short or excels. These insights can guide future improvements and help enhance overall customer satisfaction.
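For the emerging-trends use case in particular, a rough sketch is to track each topic's share of reviews per time period and flag topics whose share grows. The example assumes each review has already been labeled with a period and its dominant topic; the records, the `min_gain` threshold, and the function names are hypothetical.

```python
from collections import defaultdict

def topic_share_by_period(records):
    """records: (period, dominant_topic) pairs -> {period: {topic: share}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for period, topic in records:
        counts[period][topic] += 1
    shares = {}
    for period, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[period] = {t: c / total for t, c in topic_counts.items()}
    return shares

def rising_topics(shares, earlier, later, min_gain=0.1):
    """Topics whose share grew by at least min_gain between two periods."""
    gains = {}
    for topic, share in shares[later].items():
        gain = share - shares[earlier].get(topic, 0.0)
        if gain >= min_gain:
            gains[topic] = gain
    return gains

# Hypothetical labeled reviews: (month, dominant topic).
records = [
    ("2024-01", "price"), ("2024-01", "price"),
    ("2024-01", "decor"), ("2024-01", "quality"),
    ("2024-02", "quality"), ("2024-02", "quality"),
    ("2024-02", "price"), ("2024-02", "decor"),
]
shares = topic_share_by_period(records)
rising = rising_topics(shares, "2024-01", "2024-02")
print(rising)
```

In this made-up data, only the "quality" topic gains share from one month to the next, which is the kind of signal that would prompt a closer look at the underlying reviews.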
Conclusion: Using Topic Models to Understand the VoC
Adding Topic Models to your analysis toolkit is an effective method for accessing the Voice of the Customer. By studying customer reviews, posts on social media, and feedback from surveys, you can acquire valuable knowledge about customer perceptions, desires, and behaviors. This not only allows you to assess brand performance but also reveals ways to improve communication, products, and the overall customer experience.
With topic models, you can prioritize the customer’s voice in your analysis, ensuring that your findings are grounded in actual experiences and viewpoints.