Customer insights as an asset: Predii Voice of the Customer

By Anurag Wagh, Chaitrali Johari, Neel Shah, Azhar Jodatti, Nadja De Maeseneer

Predii is an AI-powered insight engine that transforms unstructured data into useful business insights. We enable aftersales decision-makers to use their repair orders, telematics data, OEM information, tribal knowledge, and (in the context of this blog) voice of the customer data to add value to their business.

Using the Voice of the Customer as a competitive edge

Consumers in the 21st century are more expressive, more chatty, than ever before. They voice their sources of delight and concern in surveys, through emails, social media platforms, chats, and in forums.

Imagine the following: a customer hears a “knocking noise” coming from their vehicle and decides to express their concern as part of a service claim, an e-mail to their dealership, or even through social media. In doing so, they have generated valuable information for automakers and other aftermarket stakeholders.

Traditional analytics tools can only make use of this information once it has been translated into a structured, non-textual format, e.g. a diagnostic code. This will either happen at a very late stage of the customer journey or, in the case of a social media post, it will never appear on any automaker's radar at all.

Taking this one step further: if a specific customer concern, or even a combination of concerns, is observed more frequently, this could suggest a trend, a common problem that might be worth investigating. Automakers have an interest in this kind of insight because it can support their aftersales strategy, brand positioning, parts marketing, warranty management or even safety recalls.

Why does this matter? Enterprises in the ‘smart era’ revolve around customer-centric ecosystems built to own and control the entire customer journey. Business strategies shift from an R&D-driven approach to having the end-user as the focus for business decisions in every part of the organization. The aftersales business is no exception.

To build and support this customer-centric strategy, enterprises require as much information as possible about their customers. Standard analytics and reporting tools can paint a fairly accurate picture of what customers do: purchases made, service requests sent, complaints issued. Their blind spot is what customers perceive.

Access to accurate, unfiltered customer opinion completes the foundation for any customer-centric strategy. If leveraged properly, voice of the customer data can offer enterprises valuable insights, support parts and service marketing and product development, and help them get an early pulse of their brand awareness.

Predii can mine and extract this data right from the source: customer surveys, verbatim data on repair orders, forum and social media data.

How Predii extracts voice of the customer insights

We are able to make sense of voice of the customer data more efficiently than any other analytics offering. We put actionable information into the hands of aftersales decision-makers, empowering them to make data-driven decisions with a minimum of fuss. How do we do this?

Key aspects of Predii's patented technology are its Automatic Ontology Discovery, Domain-Specific Natural Language Processing, and its scalable AI pipeline. Our auto-classification system for unstructured textual data accurately bins “verbatim” comment data into consistent groups.

In simple words: Predii does not need pre-tagged data sets. We automate data labeling and translate the “noise” coming from different sources into usable insights at a component and feature level.


The Experiment: How Predii can use Voice of the Customer Data to identify trends in the aftermarket

Predii is well known in the industry for automated data tagging, specifically for repair orders. To leverage what we call “the true voice of the customer”, our team decided to go beyond that and explore other sources of customer opinion.

Our starting point was the assumption that the coronavirus pandemic had impacted how auto customers use, service, and repair their vehicles. Statistics from the Auto Care Association's (ACA) TrendLens suggest a significant drop in vehicle miles travelled (VMT) through the first quarter of last year, resulting in fewer repair jobs and fewer visits to service centers. Aftermarket employment dropped to a historic low in April 2020.

[Figures: two charts. Source: Auto Care TrendLens]

While those consumer touchpoints might become a less reliable source of voice of the customer data, we found that social media – in particular large public automotive forums – house an incredible number of conversations, including customer complaints, problems, and possible solutions. We were curious to see if the pandemic influenced the conversations customers were having on these forums, what insights we could extract, and what decisions automakers might make from this data.

Step 1: Extracting the Data

Looking at the sheer abundance of automotive OEM forums, we decided to put our focus on only one data source for our experiment: the public Honda forum.

As a first step, we crawled all forum threads for the years 2019 and 2020 using a custom Python script. To keep the data intake reasonable, we made a few decisions on which data to extract:

  • focus on main threads and exclude replies to avoid noise in the data
  • skip sticky threads, which are mostly guidelines or FAQs posted by site moderators and don't contain any complaints
  • exclude stray punctuation and hyperlinks
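
The three filtering rules above can be sketched in a few lines of Python. This is not the crawler used in the experiment: the `ForumPost` record, its field names, and the exact regular expressions are assumptions made for illustration, and the crawling step itself is left out.

```python
import re
from dataclasses import dataclass


@dataclass
class ForumPost:
    """Hypothetical record for one crawled post; the field names are assumptions."""
    text: str
    is_reply: bool = False
    is_sticky: bool = False


# Anything that looks like a hyperlink.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# Keep word characters, whitespace and basic sentence punctuation; drop the rest.
STRAY_PUNCT_RE = re.compile(r"[^\w\s.,!?'-]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Remove hyperlinks and stray punctuation, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = STRAY_PUNCT_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def select_posts(posts):
    """Keep only main, non-sticky threads and return their cleaned text."""
    return [
        clean_text(post.text)
        for post in posts
        if not post.is_reply and not post.is_sticky
    ]


# Invented sample posts, for illustration only.
posts = [
    ForumPost("O2 sensor is not working ### see http://example.com/pic"),
    ForumPost("Forum rules: read before posting", is_sticky=True),
    ForumPost("Same here, mine too", is_reply=True),
]
cleaned = select_posts(posts)
```

Only the first sample post survives the filter, with its link and the `###` run removed.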

Step 2: Understanding the Data

Identifying linguistic patterns is crucial to processing raw text intelligently. We decided to use spaCy, an open-source library for Natural Language Processing in Python, which allowed us to identify how a typical customer complaint is grammatically structured. Feeding spaCy random samples from our extracted data, we identified patterns and extracted all associated words.

An example statement:

O2 sensor is not working.

This statement is built of a token (O2 sensor, a component) and an aux+neg+verb pattern (is+not+working). Once the pattern was identified, we looked for other tokens followed by the same aux+neg+verb pattern.
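
To make the pattern search concrete, here is a minimal sketch in plain Python. It does not call spaCy: the labels on each token are written by hand as stand-ins for what a parser such as spaCy would assign, and the `Token` type and `find_complaints` function are hypothetical names introduced for this example.

```python
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    label: str  # hand-assigned here; a parser such as spaCy would supply this


# The grammatical pattern from the example statement.
PATTERN = ("aux", "neg", "verb")


def find_complaints(tokens):
    """Return (component, symptom) pairs.

    A pair is reported wherever a run of 'noun' tokens is immediately
    followed by the aux+neg+verb pattern.
    """
    labels = [token.label for token in tokens]
    width = len(PATTERN)
    results = []
    for i in range(len(tokens) - width + 1):
        if tuple(labels[i:i + width]) != PATTERN:
            continue
        # Walk left from the pattern to collect the preceding noun tokens.
        start = i
        while start > 0 and labels[start - 1] == "noun":
            start -= 1
        if start < i:
            component = " ".join(t.text for t in tokens[start:i])
            symptom = " ".join(t.text for t in tokens[i:i + width])
            results.append((component, symptom))
    return results


sentence = [
    Token("O2", "noun"),
    Token("sensor", "noun"),
    Token("is", "aux"),
    Token("not", "neg"),
    Token("working", "verb"),
    Token(".", "punct"),
]
pairs = find_complaints(sentence)
```

For the example statement this yields the component "O2 sensor" paired with the symptom "is not working". Running the same function over many posts is one way to collect further components that share the pattern.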

[Figure: spaCy feature extraction visualisation]

We went through multiple iterations of this process until we had extracted a substantial set of components, and did the same for symptoms. Predii's enterprise-grade AI pipeline processes billions of lines of service and repair verbatims at 90%+ feature extraction accuracy for our customers. The example above shows how this is done against a sample of social media content.

Step 3: Extracting Customer Concerns

Part of making sense of raw data is to connect related terms with Predii's domain specific taxonomy and schema. In short, taxonomy is the part of Natural Language Processing that describes how different words are related. For example, in automotive language “cat” and “catalytic converter” are two different words for the same component. This relationship, “cat” = “catalytic converter”, is defined in the taxonomy and becomes a key aspect of translating data to insights.
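
As a rough illustration of how such a relationship can be applied, the sketch below rewrites known variants to their canonical term. It is a toy stand-in, not Predii's taxonomy: the `SYNONYMS` table and `normalize` function are hypothetical, and only the “cat” entry comes from the text above (the “o2 sensor” entry is added for illustration).

```python
import re

# Hypothetical miniature taxonomy: variant -> canonical component name.
SYNONYMS = {
    "cat": "catalytic converter",
    "o2 sensor": "oxygen sensor",
}


def normalize(text: str, synonyms=SYNONYMS) -> str:
    """Lower-case the text and replace each known variant with its canonical term."""
    result = text.lower()
    # Replace longer variants first so multi-word entries take precedence.
    for variant in sorted(synonyms, key=len, reverse=True):
        # Word boundaries prevent matches inside longer words such as "category".
        pattern = r"\b" + re.escape(variant) + r"\b"
        result = re.sub(pattern, synonyms[variant], result)
    return result


normalized = normalize("Cat is rattling")
```

A plain lookup like this cannot tell a catalytic converter from a house cat, so it relies on the text already being known to come from an automotive context.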


We trained a Named Entity Recognizer (again using spaCy) on the forum data to recognize automobile components and symptoms in the text. We used this Named Entity Recognizer model and Predii's proprietary Ontology Discovery (intent discovery) algorithm to extract these named entities and construct human-readable, normalized customer complaints.

Step 4: Visualization

Having data without being able to use it is pointless. Insights become meaningful when they translate into business intelligence. To accomplish that, we need to ask the question: what problem is our customer trying to solve?

Looking back at the beginning of our experiment, we were assuming that, as visits to service centers became more inconvenient, consumers would turn to alternative channels. Automakers could leverage the data extracted from these channels to gain valuable customer insights.

One question stands out: what are the top customer concerns, and how did they change from 2019 to 2020?

Since we were focusing on extracting symptoms, a horizontal bar chart seemed most suitable, making it easy to compare different data sets and investigate any patterns.
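
Before anything can be charted, the extracted complaints have to be counted per year. The sketch below shows one way to do that aggregation in Python. The function names are hypothetical, and the sample records are invented placeholders, not results from the experiment.

```python
from collections import Counter


def top_concerns(complaints, n=3):
    """Return the n most frequent concerns for each year.

    `complaints` is a list of (year, concern) pairs.
    """
    by_year = {}
    for year, concern in complaints:
        by_year.setdefault(year, Counter())[concern] += 1
    return {year: counts.most_common(n) for year, counts in by_year.items()}


def count_changes(complaints, year_a, year_b):
    """Return, per concern, the change in count from year_a to year_b."""
    counts_a = Counter(c for y, c in complaints if y == year_a)
    counts_b = Counter(c for y, c in complaints if y == year_b)
    concerns = set(counts_a) | set(counts_b)
    return {c: counts_b[c] - counts_a[c] for c in concerns}


# Invented placeholder records, NOT data from the experiment.
sample = [
    (2019, "engine issue"), (2019, "engine issue"), (2019, "oil leak"),
    (2020, "engine issue"), (2020, "battery issue"), (2020, "battery issue"),
]
top = top_concerns(sample, n=1)
changes = count_changes(sample, 2019, 2020)
```

The per-year lists map directly onto the bars of a horizontal bar chart, and the change counts show which concerns gained or lost ground between the two years.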

Technical detail: We used the D3.js library, mainly because it was the most flexible solution, allowing us to bind the data to HTML elements using modern web standards and customize most elements of our visualization.

Lastly, we integrated all visualizations into one dashboard application. Our goal was to make this dashboard easily accessible for any end-user and allow them to filter and look at the results in any way they needed.


Leveraging the data extracted from automotive forums can help automakers understand their customers’ most relevant concerns. In our experiment the highest-ranking concerns were engine issues, followed by oil leaks and concerns about noise. Interestingly, there was a visible trend suggesting that battery-related symptoms gradually moved up among the top customer concerns towards the second half of 2020.

This brings us back to why understanding voice of the customer data is so relevant for smart organizations. With the customer concern landscape laid out before them, automakers can start asking the right questions: Are there any anomalies that would indicate a specific problem relevant to warranty or safety recalls? Which parts seem to be failing frequently? If “check engine light on” is among the highest-ranking concerns in a public forum, why do customers feel the need to express their concerns through this channel in the first place (as opposed to addressing their dealership or customer assistance center)? What can be learnt about brand perception and brand attractiveness?

Making data usable is one thing. Turning it into valuable business insights is where the magic happens.