Can Labeling Text Data Be Automated?
Updated: Mar 25, 2021
Many AI projects require too much time and energy for the return to be worth it. New ontology-based processing approaches with built-in autodiscovery are the solution.
We live in the Smart era. Smart cars, smart factories, smart equipment, smart homes, smart assistants: the list goes on.
It’s time to understand your customers, your products, and your data – smartly.
The key to these insights can be found in your unstructured textual data, such as customer emails, surveys, technician notes, call center dialogs and more. NLP and text analytics are not new technologies, but the tools available today struggle to adapt efficiently to niche proprietary data. As a result, a lot of unleveraged data is collecting dust because it has simply been too hard to use!
A large reason for this difficulty is that many current AI solutions require pre-labeled data sets. This blog walks through the key challenges posed by data labeling and a new class of AI platforms that does not require such time-intensive preparation.
What’s the takeaway? These new AI platforms are easy wins for BI professionals or other users trying to leverage their data to power digital transformation – a tool in their kit, with easy integrations, bringing AI ROI far closer to turnkey status than has been possible before. This is the catalyst.
The Challenges with Standard Text Analytics
One of the standard ways of performing text analytics is to label a subset of data, generate statistical models from that subset, and then use those models to apply labels to the complete data set. This approach has three main challenges, which are sometimes daunting enough that an enterprise avoids digital transformation altogether.
First, enterprises frequently do not already have these labeled data sets, and creating them takes time and money. Creating these sets manually also causes problems, because manual interpretation is not consistent.
Second, the processing models are statistical in nature. Trained end users who really understand the domain (like mechanical engineers or technical support personnel) simply get a result. They don’t know why they got that result. The process is a black box.
Third, adjusting the models requires relabeling large amounts of data, which makes it difficult to fix small problems or adapt to new kinds of data. Yet quick, agile tuning is exactly what it takes to reach the required level of reliability and accuracy. Release a new feature on your product? Your data has just changed, and now your processing models are out of date.
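To make the label-train-apply workflow described above concrete, here is a minimal sketch in Python. It is not any vendor's method: the technician notes, the labels, and the word-count scoring are all hypothetical stand-ins for a real statistical model.

```python
from collections import Counter, defaultdict

# Toy, hand-labeled subset (hypothetical technician notes).
labeled = [
    ("pump leaking oil near seal", "leak"),
    ("oil leak under housing", "leak"),
    ("motor overheating after startup", "overheat"),
    ("unit overheating and shuts down", "overheat"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def predict(counts, text):
    """Pick the label whose training words overlap the text most."""
    words = text.split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)

# Train on the labeled subset, then apply labels to the rest of the data.
model = train(labeled)
unlabeled = ["seal leaking again", "startup overheating reported"]
predictions = [predict(model, t) for t in unlabeled]
```

Even in this toy version the challenges show up. A note about a brand-new feature, such as "battery drains quickly", shares no words with the training set, so every score is zero and predict still returns a label, with nothing to tell the end user why. Fixing that means labeling more rows and retraining.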
How Can You Automate Data Labeling While Respecting Domain Expertise?
The next generation of augmented analytics providers resolve these three main challenges by using ontologies to process the data. For those who are unfamiliar, ontologies are a way of mimicking expertise: they represent the relationships between various items. For example, a symptom is connected to a failure, which is connected to a resolution.
These are flexible concepts: ontologies can be modeled to capture the relationships that matter to the end user. For example, call center experts might be much more interested in sentiment scoring and sales performance than repair and maintenance experts are. What an employee sees in the data is what the AI pulls out!
Ontologies are also explainable to the end user, as they can follow the decision-making process the end user performs daily to execute their role, and are represented in the language they themselves use. Having an end user tweak an ontology is easy. Having them generate a new model by retagging thousands of rows is not.
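As a rough illustration of the symptom, failure, resolution chain, an ontology can be sketched as a list of (subject, relation, object) triples. The terms, the relation names "indicates" and "resolved_by", and the helper functions below are hypothetical and chosen only for this example.

```python
# Each triple is (subject, relation, object); all terms are hypothetical.
ontology = [
    ("grinding noise", "indicates", "worn bearing"),
    ("worn bearing", "resolved_by", "replace bearing"),
    ("oil on floor", "indicates", "failed seal"),
    ("failed seal", "resolved_by", "replace seal"),
]

def follow(triples, subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]

def explain(triples, symptom):
    """Trace symptom -> failure -> resolution as readable steps."""
    steps = []
    for failure in follow(triples, symptom, "indicates"):
        for fix in follow(triples, failure, "resolved_by"):
            steps.append(f"{symptom} -> {failure} -> {fix}")
    return steps

trace = explain(ontology, "grinding noise")

# An expert tweak is a couple of added relationships, not relabeled rows.
ontology.append(("grinding noise", "indicates", "misaligned shaft"))
ontology.append(("misaligned shaft", "resolved_by", "realign shaft"))
updated_trace = explain(ontology, "grinding noise")
```

Every result comes with the path that produced it, written in the expert's own vocabulary, which is what makes this kind of representation easy to review and adjust.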
Ontologies are likewise not a new technology, and they come with their own challenges. The chief challenge for an enterprise using an ontology to interpret data is building the ontology and taxonomies themselves.
The answer is a three-step process:
1. Use a suite of NLP techniques to auto-discover the relationships present in your data
2. Rapidly tweak the ontology with domain experts to just the relationships they care about
3. Use ontology-based processing to extract insights from your data
This avoids the pain points of using only one processing method (manually tagged data or ontologies, alone).
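The three steps above can be sketched end to end on toy data. This is only an illustration under stated assumptions: simple co-occurrence counting stands in for the "suite of NLP techniques", and the notes, term lists, and function names are hypothetical.

```python
from collections import Counter
from itertools import product

notes = [
    "grinding noise traced to worn bearing",
    "worn bearing caused grinding noise",
    "oil on floor due to failed seal",
    "grinding noise after lunch break",
]

# Hypothetical term lists; a real engine would discover these too.
symptoms = ["grinding noise", "oil on floor"]
failures = ["worn bearing", "failed seal", "lunch break"]

# Step 1: propose symptom -> failure links from co-occurrence in a note.
def discover(notes, symptoms, failures):
    pairs = Counter()
    for note, s, f in product(notes, symptoms, failures):
        if s in note and f in note:
            pairs[(s, f)] += 1
    return pairs

candidates = discover(notes, symptoms, failures)

# Step 2: a domain expert rejects links that are not real failures.
rejected = {("grinding noise", "lunch break")}
ontology = {pair for pair in candidates if pair not in rejected}

# Step 3: label each note with the failures the ontology recognizes in it.
def label(note, ontology):
    return sorted(f for s, f in ontology if s in note and f in note)

labels = [label(n, ontology) for n in notes]
```

Automated discovery proposes three links here, one of them spurious. The expert's contribution is a single rejection rather than a relabeling pass, and the final labels follow directly from the reviewed ontology.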
Auto-discovery engines make building the ontologies simple and fast. For example, these AI platforms can now provide out-of-the-box discovery of symptoms, failures and repairs with roughly 60% accuracy. A domain expert can then review the ontology and make minor tweaks to bring accuracy up to the 90+% threshold they require.
The takeaway? Because so much of the discovery is automated, this takes days or weeks, not months or years.
What’s The Value?
A chatbot without expertise powering it is a pain, not a service.
The difference-maker with the approach outlined in this blog is that processing is performed with an understanding of the context in which the data occurs. The ontology is not a purely mathematical model; it is a representation of how operations occur in the real world. As a result, the knowledge workers who best understand those operations (for example, technicians and quality engineers) can easily interact with it and impart their expertise.
You are mimicking their expertise, how they themselves would interpret the data – except in an automated and consistent manner. You can then abstract their expertise, and plug it into workflows at every decision point.
Best of all, because discovery is automated and tuning does not require relabeling, this method is faster and more efficient than approaches built on manually labeled data.
Why Is New Tech Necessary?
The specialization of industrial data sets requires equally specialized Artificial Intelligence platforms with custom natural language processing techniques and ontologies. Moreover, as data sets continuously evolve, these specialized AI platforms need to allow for continuous learning and processing of data on an ongoing basis to provide a dynamic (continuously evolving) solution for problems like issue classification, resolution insights, guided decision making, product analytics, SLA analysis, and more.
The ROI simply can’t be justified without adaptable, scalable and customizable data discovery and processing capabilities.