Did you know that nearly 25,000 incidents of failed brakes were reported to the National Highway Traffic Safety Administration between 2001 and 2021? In the same period, almost 20,000 airbags did not deploy, doors failed to open 7,000 times, and a stunning 3,000 sunroofs reportedly exploded.
What if you were able to identify these safety issues before they became exactly that: an issue? What if you had full visibility into patterns and anomalies that affect vehicle safety, with enough time to investigate, fix, and mitigate? Imagine AI being able to provide early indication of component failure, drilled down to specific makes, models, even mileage. Prescriptive insights that enable you to act on quality concerns rather than react to them, all based on real-life data reported by customers across the United States.
About NHTSA 'Safety Problem Data'
The National Highway Traffic Safety Administration (NHTSA) maintains a publicly accessible database of safety issues. Consumers can report a ‘safety problem’ directly through the NHTSA website or indirectly through other channels, such as their dealerships.
NHTSA uses this information as a basis for investigation into defects and potential vehicle safety recalls.
In total, this database contains more than 1.2 million records on real-life vehicle issues, safety and quality concerns, and component failures, making it an incredibly rich source of information for manufacturers and consumers alike.
The Challenge with Verbatim Data
However, as with most ‘customer verbatim’ data, these records are extremely hard to translate into usable analytics. The actual customer complaints consist of textual data: real-life human language that is unstructured, messy, and full of very domain-specific, even NHTSA-specific, automotive terminology.
This is where Artificial Intelligence and Natural Language Processing (NLP) can help.
Predii’s NLP engine has been purpose-built to extract predictive and prescriptive insights from unstructured textual, sensor, and procedural automotive data. After 8+ years in the automotive aftermarket, and processing 2.5+ billion repair jobs every month, there is hardly any part, symptom, repair, or automotive tech jargon that our algorithm has not yet seen.
Predii’s ability to accurately interpret domain-specific human language uncovers the intelligence hidden in the messy, unstructured verbatim data of the NHTSA database and transforms it into aggregated, usable insights.
So, what did we do?
Our team ran a large part of the publicly available NHTSA safety data through our patented NLP pipeline and extracted aggregated customer concerns as well as vehicle and component failures across different makes and models.
We discovered a total of 9,656 (customer-reported) symptoms and 11,299 (technician-reported) failures across 268 makes¹ and 6,138 models in the database.
The Predii algorithm discovers components and symptoms in natural human language. Part of our team's work was to adapt and tweak this interpretation of words (“taxonomy”) to the very specific language used in the NHTSA database in order to boost coverage.
Previously unstructured data is now available and actionable in a Predii Insights Dashboard. The Beta version of Predii Safety Analytics enables viewers to look at:
Top Driver Reported Concerns by frequency
Top Technician Reported Failures by frequency
Top contributing Makes, Models, and Vehicle Years for reported Concerns and Failures
Vehicle Year Distribution for reported Concerns and Failures
The Predii Insights Dashboard shows aggregated Top Driver Reported Concerns, Top Technician Reported Failures, Top contributing Makes, Models, Vehicle Years, and Mileage Distribution.
What are the most frequent Customer concerns reported to NHTSA for a specific Year, Vehicle Make, and Model? Top Customer-reported symptoms include failed brakes, faulty airbags, discharged batteries, and more.
The Top Technician-reported failures also include, for example, burning smells from the vehicle, specific noises, and transmission failures.
Which Makes and Models are most frequently connected to which Customer Concerns and Failures? The Predii Insights Dashboard allows drill-down into specific complaints and shows related Makes and Models.
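To make the idea behind these views concrete, here is a minimal Python sketch of a frequency ranking and a drill-down by make. It is an illustration only, not Predii's pipeline: the records are made up, and the field names (`make`, `concern`) and both helper functions are assumptions for this example rather than the NHTSA schema or any Predii API.

```python
from collections import Counter

# Hypothetical, hand-made records. Field names are assumptions for
# illustration and do not reflect the NHTSA schema.
records = [
    {"make": "MAKE_A", "concern": "brake failure"},
    {"make": "MAKE_A", "concern": "brake failure"},
    {"make": "MAKE_B", "concern": "brake failure"},
    {"make": "MAKE_B", "concern": "air bag failure"},
    {"make": "MAKE_A", "concern": "air bag failure"},
    {"make": "MAKE_C", "concern": "battery discharged"},
]


def top_concerns(rows, n=3):
    """Return the n most frequent concerns as (concern, count) pairs."""
    return Counter(row["concern"] for row in rows).most_common(n)


def top_makes_for(rows, concern, n=3):
    """Drill down: which makes contribute most to one concern?"""
    counts = Counter(row["make"] for row in rows if row["concern"] == concern)
    return counts.most_common(n)


if __name__ == "__main__":
    print(top_concerns(records, n=2))
    print(top_makes_for(records, "brake failure", n=1))
```

The hard part in practice is not the counting but getting from free-text complaints to clean labels like 'brake failure' in the first place, which is what the NLP step is for.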
Roadmap beyond Exploration
The idea to process and analyze NHTSA data originated from our team’s natural curiosity to work with data, find the intelligence hidden in it, and constantly push the boundaries of what can be done.
Enterprise AI, however, works a little differently. Solutions need to be scalable, preferably based on large or multiple data sources, and answer OEM-specific questions. As we get ready to launch a Predii Safety Analytics feature as part of our commercial product offering, these are the next steps on our roadmap to a commercialized solution:
Mileage Distribution for reported Concerns and Failures: Predictive and prescriptive analytics that can, for example, predict part and component failure rely largely on vehicle mileage information. The NHTSA data we used for this POC provides mileage information for a subset of complaints, making it the next feature to be integrated into our dashboard.
Refinement of taxonomy to increase coverage on specific, contextual safety problems: A quick look at the aggregated customer concerns shows the innate challenges of unstructured data that Predii has the capacity to solve. Concerns like 'air bag failure' or 'door failure' depend heavily on the context in which they are written. A 'failure' in this case could mean 'the airbag deployed when it was not supposed to', or it could mean 'the airbag didn't deploy when it was supposed to deploy'. Other highly contextual concerns are 'vehicle accelerating suddenly' and 'vehicle hesitating to accelerate'. In both cases, the algorithm needs to establish the context automatically, without human intervention.
Flagging safety concerns by severity: True value lies in not only knowing which failures happen most frequently, but understanding how impactful and severe those failures might be. Concerns like 'engine exploded' or 'engine caught fire' are seemingly low in frequency but high in severity and therefore most likely to cause recalls. Flagging those concerns and symptoms that have led to safety recalls in the past, or that indicate costly repairs, is a crucial step in making the best use of this data.
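As a toy illustration of the last two roadmap items, the Python sketch below uses hand-written keyword rules to tell the two kinds of 'air bag failure' apart and to flag high-severity wording. These rules, the category labels, and the function names are all assumptions made up for this example. They are not Predii's taxonomy or method, and a production system would need far more robust context handling than a few regular expressions.

```python
import re

# Hypothetical keyword rules, for illustration only.
NEGATED_DEPLOY = re.compile(
    r"\b(?:did not|didn['’]t|failed to|never)\s+deploy", re.IGNORECASE
)
DEPLOYED = re.compile(r"\bdeployed\b", re.IGNORECASE)
SEVERE = re.compile(r"\b(?:fire|exploded|explosion)\b", re.IGNORECASE)


def classify_airbag_complaint(text):
    """Guess which kind of air bag failure a complaint describes."""
    # Check the negated form first so "didn't deploy" is not read as "deployed".
    if NEGATED_DEPLOY.search(text):
        return "non-deployment"
    if DEPLOYED.search(text):
        return "unintended deployment"
    return "unclear"


def is_high_severity(text):
    """Flag complaints that mention fire or explosion as whole words."""
    return SEVERE.search(text) is not None


if __name__ == "__main__":
    for complaint in (
        "the airbag deployed when it was not supposed to",
        "the airbag didn't deploy when it was supposed to deploy",
        "engine caught fire",
    ):
        print(complaint, "|", classify_airbag_complaint(complaint),
              "|", is_high_severity(complaint))
```

Even this small example shows why context matters: both air bag complaints contain the word 'deploy', and only the surrounding words reveal which failure actually occurred.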
Predii Safety Analytics supports public education on vehicle safety and can answer one of the most relevant consumer questions: Is their vehicle as safe and reliable as advertised?
From an OEM perspective, Predii Safety Analytics can support Product and Quality Managers in monitoring patterns, investigating indications of safety concerns, and avoiding costly safety recalls.
Ultimately, Quality management comes down to answering these questions:
Which parts and components will fail? When (at what mileage) and where will they fail? What impact will that failure have? And: what needs to be done to avoid it?
Predii Safety Analytics can answer these questions. The results described in this article only refer to one piece of the puzzle: publicly available records.
Imagine adding your own data into the mix: warranty claims, repair orders, customer service tickets, telematics data. Powered by our ability to decode human language and an exceptional understanding of the automotive aftermarket, Predii can provide OEMs with full visibility into their aftermarket space, for quality and safety issues and a lot more.
Curious to see what our AI engine can do? Contact us to schedule a demo!
¹ Total of extracted makes: 1,056, including trucks, trailers, and RVs. To ensure the quality of the analysis, we considered only makes with more than 10 records.