Analysis of Multi-Lingual Vehicle Service Histories

From ISLAB/CAISR
Title Analysis of Multi-Lingual Vehicle Service Histories
Summary Automatic translation and similarity evaluation of multi-lingual natural text descriptions of vehicle repair and maintenance operations
Keywords
TimeFrame Spring 2017
References
Prerequisites Artificial Intelligence, Learning Systems, Data Mining
Author Iyanuoluwa Akanbi
Supervisor Sepideh Pashami, Sławomir Nowaczyk
Level Master
Status Finished


All maintenance and repair operations conducted at Volvo authorised workshops are logged in the Volvo Service Records (VSR) database. The VSR database contains structured information, such as part numbers and operation codes, which is used to keep track of the repairs performed as well as for billing purposes. In most cases, this structured information is accompanied by a free-text note written by the workshop technician, which often includes valuable comments regarding the root cause of the repair, the diagnostic process and its results, the initial symptoms observed, etc.

The text is entered manually into the VSR database by workshop personnel and is therefore full of typos and ungrammatical phrases. In addition, it can be written in almost any language. Nevertheless, this information is very useful, or even necessary, for understanding the actual problems the vehicle is having.

This project explores natural language processing tools that can make sense of those notes, exploiting the fact that the domain is very constrained. The goal is to find similarities between VSR entries across different languages. The first step, therefore, is to translate the notes automatically into English.
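As an illustration of how a constrained domain can help with noisy notes, the following minimal sketch maps misspelled tokens onto a small domain vocabulary using fuzzy string matching from Python's standard `difflib` module. The vocabulary, the function names and the 0.8 cutoff are assumptions made for this example; they are not taken from the VSR database or from the project itself.

```python
import difflib

# Hypothetical domain vocabulary (assumption); in practice it could be derived
# from part names and operation descriptions in the structured data.
DOMAIN_VOCABULARY = [
    "brake", "coolant", "hose", "leak", "replaced", "sensor", "oil", "filter",
]


def normalise_token(token, vocabulary=DOMAIN_VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word, or the token unchanged if none is close."""
    matches = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def normalise_note(note):
    """Normalise every whitespace-separated token of a note."""
    return " ".join(normalise_token(token) for token in note.split())


cleaned = normalise_note("replced colant hose")
print(cleaned)
```

A small, closed vocabulary keeps the risk of wrong corrections low, which is the main reason a constrained domain makes this kind of simple matching plausible.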

Preliminary steps for this project are as follows:

  1. preprocessing of the repair comments (for example, removing stop words) and detecting the language used
  2. automatic translation of the comments into English
  3. utilising the structured part of the VSR entry to improve the translation
  4. measuring the similarity of VSR entries and finding groups of related records (e.g. differentiating between maintenance and repairs)
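Steps 1 and 4 above can be sketched with the Python standard library alone. The example below is a minimal illustration, not the method used in the project: the stop-word lists are tiny and hand-picked, language detection is a simple stop-word count, and similarity is TF-IDF cosine similarity. All function names and sample comments are hypothetical. Translation (steps 2 and 3) is assumed to be handled by an external machine translation system and is not shown.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word lists (assumption); real lists would be much larger.
STOP_WORDS = {
    "en": {"the", "and", "of", "to", "was", "is", "in", "on", "with"},
    "sv": {"och", "att", "det", "som", "på", "är", "av", "för", "med"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "mit", "auf", "wurde"},
}


def tokenize(text):
    """Lower-case the text and extract runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def detect_language(text):
    """Guess the language by counting stop-word hits; None if nothing matches."""
    tokens = tokenize(text)
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOP_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def preprocess(text, lang="en"):
    """Tokenise a comment and remove the stop words of the given language."""
    stop_words = STOP_WORDS.get(lang, set())
    return [token for token in tokenize(text) if token not in stop_words]


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of token lists."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {
        term: math.log((1 + n_docs) / (1 + count)) + 1
        for term, count in doc_freq.items()
    }
    return [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in docs
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical comments, assumed to be already translated into English.
comments = [
    "Replaced the leaking coolant hose and refilled coolant",
    "Coolant hose was leaking, replaced hose",
    "Scheduled maintenance: changed engine oil and oil filter",
]
vectors = tfidf_vectors([preprocess(comment) for comment in comments])
sim_repair_repair = cosine(vectors[0], vectors[1])
sim_repair_maintenance = cosine(vectors[0], vectors[2])
print(sim_repair_repair, sim_repair_maintenance)
```

In this toy data the two coolant repairs share several terms and score higher than the repair and maintenance pair, which share none. Grouping related records could then be done by clustering on such pairwise similarities.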