Analysing comments (NLP) for Malware Analysis

Title Analysing comments (NLP) for Malware Analysis
Summary Analysing the comments of users in the WebStore to look for malware patterns
Keywords NLP, Malware, Web Security
Prerequisites NLP
Supervisor Pablo Picazo-Sanchez
Level Master
Status Open

Browser extensions are popular small web applications that users install in modern browsers to enrich their experience on the web. Google’s official extension repository, the Chrome Web Store, currently hosts more than 180,000 items across browser extensions, apps, and themes, and many extensions have millions of users. Driven by the popularity of Chrome extensions, browser extension ecosystems have been adopted not only by Chromium-based browsers like Opera, Brave, and Microsoft Edge but also by browsers like Firefox and Safari. The latter browsers draw on the same architecture, allowing developers to port their Chrome extensions easily. When a user installs an extension, the browser typically displays a prompt listing the permissions the new extension requests. Once the user approves, the extension is installed and integrated into the browser.

The benefits of browser extensions come at a high price: users grant them access to a vast amount of sensitive information.

Extensions are usually stored in private repositories managed by vendors, where extension developers upload them to be freely distributed afterward. The most popular browser extension repository is the Web Store, governed by Google, which years ago banned manually installing browser extensions from sites other than the Web Store.

The Web Store implements a Collaborative Filtering Recommendation System (CFRS) that ranks or features extensions to make it easier for users to find high-quality content. The ranking is computed by a heuristic that considers user ratings, raters, comments, and usage statistics such as the number of downloads and uninstalls over time.

The goal of this project is to analyze the comments users write in the Web Store in order to spot malicious extensions. To do so, we will apply NLP techniques such as sentiment analysis and part-of-speech (POS) tagging to analyze the comments not only syntactically but also morphologically, looking for suspicious patterns.
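
As a rough illustration of the kind of comment analysis meant here, the following minimal Python sketch scores comments against two small word lists and flags an extension when enough of its comments mention suspicious behaviour. The word lists, the function names (tokenize, score_comment, flag_extension), and the 0.3 threshold are all hypothetical choices made for this example. They are not part of the project description, and a real study would more likely rely on trained sentiment models and POS taggers than on keyword matching.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, hand-picked word lists for illustration only. A real study
# would derive such cues from labelled comments or use a trained model.
NEGATIVE_WORDS = {"bad", "terrible", "scam", "awful", "broken", "useless"}
SUSPICIOUS_TERMS = {
    "malware", "virus", "adware", "spyware", "hijack", "hijacks",
    "redirect", "redirects", "popups", "ads", "steals",
}


def tokenize(comment: str) -> List[str]:
    """Lowercase the comment and split it into simple word tokens."""
    return re.findall(r"[a-z']+", comment.lower())


def score_comment(comment: str) -> Dict[str, int]:
    """Count tokens, negative words, and suspicious terms in one comment."""
    tokens = tokenize(comment)
    counts = Counter(tokens)
    return {
        "tokens": len(tokens),
        "negative": sum(counts[w] for w in NEGATIVE_WORDS),
        "suspicious": sum(counts[w] for w in SUSPICIOUS_TERMS),
    }


def flag_extension(comments: List[str], min_share: float = 0.3) -> bool:
    """Flag an extension if at least `min_share` of its comments contain a
    suspicious term. The threshold is an arbitrary assumption."""
    if not comments:
        return False
    hits = sum(1 for c in comments if score_comment(c)["suspicious"] > 0)
    return hits / len(comments) >= min_share


if __name__ == "__main__":
    sample = [
        "Great extension, works well",
        "This is malware, it redirects my searches",
        "Terrible, full of ads and popups",
    ]
    for text in sample:
        print(score_comment(text), text)
    print("flagged:", flag_extension(sample))
```

Keyword matching of this kind only serves as a baseline. It ignores negation ("not malware") and context, which is where sentiment analysis and POS-based features would be expected to help.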