Automatic Idea Detection from social media for Controlling and Preventing Healthcare-Associated Infections (with funding opportunity)

Title Automatic Idea Detection from social media for Controlling and Preventing Healthcare-Associated Infections (with funding opportunity)
Summary This project aims to use advanced NLP tools to automatically detect interesting ideas in text from medical forums, in order to address the problem of healthcare-associated infections in hospitals.
Keywords Natural Language Processing (NLP), Machine Learning, Automatic Idea Detection (AID), Healthcare-associated infections (HAI)
References Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).

Nguyen, Dat Quoc, Thanh Vu, and Anh Tuan Nguyen. "BERTweet: A pre-trained language model for English Tweets." arXiv preprint arXiv:2005.10200 (2020).

Gould, Dinah, et al. "Electronic hand hygiene monitoring: accuracy, impact on the Hawthorne effect and efficiency." Journal of Infection Prevention 21.4 (2020): 136-143.

Christensen, Kasper, et al. "How good are ideas identified by an automatic idea detection system?" Creativity and Innovation Management 27.1 (2018): 23-31.

Prerequisites Machine Learning
Supervisor Fabio Gama, Mahmoud Rahat, Peyman Mashhadi
Level Master
Status Open

*** The project offers funding to dedicated students. You also get the chance to collaborate with a globally leading hygiene and health company. ***

As humans, we have always been eager to automate our tasks. This quest has been fruitful in many domains, such as autonomous driving and robotics, but is less explored in cognitive contexts. The question we are interested in is how to automate (or at least facilitate) the ideation and innovation process by analyzing user-generated content on social media or the web. This can help companies better understand the market and develop targeted services centered on customers' requirements.

This is a continuation of an exciting project that started in 2021. The goal is to use advanced natural language processing techniques to analyze social media text and find relevant information for controlling and preventing Healthcare-Associated Infections [1]. Since 2021, the team has crawled approximately 4.5 million tweets and manually labeled 3,600 of them. The tweets are primarily in English. This unique dataset was then used to train a supervised neural network model that incorporates BERTweet (a pre-trained language model for English tweets) to generate a conceptual representation of the text. The trained model performs well in estimating the informativeness of the tweets.

The first phase of the work gave rise to many exciting research questions. The team is now interested in exploring the following directions:
• Using unsupervised ML to cluster the informative tweets extracted by the first model
• Using explainable AI to analyze the model's attention over the text
• Investigating the effect of incorporating user-generated metadata (number of likes, replies, etc.) on the performance of the model
• Exploring the impact of COVID-19 (the dataset covers tweets from both before and after the COVID-19 pandemic)

[1] Healthcare-associated infections (HAI) are among the major causes of death of hospitalized patients. Controlling and preventing HAI is difficult because of the complexity of implementing sustainable practices in hospitals, the lack of ways to observe the behavior of healthcare professionals, and companies' inability to identify HAI prevention practices by themselves. One proposed way to address this is through Automatic Idea Detection (AID) systems. An AID system is a classification algorithm based on text processing tools that can screen large amounts of information and identify the items likely to contain valuable or novel ideas or treatments.
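
As an illustration of the metadata direction listed above, the following minimal Python sketch shows one possible way to append engagement counts to a text embedding before it is passed to a classifier. The function names, the choice of log scaling, and the toy four-dimensional embedding are assumptions made for illustration only and do not describe the project's actual pipeline; in practice the embedding would come from a model such as BERTweet.

```python
import math
from typing import List, Sequence


def metadata_features(likes: int, replies: int, retweets: int) -> List[float]:
    """Log-scale raw engagement counts so very popular tweets do not dominate.

    log1p(x) = log(1 + x), so a count of 0 maps to 0.0.
    """
    return [math.log1p(likes), math.log1p(replies), math.log1p(retweets)]


def combine(text_embedding: Sequence[float],
            likes: int, replies: int, retweets: int) -> List[float]:
    """Concatenate a text embedding with the scaled metadata features."""
    return list(text_embedding) + metadata_features(likes, replies, retweets)


# Hypothetical toy embedding; a real sentence embedding is much larger.
toy_embedding = [0.12, -0.40, 0.88, 0.05]
example = combine(toy_embedding, likes=15, replies=2, retweets=0)
print(len(example))  # 4 embedding values + 3 metadata values
```

Comparing a classifier trained on the text embedding alone with one trained on the combined vector would be one simple way to measure whether the metadata helps.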

Please email all three supervisors to express your interest (including a resume is recommended).