Leveraging LLMs for Clinical Note Annotation and Uncertainty Estimation

From ISLAB/CAISR
Title Leveraging LLMs for Clinical Note Annotation and Uncertainty Estimation
Summary The student will investigate how LLMs can simplify clinical note annotation and how the uncertainty of the resulting annotations can be estimated, contributing to improved healthcare data management.
Keywords Machine Learning, Large Language Models, Uncertainty Estimation, Electronic Health Records
TimeFrame 2023-2024
References Yang, Zhichao, et al. "Multi-label few-shot ICD coding as autoregressive generation with prompt." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 37. No. 4. 2023.

Liu, Leibo, et al. "Automated ICD coding using extreme multi-label long text transformer-based models." Artificial Intelligence in Medicine (2023): 102662.

Hu, Edward J., et al. "LoRA: Low-rank adaptation of large language models." arXiv preprint arXiv:2106.09685 (2021).

Sensoy, Murat, Lance Kaplan, and Melih Kandemir. "Evidential deep learning to quantify classification uncertainty." Advances in Neural Information Processing Systems 31 (2018).

Prerequisites Statistics; Neural Networks; Programming (Python)
Author
Supervisor Awais Ashfaq, Prayag Tiwari
Level Master
Status Open


This Master's thesis project aims to use Large Language Models (LLMs) to automate clinical note annotation, with a specific focus on generating validated, clinically meaningful diagnostic and procedure codes (ICD, the International Classification of Diseases, and KVÅ, Klassifikation av vårdåtgärder, the Swedish classification of healthcare procedures). Starting with the MIMIC-III dataset and extending to real-world Swedish clinical data, the project will explore the following technical and scientific directions:

1. Model Training: Investigate state-of-the-art techniques for training LLMs, including fine-tuning strategies (e.g., parameter-efficient methods such as LoRA), domain adaptation, and transfer learning, to optimize their performance on clinical note annotation.

2. Uncertainty Estimation Methods: Develop and implement uncertainty estimation methods, such as evidential deep learning, that attach a confidence score to each annotation the model produces.

3. Real-World Clinical Utility: Evaluate the clinical utility of the generated diagnostic and procedure codes in collaboration with healthcare professionals, analyzing their impact on patient care, data management, and reimbursement processes.

4. Multi-Language Adaptation: Explore methods for adapting the models to Swedish, ensuring that they remain effective in a non-English clinical setting.

5. Ethical Considerations: Address ethical and privacy concerns related to patient data, ensuring compliance with healthcare regulations and data protection laws.
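Direction 1 mentions fine-tuning strategies, and the reference list includes LoRA (Hu et al., 2021). The sketch below illustrates the core LoRA idea with NumPy only, under simplifying assumptions of our own: a frozen pretrained weight matrix W is augmented with a trainable low-rank update (alpha / r) * B A, where B starts at zero so the adapted layer initially behaves exactly like the pretrained one. The class name `LoRALinear` and the default `rank` and `alpha` values are hypothetical illustrations, not part of the project description or of any library's API, and the sketch covers only the forward pass, not training.

```python
import numpy as np


class LoRALinear:
    """Hypothetical, minimal LoRA-style linear layer (illustration only).

    The pretrained weight W (d_out x d_in) stays frozen; only the low-rank
    factors A (r x d_in) and B (d_out x r) would be trained. The effective
    weight is W + (alpha / r) * B @ A, as described by Hu et al. (2021).
    """

    def __init__(self, weight: np.ndarray, rank: int = 4, alpha: float = 8.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight        # frozen pretrained weight, never updated
        self.scale = alpha / rank   # scaling factor applied to the low-rank update
        # A gets a small random init and B starts at zero, so B @ A == 0 and the
        # adapted layer reproduces the pretrained layer's output at the start.
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))

    def forward(self, x: np.ndarray) -> np.ndarray:
        """Map x of shape (batch, d_in) to outputs of shape (batch, d_out)."""
        frozen = x @ self.weight.T
        low_rank = (x @ self.A.T) @ self.B.T  # (batch, r) -> (batch, d_out)
        return frozen + self.scale * low_rank

    def trainable_parameters(self) -> int:
        """Number of parameters that fine-tuning would update (A and B only)."""
        return self.A.size + self.B.size
```

For a 768 x 768 projection with rank 8, this trains 8 * 768 + 768 * 8 = 12,288 parameters instead of 589,824, roughly 2% of the full matrix, which is why low-rank adaptation is attractive when fine-tuning large models on clinical text with limited compute.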
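Direction 2 names evidential deep learning (Sensoy et al., 2018). In that formulation the network outputs non-negative evidence e_k for each of K classes, which parameterizes a Dirichlet distribution with alpha_k = e_k + 1; the expected probability of class k is alpha_k / S and the uncertainty is u = K / S, where S is the sum of all alpha_k. The NumPy sketch below computes these quantities at inference time only (the training loss is omitted). Applying it to each code separately with K = 2 (evidence for "code applies" vs. "code does not apply") is our own assumption for the multi-label ICD/KVÅ setting, and the function name `evidential_scores` is hypothetical.

```python
import numpy as np


def evidential_scores(outputs: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Convert raw network outputs into Dirichlet-based probabilities and uncertainty.

    outputs: array of shape (..., K) with one row of K class outputs per prediction.
    Assumption (ours, not from the project text): for multi-label coding, each
    code is treated as its own K = 2 problem [applies, does not apply].
    Returns expected probabilities of shape (..., K) and uncertainty of shape (...).
    """
    evidence = np.maximum(outputs, 0.0)            # ReLU keeps evidence non-negative
    alpha = evidence + 1.0                          # Dirichlet parameters alpha_k
    strength = alpha.sum(axis=-1, keepdims=True)    # Dirichlet strength S
    probs = alpha / strength                        # expected class probabilities
    num_classes = outputs.shape[-1]
    uncertainty = num_classes / strength[..., 0]    # u = K / S, in (0, 1]
    return probs, uncertainty


if __name__ == "__main__":
    # Hypothetical outputs for three codes: [evidence "applies", evidence "does not apply"]
    demo = np.array([[9.0, 1.0], [0.0, 0.0], [-2.0, 3.0]])
    p, u = evidential_scores(demo)
    print(np.round(p, 2))
    print(np.round(u, 2))
```

For example, outputs of [9, 1] for a code give probabilities of about [0.83, 0.17] with u of about 0.17, while [0, 0] (no evidence either way) gives [0.5, 0.5] with u = 1. Codes whose uncertainty exceeds a chosen threshold could then be routed to a human coder for review; selecting that threshold would itself be part of the evaluation.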


The core research question, "How can LLMs be effectively trained and deployed to produce clinically validated codes?", will guide these technical and scientific directions. The student is also encouraged to propose and explore their own research questions.

Contact: Awais Ashfaq (awais.ashfaq@hh.se)