Intelligible patient representation for outcome prediction of congestive heart failure patients

Title Intelligible patient representation for outcome prediction of congestive heart failure patients
Summary Generating patient representation using EHR data
Keywords Machine Learning, Feature Engineering, Medical data analysis
TimeFrame Winter 2017 / Spring 2018
References 1. Miotto R, Li L, Kidd BA, Dudley JT. Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records. Scientific Reports. 2016;6:26094. doi:10.1038/srep26094.

2. Choi E, et al. Multi-layer representation learning for medical concepts. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2016.

Prerequisites Good knowledge of applied mathematics. An ability to implement state-of-the-art algorithms in a suitable programming environment. An interest in machine learning algorithms and medical data analysis.
Supervisor Awais Ashfaq, Sławomir Nowaczyk
Level Master
Status Open

Overview: Advances in computing technologies and machine learning algorithms have made it possible to analyze large amounts of data to improve efficiency and productivity across industries, and healthcare is no exception. Over the past decade, the amount of medical data generated and stored has grown substantially in almost every domain of the healthcare sector. The primary purpose of Electronic Health Records (EHR) is to facilitate and improve individual patient care. In addition, EHRs today also serve as a data repository for clinical research aimed at improving healthcare management, patient safety and clinical decision support.

An important step in analyzing EHR data is to represent it in a form that a machine learning algorithm can use. This representation is challenging due to several factors: high dimensionality, noise, sparseness, heterogeneity and incompleteness. The success of machine learning algorithms largely depends on data representation and feature selection. Applications include, but are not limited to, outcome prediction, drug/procedure efficacy and interaction, and patient matching. In the current context, we are interested in developing a supervised machine learning algorithm for predicting the readmission risk (outcome) of congestive heart failure (CHF) patients admitted to the hospital.

Labelled EHR data will be provided to the student for the project. The EHR data has a distinctive structure centred on temporally ordered visits. Each visit includes patient-reported symptoms, diagnostic codes defining the patient's disease, recommended procedures, labs or medications, and general information such as location, time and care provider. A sequence of such visits describes the patient state. In addition, general patient information such as age, gender and other demographics is available.
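
A minimal sketch of how the visit-centred structure described above might be held in memory. All class names, field names and example codes are illustrative assumptions and do not reflect the schema of the dataset that will be provided.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Visit:
    """One hospital visit (field names are hypothetical)."""
    visit_date: date
    diagnoses: List[str] = field(default_factory=list)
    procedures: List[str] = field(default_factory=list)
    medications: List[str] = field(default_factory=list)


@dataclass
class Patient:
    """General patient information plus a list of visits."""
    patient_id: str
    age: int
    gender: str
    visits: List[Visit] = field(default_factory=list)

    def ordered_visits(self) -> List[Visit]:
        # Visits are temporally ordered, so sort by date before use.
        return sorted(self.visits, key=lambda v: v.visit_date)


# Toy example; the identifiers and codes below are made up for illustration.
patient = Patient(
    patient_id="p001",
    age=71,
    gender="F",
    visits=[
        Visit(date(2016, 3, 2), diagnoses=["I50.9"], medications=["furosemide"]),
        Visit(date(2015, 11, 20), diagnoses=["I10"]),
    ],
)
first_visit = patient.ordered_visits()[0]
```

Keeping the static information on the patient and the time-varying information on each visit mirrors the description above; a real dataset may well need additional fields (symptoms, labs, location, care provider).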

Tentative project plan:

- Survey patient representation techniques in the literature and analyze their pros and cons on the given data (see References for a starting point).

- Based on the aforementioned analysis, develop a patient representation that facilitates predictive modelling for CHF patients.

- If the representation is not interpretable, modify it by constraining it to be interpretable.

- Analyze the trade-off between predictive performance and interpretability, and comment on it.
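
As one possible starting point for the plan above, the sketch below builds a simple count-based patient representation in which every dimension corresponds to a named medical code. This is an assumed baseline for illustration, not the method the project prescribes; the function names, patient identifiers and codes are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_vocabulary(patients: Dict[str, List[List[str]]]) -> List[str]:
    """Collect every code seen in any visit, in sorted order."""
    codes = set()
    for visits in patients.values():
        for visit in visits:
            codes.update(visit)
    return sorted(codes)


def count_vector(visits: List[List[str]], vocabulary: List[str]) -> List[int]:
    """Count how often each vocabulary code occurs across a patient's visits."""
    counts = Counter(code for visit in visits for code in visit)
    return [counts[code] for code in vocabulary]


# Toy data: patient id -> temporally ordered visits -> codes (all made up).
patients = {
    "p001": [["I10"], ["I50.9", "furosemide"], ["I50.9"]],
    "p002": [["E11.9"], ["I50.9"]],
}

vocabulary = build_vocabulary(patients)
vectors = {pid: count_vector(visits, vocabulary) for pid, visits in patients.items()}
```

Because each entry of a vector maps back to a single code, such a representation is easy to inspect, but it discards the temporal order of visits. Learned representations such as those in the references may retain more of that structure at the cost of interpretability, which is the trade-off the last step of the plan asks the student to analyze.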

