Privacy-Preserved Generator for Generating Synthetic EHR data

From ISLAB/CAISR
Title Privacy-Preserved Generator for Generating Synthetic EHR data
Summary Time-series GANs and generation of synthetic electronic health records
Keywords GAN, Synthetic data, Time-series
TimeFrame
References Papers: https://ieeexplore.ieee.org/abstract/document/8975823 https://proceedings.neurips.cc/paper/2019/file/c9efe5f26cd17ba6216bbe2a7d26d490-Paper.pdf https://arxiv.org/pdf/2009.09283.pdf https://dl.acm.org/doi/abs/10.1145/2810103.2813687

Data: https://lcp.mit.edu/mimic

Prerequisites
Author
Supervisor Atiye Sadat Hashemi, Jens Lundström, Farzaneh Etminani
Level Master
Status Open


Access to large, high-quality datasets for improving deep learning (DL)-based models is a major challenge, particularly in highly sensitive applications such as healthcare, where maintaining data privacy is a necessity. Synthetic data generation is an important tool for a wide range of users, from researchers who use data to train models to educators who teach statistical methods. One of the main motivations for using synthetic data is protecting privacy. However, generating synthetic data with DL models such as generative adversarial networks (GANs) still requires real training data, and it has been shown that the learned parameters of these models can memorize that training data. To address the privacy of the training data in synthetic health data generation, we will adapt the idea of privacy-preserving GANs [1] to GANs suited to time-series data [2]. Time-series GANs can be used to generate synthetic electronic health records (EHRs) [3]. In this master thesis, we aim to study different differential privacy methods that add carefully calibrated noise to the gradients during the training phase of time-series GANs.
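
To make the idea of adding noise to the gradients concrete, the following is a minimal sketch of one common approach: clip each per-example gradient to a fixed norm, sum the clipped gradients, add Gaussian noise scaled to that norm, and average. It is an illustration only, not the method the thesis will settle on. The function names and the parameters clip_norm and noise_multiplier are hypothetical, gradients are plain Python lists rather than framework tensors, and no privacy accounting is performed.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale one per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    Hypothetical helper: the noise standard deviation is
    noise_multiplier * clip_norm, applied independently per coordinate.
    """
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


# Toy per-example gradients for a model with two parameters (made-up values).
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
noisy_grad = private_mean_gradient(
    grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng
)
```

In a GAN, a step like this would typically be applied to the gradients of the network that sees the real records, so that the clipping bound limits how much any single record can influence an update. Which networks to privatize and how to choose the noise level are among the design questions the thesis would need to examine.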

Conclusion: In this master thesis, we aim to study privacy-preserving approaches in deep learning and develop a model that preserves the privacy of the training data when generating synthetic data. In full, the title of this thesis is ‘The privacy-preserving aspect of generating synthetic electronic health records (Synthetic-EHRs) using time-series generative adversarial networks (Time-GANs)’.