Human-in-the-loop Discovery of Interpretable Concepts in Deep Learning Models

From ISLAB/CAISR
Title Human-in-the-loop Discovery of Interpretable Concepts in Deep Learning Models
Summary Interactive discovery of disentangled and interpretable concepts in Deep Learning Models
Keywords disentangled learning, representation learning, human-in-the-loop, explainable AI
TimeFrame Fall 2022
References
Prerequisites
Author
Supervisor Ece Calikus
Level Master
Status Open


Learning interpretable representations of data that expose semantic meaning has numerous benefits for artificial intelligence, including de-noising, imputing missing values, reducing bias, and producing interpretable latent spaces that give better insight into the application domain. However, most deep learning methods cannot guarantee that their lower-dimensional latent representations are semantically meaningful to humans as concepts. Disentangled representation learning is an unsupervised learning technique that breaks down, or disentangles, the underlying factors of variation into narrowly defined variables and encodes them as separate dimensions. The goal is to mimic the quick intuition process of a human, using both "high"- and "low"-dimensional reasoning. A representation is considered disentangled if a change in one dimension corresponds to a change in exactly one factor of variation while remaining relatively invariant to changes in the other factors. For example, a disentangled representation of a face image would represent gender, hair color, age, and similar attributes as separate dimensions of the latent embedding.

However, not every disentangled feature is useful for a given downstream task. Features such as the border shape, color, or size of a skin lesion are useful for detecting skin cancer, but the same features are not equally useful for other tasks.
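To make the idea concrete, below is a minimal sketch of the β-VAE objective, one common approach to disentangled representation learning. All names and values here are illustrative, not part of the project specification: the objective adds a β-weighted KL penalty that pushes the approximate posterior toward a factorized standard-normal prior, which pressures each latent dimension to capture an independent factor of variation.

```python
import numpy as np

def beta_vae_loss(recon_error, mu, log_var, beta=4.0):
    """Per-sample beta-VAE objective: reconstruction error plus a
    beta-weighted KL divergence between the diagonal-Gaussian
    posterior N(mu, sigma^2) and the standard normal prior N(0, I).
    Setting beta > 1 increases the pressure toward independent
    (disentangled) latent dimensions. Illustrative sketch only."""
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dims
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)
    return recon_error + beta * kl

# Toy latent code for one image: 4 dimensions which, after successful
# disentanglement, might each track one factor such as lesion size or color.
mu = np.array([0.1, -0.2, 0.0, 0.5])
log_var = np.array([-0.1, 0.0, -0.2, 0.1])
loss = beta_vae_loss(recon_error=12.3, mu=mu, log_var=log_var)
```

When the posterior matches the prior exactly (mu = 0, log_var = 0), the KL term vanishes and the loss reduces to the reconstruction error alone; larger β trades reconstruction quality for more strongly factorized latents.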

In this project, we propose an interactive framework to integrate human knowledge in the visual concept extraction process and use the identified concepts to improve the prediction performance of the downstream task. We will also investigate whether some features are meaningful and transferable across different tasks and domains.