Deep Active Learning for LiDAR Point Cloud Segmentation

From ISLAB/CAISR
Title Deep Active Learning for LiDAR Point Cloud Segmentation
Summary Active learning to improve data efficiency for LiDAR point cloud segmentation
Keywords Deep Learning, Active Learning, LiDAR data, Segmentation, Data Efficiency, Dataset Compression, Uncertainty Estimation
TimeFrame
References
Prerequisites A solid understanding of Deep Learning and a willingness to learn active learning
Author
Supervisor Abu Mohammed Raisuddin, Eren Erdal Aksoy
Level Master
Status Open


Details: LiDAR perception is vital for autonomous driving. One way to understand the driving scene for decision making is to semantically segment the LiDAR data, and numerous studies have used deep learning for LiDAR point cloud segmentation. Deep learning is data hungry, i.e. the more labeled data we feed to the model, the better it generally performs. However, annotating point cloud data is cumbersome and requires expert knowledge, which makes point cloud annotation expensive. In machine learning, active learning is used to reduce annotation cost by selecting only the most informative samples for labeling. Compared to the volume of work published across the deep learning field, active learning for LiDAR point cloud segmentation is under-investigated. In this thesis, we will investigate active learning for LiDAR point cloud segmentation.

Problem Definition: Let D = (X, Y) be our dataset, where X = {x1, x2, …, xn} is the set of point clouds and Y = {y1, y2, …, yn} is the set of corresponding labels. We are given a labeling budget, i.e. only T data samples can be labeled, where T << n. The goal of this study is to iteratively select T of the n data samples such that a model trained on those T samples performs as well as one trained on all n. To this end, we will start with a few data samples, e.g. 10 point clouds or parts of point clouds, and iteratively add the new point clouds or parts of point clouds that maximize data efficiency.
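
The iterative selection described above can be sketched as a pool-based active learning loop. This is only a minimal illustration under stated assumptions, not the method the thesis will settle on: it assumes an entropy-based uncertainty score as the acquisition function (one common choice; the proposal does not fix one), and the names `train`, `predict_probs`, `seed_size`, and `batch_size` are hypothetical placeholders for the real PyTorch training and inference code.

```python
import math
import random


def entropy(probs):
    """Shannon entropy of one class-probability vector (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def scan_uncertainty(point_probs):
    """Score one point cloud by the mean entropy over its per-point predictions."""
    return sum(entropy(p) for p in point_probs) / len(point_probs)


def active_learning_loop(pool_ids, train, predict_probs, budget,
                         seed_size=10, batch_size=5, rng=None):
    """Iteratively select `budget` samples (T) from the pool of n samples.

    Hypothetical callbacks, assumed for illustration only:
      train(labeled_ids) -> model
      predict_probs(model, sample_id) -> list of per-point probability vectors
    """
    rng = rng or random.Random(0)
    unlabeled = list(pool_ids)
    rng.shuffle(unlabeled)

    # Start from a small random seed set, as in the problem definition.
    n_seed = min(seed_size, budget)
    labeled = unlabeled[:n_seed]
    unlabeled = unlabeled[n_seed:]

    while len(labeled) < budget and unlabeled:
        model = train(labeled)
        # Rank the remaining pool, most uncertain sample first.
        ranked = sorted(
            unlabeled,
            key=lambda i: scan_uncertainty(predict_probs(model, i)),
            reverse=True,
        )
        k = min(batch_size, budget - len(labeled))
        picked = ranked[:k]
        labeled.extend(picked)  # in practice: send `picked` to the annotators
        picked_set = set(picked)
        unlabeled = [i for i in unlabeled if i not in picked_set]
    return labeled
```

Scoring whole point clouds by mean entropy is the simplest option; selecting parts of point clouds, as the problem definition allows, would apply the same ranking to regions instead of full scans.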

Dataset: Public datasets will be used for this study, e.g. SemanticKITTI, WADS, or nuScenes.

What the student will have to do:
1. Literature review
2. Implement an active learning pipeline in PyTorch; the programming task will be split into many small chunks
3. Run experiments on at least two public datasets
4. Write reports and the thesis manuscript