Project description
In our current robot setup there are two cameras: an RGB-D camera fixed on the wall, providing a third-person view of the scene, and an RGB camera (without any depth cues) mounted on the robot wrist, providing a first-person view from the robot's perspective. Assume that a number of objects are placed on the table in front of the robot. The robot first acquires a 3D point cloud from the fixed RGB-D camera. Because the scene is cluttered, this point cloud will contain occluded objects. The robot arm should then reason about where to move around the table so that the wrist camera can be used to increase the information gain about the scene.

The first task is therefore to convert the RGB wrist-camera images into RGB-D by using state-of-the-art monocular depth estimation networks. Once this is done, a fusion step merges the two point clouds: one coming from the wrist camera and one from the fixed camera. Based on the fused result, the robot should autonomously decide how many additional wrist-camera images (e.g., two more) it needs in order to detect more objects in the scene. After each fusion step, the robot should estimate the 6D pose of each detected object in the scene. This topic is primarily about computer vision and AI.
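As a minimal sketch of the depth-to-point-cloud step, assuming pinhole intrinsics K for the wrist camera, a depth map produced by some monocular depth network (represented here by a hypothetical predict_depth call), and a wrist-to-world extrinsic transform T_world_wrist obtained from the robot's forward kinematics, the wrist view can be back-projected and expressed in the same frame as the fixed camera's cloud:

import numpy as np

def backproject_depth(depth, K):
    """Back-project an (H, W) metric depth map into an (N, 3) point cloud
    expressed in the camera frame, using pinhole intrinsics K."""
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    x = (u.reshape(-1) - cx) * z / fx
    y = (v.reshape(-1) - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)
    return pts[z > 0]                      # drop invalid / zero-depth pixels

def to_world(points_cam, T_world_cam):
    """Transform (N, 3) camera-frame points into the world frame
    with a 4x4 homogeneous transform."""
    pts_h = np.hstack([points_cam, np.ones((len(points_cam), 1))])
    return (T_world_cam @ pts_h.T).T[:, :3]

# --- hypothetical usage (all names below are assumptions) ---------------
# depth_wrist   = predict_depth(rgb_wrist)      # any SOTA monocular depth net
# K_wrist       = ...                           # wrist camera intrinsics
# T_world_wrist = ...                           # from forward kinematics
# cloud_wrist = to_world(backproject_depth(depth_wrist, K_wrist), T_world_wrist)
# cloud_fixed = ...                             # from the wall RGB-D camera
# fused = np.vstack([cloud_fixed, cloud_wrist]) # naive fusion by concatenation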
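One simple way to make the "how many more images" decision concrete is to voxelize the fused cloud and measure how many previously unseen voxels each new wrist view contributes, stopping once the gain falls below a threshold. This is only an illustrative occupancy-count heuristic, not the prescribed method; voxel_size, min_gain, max_views and the capture_next_view callback are assumed names:

import numpy as np

def voxel_keys(points, voxel_size=0.01):
    """Quantize (N, 3) world-frame points into a set of occupied voxel indices."""
    return set(map(tuple, np.floor(points / voxel_size).astype(int)))

def information_gain(known_voxels, new_points, voxel_size=0.01):
    """Number of voxels seen from the new view that are not yet in the fused map."""
    return len(voxel_keys(new_points, voxel_size) - known_voxels)

def acquire_views(fused_points, capture_next_view, max_views=5,
                  min_gain=50, voxel_size=0.01):
    """Keep requesting wrist-camera views while each one still adds
    at least `min_gain` new voxels to the scene model."""
    known = voxel_keys(fused_points, voxel_size)
    clouds = [fused_points]
    for _ in range(max_views):
        new_cloud = capture_next_view()   # move arm, predict depth, back-project
        gain = information_gain(known, new_cloud, voxel_size)
        if gain < min_gain:
            break                         # scene coverage has saturated
        known |= voxel_keys(new_cloud, voxel_size)
        clouds.append(new_cloud)
    return np.vstack(clouds)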
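For the per-object 6D pose step, one common baseline (not necessarily the method this project will settle on) is to segment object clusters from the fused cloud and register a known object model against each cluster with ICP. The sketch below assumes Open3D is available and that model_points and segment_points are (N, 3) arrays for a CAD model and a segmented cluster:

import numpy as np
import open3d as o3d

def estimate_6d_pose(model_points, segment_points, init=np.eye(4),
                     max_corr_dist=0.02):
    """Estimate the 4x4 transform that aligns a known object model to a
    segmented cluster of the fused scene cloud via point-to-point ICP."""
    model = o3d.geometry.PointCloud()
    model.points = o3d.utility.Vector3dVector(model_points)
    segment = o3d.geometry.PointCloud()
    segment.points = o3d.utility.Vector3dVector(segment_points)
    result = o3d.pipelines.registration.registration_icp(
        model, segment, max_corr_dist, init,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation   # homogeneous model-to-scene transform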