Project description
In our current robot setup there are two cameras: an RGB-D camera fixed on the wall, providing a third-person view of the scene, and an RGB camera (without any depth cues) mounted on the robot wrist, providing a first-person view from the robot's perspective. Assume that a number of objects are placed on the table in front of the robot. The robot first acquires a 3D point cloud from the fixed RGB-D camera. Because the scene is cluttered, this point cloud will contain occluded objects. The robot arm should then reason about where to move around the table so that the wrist camera can be used to increase the information gain about the scene.

The first task is therefore to convert the RGB wrist-camera images into RGB-D by using state-of-the-art monocular depth estimation networks. Once this is done, a fusion step merges the two point clouds: one coming from the wrist camera and one from the fixed camera. Based on the fused result, the robot should autonomously decide how many additional wrist-camera images (e.g., two more) it needs in order to detect more objects in the scene. After each fusion step, the robot should estimate the 6D pose of each detected object in the scene. This topic is primarily about computer vision and AI.
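As a minimal sketch of the depth-to-point-cloud step, assuming pinhole intrinsics K for the wrist camera, a depth map produced by some monocular depth network (represented here by a hypothetical predict_depth call), and a wrist-to-world extrinsic transform T_world_wrist obtained from the robot's forward kinematics, the wrist view can be back-projected and expressed in the same frame as the fixed camera's cloud:

import numpy as np

def backproject_depth(depth, K):
    """Back-project an (H, W) metric depth map into an (N, 3) point cloud
    expressed in the camera frame, using pinhole intrinsics K."""
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    x = (u.reshape(-1) - cx) * z / fx
    y = (v.reshape(-1) - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)
    return pts[z > 0]                      # drop invalid / zero-depth pixels

def to_world(points_cam, T_world_cam):
    """Transform (N, 3) camera-frame points into the world frame
    with a 4x4 homogeneous transform."""
    pts_h = np.hstack([points_cam, np.ones((len(points_cam), 1))])
    return (T_world_cam @ pts_h.T).T[:, :3]

# --- hypothetical usage (all names below are assumptions) ---------------
# depth_wrist   = predict_depth(rgb_wrist)      # any SOTA monocular depth net
# K_wrist       = ...                           # wrist camera intrinsics
# T_world_wrist = ...                           # from forward kinematics
# cloud_wrist = to_world(backproject_depth(depth_wrist, K_wrist), T_world_wrist)
# cloud_fixed = ...                             # from the wall RGB-D camera
# fused = np.vstack([cloud_fixed, cloud_wrist]) # naive fusion by concatenation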
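One simple way to make the "how many more images" decision concrete is to voxelize the fused cloud and measure how many previously unseen voxels each new wrist view contributes, stopping once the gain falls below a threshold. This is only an illustrative occupancy-count heuristic, not the prescribed method; voxel_size, min_gain, max_views and the capture_next_view callback are assumed names:

import numpy as np

def voxel_keys(points, voxel_size=0.01):
    """Quantize (N, 3) world-frame points into a set of occupied voxel indices."""
    return set(map(tuple, np.floor(points / voxel_size).astype(int)))

def information_gain(known_voxels, new_points, voxel_size=0.01):
    """Number of voxels seen from the new view that are not yet in the fused map."""
    return len(voxel_keys(new_points, voxel_size) - known_voxels)

def acquire_views(fused_points, capture_next_view, max_views=5,
                  min_gain=50, voxel_size=0.01):
    """Keep requesting wrist-camera views while each one still adds
    at least `min_gain` new voxels to the scene model."""
    known = voxel_keys(fused_points, voxel_size)
    clouds = [fused_points]
    for _ in range(max_views):
        new_cloud = capture_next_view()   # move arm, predict depth, back-project
        gain = information_gain(known, new_cloud, voxel_size)
        if gain < min_gain:
            break                         # scene coverage has saturated
        known |= voxel_keys(new_cloud, voxel_size)
        clouds.append(new_cloud)
    return np.vstack(clouds)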
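For the per-object 6D pose step, one common baseline (not necessarily the method this project will settle on) is to segment object clusters from the fused cloud and register a known object model against each cluster with ICP. The sketch below assumes Open3D is available and that model_points and segment_points are (N, 3) arrays for a CAD model and a segmented cluster:

import numpy as np
import open3d as o3d

def estimate_6d_pose(model_points, segment_points, init=np.eye(4),
                     max_corr_dist=0.02):
    """Estimate the 4x4 transform that aligns a known object model to a
    segmented cluster of the fused scene cloud via point-to-point ICP."""
    model = o3d.geometry.PointCloud()
    model.points = o3d.utility.Vector3dVector(model_points)
    segment = o3d.geometry.PointCloud()
    segment.points = o3d.utility.Vector3dVector(segment_points)
    result = o3d.pipelines.registration.registration_icp(
        model, segment, max_corr_dist, init,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation   # homogeneous model-to-scene transform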