Learning to see the 3D world

Jia-Bin Huang
Talk Series: 
10.07.2022 11:00 to 12:00

Cameras allow us to effortlessly capture the visual world around us and share memorable moments of our lives. While current computer vision systems work remarkably well at recognizing 2D patterns in images, they often have difficulty recovering the complete 3D geometry of dynamic scenes. Humans, on the other hand, perceive complex dynamic scenes as physical surfaces and objects in 3D and can imagine plausible scene appearance from novel viewpoints. In this talk, I will present my research on reconstructing and rendering our 3D world. Specifically, I will present our work on creating compelling 3D photography from a single image, estimating dense, geometrically consistent depth from casually captured cellphone videos, and learning neural implicit representations for free-viewpoint videos. The core idea is to integrate constraints from the physical models behind the visual observations into learning-based algorithms. I will conclude by highlighting future challenges and plans for building intelligent machines that learn to see, recreate, and interact with the 3D world.
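To make the "integrate physical constraints" idea concrete: one common form of geometric consistency for video depth is to backproject a frame's depth map through a pinhole camera model, transform the points by the relative camera pose, and penalize disagreement with the depth observed in the other frame. The sketch below is a minimal, hypothetical NumPy illustration of such a consistency loss, not the talk's actual method; the function names, the L1 penalty, and the nearest-pixel lookup are all simplifying assumptions.

```python
import numpy as np

def backproject(depth, K_inv):
    """Lift a depth map to 3D points in camera coordinates (3 x N)."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    return (K_inv @ pix) * depth.reshape(1, -1)

def geometric_consistency_loss(depth_i, depth_j, R, t, K):
    """Mean L1 gap between depth_i reprojected into frame j and depth_j.

    R, t: relative pose mapping frame-i camera coordinates into frame j.
    K: 3x3 pinhole intrinsics shared by both frames (an assumption).
    """
    pts_i = backproject(depth_i, np.linalg.inv(K))  # 3D points in frame i
    pts_j = R @ pts_i + t[:, None]                  # same points in frame j
    proj = K @ pts_j                                # project into image j
    z = proj[2]
    u = np.round(proj[0] / z).astype(int)           # nearest-pixel lookup
    v = np.round(proj[1] / z).astype(int)
    h, w = depth_j.shape
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (z > 0)
    return np.mean(np.abs(z[valid] - depth_j[v[valid], u[valid]]))
```

A learning-based depth network can minimize such a loss across frame pairs so that per-frame predictions agree with a single underlying 3D scene, rather than flickering independently.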