Ego-VCP: Ego-Vision World Model for Humanoid Contact Planning

Hang Liu2, Yuman Gao1, Sangli Teng1, Yufeng Chi1, Yakun Sophia Shao1, Zhongyu Li3, Maani Ghaffari2 and Koushil Sreenath1
UC Berkeley, UM Ann Arbor, CUHK

1 University of California, Berkeley

2 University of Michigan, Ann Arbor

3 Chinese University of Hong Kong

Video

Abstract

Enabling humanoid robots to exploit physical contact, rather than simply avoid collisions, is crucial for autonomy in unstructured environments. Traditional optimization-based planners struggle with contact complexity, while on-policy reinforcement learning (RL) is sample-inefficient and has limited multi-task ability. We propose a framework combining a learned world model with sampling-based Model Predictive Control (MPC), trained on a demonstration-free offline dataset to predict future outcomes in a compressed latent space. To address sparse contact rewards and sensor noise, the MPC uses a learned surrogate value function for dense, robust planning. Our single, scalable model supports contact-aware tasks, including wall support after perturbation, blocking incoming objects, and traversing height-limited arches, with improved data efficiency and multi-task capability over on-policy RL. Deployed on a physical humanoid, our system achieves robust, real-time contact planning from proprioception and ego-centric depth images.

Highlights

Our world model and sampling-based MPC enable real-time visual contact planning for diverse object interactions in real-world scenarios, using only an ego-centric depth camera and proprioception.

Methods
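
To make the planning loop described in the abstract concrete, here is a minimal sketch of sampling-based MPC in a latent space, scored by a value function. It is not the project's implementation: `toy_dynamics` and `toy_value` are hypothetical stand-ins for the learned world model and the learned surrogate value, and the softmax (MPPI-style) weighting is one common choice for combining samples that we assume here for illustration.

```python
import numpy as np


def toy_dynamics(z, a):
    """Hypothetical stand-in for a learned latent transition model."""
    return z + 0.5 * a


def toy_value(z):
    """Hypothetical stand-in for a learned surrogate value (higher is better)."""
    return -np.sum(z ** 2, axis=-1)


def rollout(z0, actions, dynamics):
    """Roll a batch of action sequences (N, H, A) forward from latent z0."""
    z = np.repeat(z0[None, :], actions.shape[0], axis=0)
    for t in range(actions.shape[1]):
        z = dynamics(z, actions[:, t])
    return z


def plan(z0, dynamics, value, horizon=5, num_samples=2048,
         action_dim=2, noise_std=1.0, temperature=1.0, seed=0):
    """Return a softmax-weighted action sequence of shape (horizon, action_dim)."""
    rng = np.random.default_rng(seed)
    # 1. Sample candidate action sequences around zero.
    actions = rng.normal(0.0, noise_std, size=(num_samples, horizon, action_dim))
    # 2. Imagine each candidate in latent space and score its final state.
    scores = value(rollout(z0, actions, dynamics))
    # 3. Softmax-weight the candidates (max subtracted for numerical stability).
    weights = np.exp((scores - scores.max()) / temperature)
    weights /= weights.sum()
    # 4. Average the sequences; in receding-horizon use only the first action is executed.
    return np.einsum("n,nha->ha", weights, actions)


z0 = np.array([2.0, -2.0])  # assumed toy latent state
best = plan(z0, toy_dynamics, toy_value)
print("first action:", best[0])
```

In a deployed system, `z0` would presumably come from an encoder over depth images and proprioception, and the plan would be recomputed at every control step. Scoring imagined states with a dense value estimate, rather than a sparse contact reward, is what lets a short sampling horizon still rank candidates usefully.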

Multi-Task

Multi-task performance and latent space visualization. (a) A joint model matches single-task performance. (b-c) t-SNE shows clear task separation: latent h_t captures evolving dynamics, while latent z_t encodes compact observations.

Planning Visualization

We use Blender to visualize the planning process.

Task 1

Task 2

Task 3

Acknowledgments

We would like to thank Jiaze Cai and Yen-Jen Wang for their help in experiments. We are also grateful to Bike Zhang, Fangchen Liu, Chaoyi Pan, Junfeng Long, and Yiyang Shao for their valuable discussions.

This project website is built with Next.js, adapted from the AIRIO website, and incorporates trajectory visualization methods inspired by DIAL-MPC.