VBC: Visual Whole-Body Control
for Legged Loco-Manipulation

We propose a sim-to-real framework that enables the robot to grasp different objects in varying surroundings.


We study the problem of mobile manipulation using legged robots equipped with an arm, namely legged loco-manipulation. Although the robot's legs are usually used for mobility, they also offer an opportunity to amplify its manipulation capabilities through whole-body control: the robot controls the legs and the arm at the same time to extend its workspace. We propose a framework that performs whole-body control autonomously from visual observations. Our approach, Visual Whole-Body Control (VBC), consists of a low-level policy that uses all degrees of freedom to track an end-effector position and a high-level policy that proposes the end-effector position based on visual inputs. We train both levels of policies in simulation and perform Sim2Real transfer for real-robot deployment. Extensive experiments show significant improvements over baselines in picking up diverse objects in different configurations (heights, locations, orientations) and environments.
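The two-level structure described above can be sketched as a single control step: the high-level policy maps a visual observation to an end-effector position command, and the low-level policy maps proprioception plus that command to targets for every joint, legs and arm together. This is a minimal illustration, not the authors' implementation. The class names, the joint counts (12 leg joints, 6 arm joints), the observation sizes, and the random linear "networks" standing in for trained policies are all assumptions made for the sketch.

```python
import numpy as np

# Assumed, illustrative joint counts: 12 leg joints + 6 arm joints.
NUM_LEG_DOF, NUM_ARM_DOF = 12, 6
NUM_DOF = NUM_LEG_DOF + NUM_ARM_DOF


class HighLevelPolicy:
    """Hypothetical stand-in for a trained visual policy: visual input -> EE position command."""

    def __init__(self, obs_dim: int, rng: np.random.Generator):
        # Random linear map used in place of a trained network.
        self.W = rng.normal(scale=0.01, size=(3, obs_dim))

    def __call__(self, visual_obs: np.ndarray) -> np.ndarray:
        # Flatten the image-like observation and propose an (x, y, z) EE target.
        return self.W @ visual_obs.ravel()


class LowLevelPolicy:
    """Hypothetical stand-in for a trained whole-body policy: proprio + EE command -> all joint targets."""

    def __init__(self, proprio_dim: int, rng: np.random.Generator):
        self.W = rng.normal(scale=0.01, size=(NUM_DOF, proprio_dim + 3))

    def __call__(self, proprio: np.ndarray, ee_cmd: np.ndarray) -> np.ndarray:
        # One output per joint: the legs and the arm are commanded together.
        return self.W @ np.concatenate([proprio, ee_cmd])


def control_step(high, low, visual_obs, proprio):
    """One hierarchical step: vision proposes the EE target, the whole body tracks it."""
    ee_cmd = high(visual_obs)
    joint_targets = low(proprio, ee_cmd)
    return ee_cmd, joint_targets


rng = np.random.default_rng(0)
high = HighLevelPolicy(obs_dim=64, rng=rng)              # e.g. an 8x8 depth patch
low = LowLevelPolicy(proprio_dim=2 * NUM_DOF, rng=rng)   # assumed: joint positions + velocities
ee_cmd, joint_targets = control_step(high, low, rng.random((8, 8)), np.zeros(2 * NUM_DOF))
```

Keeping the interface between the levels as a plain end-effector position is what lets each level be trained separately and swapped independently in this sketch.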

Multi-Object Grasping Pipeline


Visual Input Visualization


More Examples


Low-Level and High-Level Policy Training in Simulation

All policies are trained in parallel in simulation. The low-level policy is trained to track a commanded end-effector pose and body velocities, while the high-level policy is trained to propose end-effector positions based on visual inputs. The low-level policy may fail to track some goals because of IK failures, but the high-level policy learns to avoid proposing those poses.
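One way to see why the high-level policy learns to avoid poses the low-level policy cannot reach is to write its reward in terms of where the end effector actually ends up, not where it was commanded. The sketch below uses batched rewards over N parallel environments. The exponential tracking kernels, the kernel widths, the distance-plus-success-bonus form of the high-level reward, and all function names are assumptions chosen for illustration; they are not the reward terms used in the paper.

```python
import numpy as np


def low_level_reward(ee_pos, ee_goal, base_vel, vel_cmd, sigma_pos=0.1, sigma_vel=0.25):
    """Assumed batched tracking reward; inputs are arrays shaped (N, d) for N parallel envs."""
    pos_err = np.linalg.norm(ee_pos - ee_goal, axis=-1)   # EE position error per env
    vel_err = np.sum((base_vel - vel_cmd) ** 2, axis=-1)  # squared body-velocity error per env
    # Each term is 1.0 for perfect tracking and decays toward 0 as the error grows.
    return np.exp(-pos_err / sigma_pos) + np.exp(-vel_err / sigma_vel)


def high_level_reward(achieved_ee_pos, object_pos, grasped, success_bonus=10.0):
    """Assumed reward based on the ACHIEVED EE position: unreachable proposals score lower."""
    dist = np.linalg.norm(achieved_ee_pos - object_pos, axis=-1)
    return -dist + success_bonus * grasped.astype(float)
```

If the high-level policy proposes a pose the low-level policy cannot track, the achieved end-effector position stays far from the object and no grasp occurs, so that proposal earns less reward under this assumed formulation.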

Tracking SAM Annotation Explained

We use this tool only to assign the mask of the object to be grasped; after annotation, all pick-up behaviors are performed automatically.
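After the one-time annotation, the tracked mask can be used to restrict the depth image to the target object before it is passed to the high-level policy. The sketch below assumes the policy consumes a masked, range-normalized depth image; the function name, the clipping range, and the minimum-pixel threshold are hypothetical choices. It also returns `None` when the mask has nearly vanished, which is one simple way to detect the lost-mask failure mode.

```python
import numpy as np


def masked_depth(depth, mask, near=0.2, far=2.0, min_pixels=20):
    """Hypothetical preprocessing: keep depth only on the tracked object mask.

    depth: (H, W) depth image in meters; mask: (H, W) boolean object mask.
    Returns a float32 image in [0, 1], or None if the mask is (nearly) lost.
    """
    if mask.sum() < min_pixels:
        return None  # tracking mask lost: stop or re-annotate instead of acting
    clipped = np.clip(depth, near, far)               # bound sensor noise and far background
    normalized = (clipped - near) / (far - near)      # map [near, far] to [0, 1]
    return np.where(mask, normalized, 0.0).astype(np.float32)  # zero outside the object
```

Checking mask size before acting, rather than feeding an empty image to the policy, keeps a tracking failure from turning into an arbitrary end-effector command in this sketch.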

Representative Failure Cases

The robot may fail for several reasons, such as depth perception failures, loss of the tracking mask, the improper design of the Z1 gripper, and generalization errors.


Funny Cases



title={Visual Whole-Body Control for Legged Loco-Manipulation},
author={Liu, Minghuan and Chen, Zixuan and Cheng, Xuxin and Ji, Yandong and Yang, Ruihan and Wang, Xiaolong},
journal={arXiv preprint arXiv:2403.16967},